WORK-IN-PROGRESS: this material is still under development

Nested Function

Compose functions by nesting function calls as arguments of other calls.

    computer(
      processor(
        cores(2),
        Processor.Type.i386
      ),
      disk(
        size(150)
      ),
      disk(
        size(75),
        speed(7200),
        Disk.Interface.SATA
      )
    );

How it Works

The most notable property of Nested Function is the way it affects the evaluation order of its arguments. Function Sequence and Method Chaining both evaluate the functions in a left-to-right sequence. Nested Function evaluates the arguments of a function before the enclosing function itself. I find this most memorable with the "Old Macdonald" example: to sing the chorus you type o(i(e(i(e())))). This evaluation order has an impact on both how to use Nested Function and when to choose it compared to alternatives.
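To make the evaluation order concrete, here's a minimal Java sketch. The class name EvaluationOrderDemo, the trace list and the single-letter functions are all hypothetical, chosen only to echo the Old Macdonald call. Each function records its name when it runs, so the log shows the innermost call running first.

```java
import java.util.ArrayList;
import java.util.List;

class EvaluationOrderDemo {
  // Records the order in which the functions actually run.
  static final List<String> trace = new ArrayList<>();

  // Innermost call: no argument to wait for.
  static String e() { trace.add("e"); return "e"; }

  // Each enclosing function can only run once its argument has a value.
  static String e(String sungSoFar) { trace.add("e"); return sungSoFar + "-e"; }
  static String i(String sungSoFar) { trace.add("i"); return sungSoFar + "-i"; }
  static String o(String sungSoFar) { trace.add("o"); return sungSoFar + "-o"; }

  public static void main(String[] args) {
    String chorus = o(i(e(i(e()))));
    System.out.println(chorus); // e-i-e-i-o
    System.out.println(trace);  // [e, i, e, i, o]
  }
}
```

The script reads o, i, e, i, e from left to right, yet the functions run in the opposite order, which is why you have to write the chorus backwards.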

Evaluating the enclosing function last can be very handy in that it provides a built-in context to work with the arguments. Consider defining a computer processor configuration.

processor(
  cores(2),
  speed(2100),
  type(i386)
)

The nice thing here is that the argument functions can return fully formed values which the processor function can then assemble into its return value. Since the processor function evaluates last, we don't need to worry about the stopping problem of Method Chaining, nor do we need to have the Context Variable that we need for Function Sequence.

With mandatory elements in the grammar, along the lines of parent::= first second, Nested Function works particularly well. A parent function can define exactly the arguments required in the child functions and, with a statically typed language, can also define the return types, which enables IDE completion.

One issue with function arguments is how to label them so as to make them readable. Consider indicating the size and speed of a disk. The natural programming response is disk(150, 7200) but this isn't terribly readable as there's no indication what the numbers mean unless you have a language with keyword arguments. A way to deal with this is to use a wrapping function that does nothing other than provide a name: disk(size(150), speed(7200)). In the simplest form of this the wrapping function just returns the argument value - as a result it's pure syntactic sugar. It also means that there's no enforcement of the meaning of these functions - a call to disk(speed(7200), size(150)) could easily result in a very slow disk. You can avoid this by making the nested functions return intermediate data such as a builder or token - although that is more effort to set up.
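Here's a small Java sketch of the pure-sugar form and its weakness. The Disk class is a hypothetical stand-in for a framework object, and SugarWrapperDemo is just a name for the example. Because size and speed both return a plain int, the compiler can't tell them apart.

```java
class SugarWrapperDemo {
  // Hypothetical stand-in for a framework disk object.
  static final class Disk {
    final int size;
    final int speed;
    Disk(int size, int speed) { this.size = size; this.speed = speed; }
  }

  // Pure sugar: each wrapper just hands its argument back.
  static int size(int value) { return value; }
  static int speed(int value) { return value; }

  // The parent function only sees two ints, identified by position.
  static Disk disk(int size, int speed) { return new Disk(size, speed); }

  public static void main(String[] args) {
    Disk intended = disk(size(150), speed(7200));
    Disk swapped = disk(speed(7200), size(150)); // compiles, but the meanings are crossed
    System.out.println(intended.size + " / " + intended.speed); // 150 / 7200
    System.out.println(swapped.size + " / " + swapped.speed);   // 7200 / 150
  }
}
```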

Optional arguments can also present problems. If the base language supports default arguments for functions, you can use these for the optional case. If you don't have this, one approach is to define different functions for each combination of the optional arguments. If you only have a couple of cases this is tedious but reasonable. As the number of optional arguments increases, so does the tediousness (but not the reasonableness). One way out of this problem is to use intermediate data again - tokens can be a particularly effective choice.

With multiple arguments of the same thing, a varargs parameter is the best choice if the host language supports it. Multiple arguments of different kinds end up being like optional arguments, with the same complications.

The worst case of this is a grammar like parent::= (this | that)*. The issue here is that, unless you have keyword arguments, the only way to identify the arguments is through their position and type. This can make picking out which argument is which messy - and downright impossible if this and that have the same types. Once this happens you are forced into either returning intermediate results, or using a Context Variable. Using a Context Variable is particularly difficult here since the parent function isn't evaluated till the end, forcing you to use the broader context of the language to properly set up the Context Variable.

In order to keep the DSL readable, you usually want Nested Functions to be bare function calls. This implies you either need to make them global functions or use Object Scoping. Since global functions are problematic, I usually look to use Object Scoping if I can. However global functions can often be much less problematic in Nested Function because the biggest problem with global functions is when they come with global parsing state. A global function that just returns a value, such as a static method or a constant like DayOfWeek.MONDAY, is often a good choice.

When to use it

One of the great strengths, and weaknesses, of Nested Function is the order of evaluation. With Nested Function the arguments are evaluated before the parent function (unless you use closures for arguments). This is very useful for building up a hierarchy of values because you can have the arguments create fully formed framework objects which can be assembled by the parent function. This can avoid much of the mucking about with replacements and intermediate data that you get with Function Sequence and Method Chaining.

Conversely this evaluation order causes problems in a sequence of commands leading to the Old Macdonald problem: o(i(e(i(e())))). So for a sequence that you want to read left to right, Function Sequence or Method Chaining are usually a better bet. For precise control of when to evaluate multiple arguments, use Nested Closure.
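As a rough illustration of how closures change the picture, here's a Java sketch that passes the arguments as java.util.function.Supplier lambdas. The names DeferredArgumentDemo, step, eager and deferred are hypothetical. It's only meant to show the shift in evaluation order, not a full Nested Closure implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class DeferredArgumentDemo {
  // Records the order in which things actually run.
  static final List<String> trace = new ArrayList<>();

  static String step(String name) {
    trace.add(name);
    return name;
  }

  // Eager version: both arguments have already run by the time we get here.
  static String eager(String first, String second) {
    trace.add("parent");
    return first + "," + second;
  }

  // Deferred version: the parent decides when (and whether) each closure runs.
  static String deferred(Supplier<String> first, Supplier<String> second) {
    trace.add("parent");
    return first.get() + "," + second.get();
  }

  public static void main(String[] args) {
    eager(step("a"), step("b"));
    System.out.println(trace); // [a, b, parent]

    trace.clear();
    deferred(() -> step("a"), () -> step("b"));
    System.out.println(trace); // [parent, a, b]
  }
}
```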

Nested Function also often struggles with optional arguments and multiple varied arguments. Nested Function very much expects you to say exactly what you want, in a precise order, so if you need greater flexibility you'll need to look to Method Chaining or a Literal Collection Expression. Nesting a Dictionary Literal Collection Expression is often a good choice as it allows you to get the arguments sorted out before calling the parent while giving you the flexibility of ordering and optionality of the arguments, particularly with a hash argument.
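Java has no dictionary literal, so the following sketch uses Map.of (Java 9 or later) as the nearest stand-in for a hash argument. The Disk class, the UNKNOWN marker and the string keys are all assumptions made for the illustration.

```java
import java.util.Map;

class MapArgumentDemo {
  // Hypothetical stand-in for a framework disk object.
  static final class Disk {
    final int size;
    final int speed;
    Disk(int size, int speed) { this.size = size; this.speed = speed; }
  }

  static final int UNKNOWN = -1; // illustrative marker for "not specified"

  // The map is fully built before disk runs, and its keys may come in any order.
  static Disk disk(Map<String, Integer> args) {
    return new Disk(args.getOrDefault("size", UNKNOWN), args.getOrDefault("speed", UNKNOWN));
  }

  public static void main(String[] args) {
    Disk full = disk(Map.of("speed", 7200, "size", 150));
    Disk sizeOnly = disk(Map.of("size", 75));
    System.out.println(full.size + " / " + full.speed);         // 150 / 7200
    System.out.println(sizeOnly.size + " / " + sizeOnly.speed); // 75 / -1
  }
}
```

The price in a statically typed language is that the keys are just strings, so a misspelled key is silently ignored rather than caught by the compiler.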

Another disadvantage of Nested Function is the punctuation, which usually relies on matching brackets and putting commas in the right place. At its worst this can look like a disfigured Lisp, with all the parentheses and added warts. This is less of an issue for DSLs aimed at programmers, who get more used to these warts.

Name clashes are less of a trouble here than with Function Sequence, since the parent function provides context to interpret the nested function call. As a result you can happily use "speed" for processor speed and disk speed and use the same function as long as the types are compatible.

Example: The simple computer configuration example (Java)

[TBD: Consider changing to object scoping here as it is my preferred option]

Here's the script for the common running example of stating the configuration of a simple computer.

    computer(
      processor(
        cores(2),
        Processor.Type.i386
      ),
      disk(
        size(150)
      ),
      disk(
        size(75),
        speed(7200),
        Disk.Interface.SATA
      )
    );

For this case each clause in the script returns a framework object, so I can use the nested evaluation order to build up the entire expression without using Context Variables. I'll start from the bottom, looking at the processor clause.

class Builder...
  static Processor processor(int cores, Processor.Type type) {
    return new Processor(cores, type);
  }
  static int cores(int value) {
    return value;
  }

I've defined the builder functions as static functions on a builder class. By using Java's static import feature I can use bare function calls to invoke the functions. (Is it only me who finds it confusing that we call them "static imports" but have to declare them with import static?) I also use static imports to bring in enum types defined by the framework which I can easily use directly here. In case you skipped dessert before reading this I've included a pure sugar (sucratic?) cores function for readability.

The disk clause has optional arguments. Since there's only a couple, I'll nap for a while as I write out the combinations of functions.

class Builder...
  static Disk disk(int size, int speed, Disk.Interface iface) {
    return new Disk(size, speed, iface);
  }
  static Disk disk(int size) {
    return disk(size, Disk.UNKNOWN_SPEED, null);
  }
  static Disk disk(int size, int speed) {
    return disk(size, speed, null);
  }
  static Disk disk(int size, Disk.Interface iface) {
    return disk(size, Disk.UNKNOWN_SPEED, iface);
  }

For the top level computer clause, I use a varargs parameter to handle the multiple disks.

class Builder...
  static Computer computer(Processor p, Disk... d) {
    return new Computer(p, d);
  }
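To see the pieces working together, here's a self-contained sketch that runs the script. The framework classes aren't shown in the text, so the Processor, Disk and Computer classes below are hypothetical stand-ins, as are the extra enum values and the UNKNOWN_SPEED marker. I've only included the two disk overloads the script needs.

```java
import java.util.Arrays;
import java.util.List;

class ComputerScriptSketch {
  // Hypothetical stand-ins for the framework objects.
  static final class Processor {
    enum Type { i386, amd64 }
    final int cores;
    final Type type;
    Processor(int cores, Type type) { this.cores = cores; this.type = type; }
  }

  static final class Disk {
    enum Interface { SATA, IDE }
    static final int UNKNOWN_SPEED = -1; // assumed marker for "speed not given"
    final int size;
    final int speed;
    final Interface iface;
    Disk(int size, int speed, Interface iface) {
      this.size = size;
      this.speed = speed;
      this.iface = iface;
    }
  }

  static final class Computer {
    final Processor processor;
    final List<Disk> disks;
    Computer(Processor processor, Disk... disks) {
      this.processor = processor;
      this.disks = Arrays.asList(disks);
    }
  }

  // Builder functions: each returns a finished value for its parent to assemble.
  static Computer computer(Processor p, Disk... d) { return new Computer(p, d); }
  static Processor processor(int cores, Processor.Type type) { return new Processor(cores, type); }
  static Disk disk(int size, int speed, Disk.Interface iface) { return new Disk(size, speed, iface); }
  static Disk disk(int size) { return disk(size, Disk.UNKNOWN_SPEED, null); }

  // Pure sugar wrappers that only supply a readable name.
  static int cores(int value) { return value; }
  static int size(int value) { return value; }
  static int speed(int value) { return value; }

  static Computer script() {
    return computer(
      processor(
        cores(2),
        Processor.Type.i386
      ),
      disk(
        size(150)
      ),
      disk(
        size(75),
        speed(7200),
        Disk.Interface.SATA
      )
    );
  }

  public static void main(String[] args) {
    Computer c = script();
    System.out.println(c.processor.cores + " cores, " + c.disks.size() + " disks");
  }
}
```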

Example: Handling multiple different arguments with tokens (C#)

One of the trickier areas to use Nested Function is where you have multiple arguments of different kinds. Consider a language for defining properties of an onscreen box.

      box(
        topBorder(2),
        bottomBorder(2),
        leftMargin(3),
        transparent
      );
      box(
        leftMargin(2),
        rightMargin(5)
      );

In this situation we can have any number of a wide combination of properties to set. There's no strong reason to force an order in declaring the properties, so the usual style of argument identification in C# (position) doesn't work too well. For this example I'll explore using tokens to identify the arguments to compose them into the structure.

Here's a look at the target framework object.

  class Box {
    public bool IsTransparent = false;
    public int[] Borders = { 1, 1, 1, 1 }; //TRouBLe - top right bottom left
    public int[] Margins = { 0, 0, 0, 0 }; //TRouBLe - top right bottom left

The various contained functions all return a token data type, which looks like this.

  class BoxToken {
    public enum Types { TopBorder, BottomBorder, LeftMargin, RightMargin, Transparent }
    public readonly Types Type;
    public readonly Object Value;
    public BoxToken(Types type, Object value) {
      Type = type;
      Value = value;
    }

I'm using Object Scoping and have defined the clauses of the DSL as functions on the builder supertype.

class Builder...
    protected BoxToken topBorder(int arg) {
      return new BoxToken(BoxToken.Types.TopBorder, arg);
    }
    protected BoxToken bottomBorder(int arg) {
      return new BoxToken(BoxToken.Types.BottomBorder, arg);
    }

I'm only showing a couple of them, but I'm sure you can deduce from these what the rest look like.

The parent function now just runs through the argument results and assembles a box.

class Builder...
    protected void box(params BoxToken[] args) {
      Box newBox = new Box();
      foreach (BoxToken t in args) updateAttribute(newBox, t);
      boxes.Add(newBox);
    }

    List<Box> boxes = new List<Box>();

    private void updateAttribute(Box box, BoxToken token) {
      switch (token.Type) {
        case BoxToken.Types.TopBorder:
          box.Borders[0] = (int)token.Value;
          break;
        case BoxToken.Types.BottomBorder:
          box.Borders[2] = (int)token.Value;
          break;
        case BoxToken.Types.LeftMargin:
          box.Margins[3] = (int)token.Value;
          break;
        case BoxToken.Types.RightMargin:
          box.Margins[1] = (int)token.Value;
          break;
        case BoxToken.Types.Transparent:
          box.IsTransparent = (bool)token.Value;
          break;
        default:
          throw new InvalidOperationException("Unreachable");
      }
    }

Example: Using subtype tokens for IDE support (Java)

Most languages differentiate between different function arguments by their position. So in the above example, we might set the size and speed of a disk with a function like disk(150, 7200). That bare function isn't too readable, so in the above example I wrapped the numbers with simple functions to get disk(size(150), speed(7200)). In the earlier code example the functions just return their argument, which aids readability but doesn't prevent someone typing the erroneous disk(speed(7200), size(150)).

Using simple tokens, like in the Box example, provides a mechanism for error checking. By returning a token of [size, 150] you can use the token type to check you have the right argument in the right position, or indeed make the arguments work in any order.
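Here's a sketch of both options in Java. The Kind enum, the Token and Disk classes and the -1 marker are hypothetical names for the illustration. The positional disk function complains at run time about a misplaced token, while diskInAnyOrder uses the token kind to route each value.

```java
class CheckedTokenDemo {
  enum Kind { SIZE, SPEED }

  // A token pairs the value with a marker saying what the value means.
  static final class Token {
    final Kind kind;
    final int value;
    Token(Kind kind, int value) { this.kind = kind; this.value = value; }
  }

  // Hypothetical stand-in for the framework disk object.
  static final class Disk {
    final int size;
    final int speed;
    Disk(int size, int speed) { this.size = size; this.speed = speed; }
  }

  static Token size(int value) { return new Token(Kind.SIZE, value); }
  static Token speed(int value) { return new Token(Kind.SPEED, value); }

  // Positional form: rejects tokens that turn up in the wrong slot.
  static Disk disk(Token size, Token speed) {
    if (size.kind != Kind.SIZE || speed.kind != Kind.SPEED) {
      throw new IllegalArgumentException("expected disk(size(...), speed(...))");
    }
    return new Disk(size.value, speed.value);
  }

  // Order-free form: routes each token by its kind instead of its position.
  static Disk diskInAnyOrder(Token... tokens) {
    int size = -1;  // -1 is an illustrative "not specified" marker
    int speed = -1;
    for (Token t : tokens) {
      if (t.kind == Kind.SIZE) size = t.value;
      else speed = t.value;
    }
    return new Disk(size, speed);
  }

  public static void main(String[] args) {
    Disk d = diskInAnyOrder(speed(7200), size(150));
    System.out.println(d.size + " / " + d.speed); // 150 / 7200
    try {
      disk(speed(7200), size(150));
    } catch (IllegalArgumentException ex) {
      System.out.println("rejected: " + ex.getMessage());
    }
  }
}
```

Either way the mistake only shows up when the script runs.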

Checking is all very well, but in a statically typed language with a modern IDE you want to go further. You want code completion pop-ups to force you to put size before speed. By using subclasses you can pull this off.

The tokens I used above used the token type as a property of the token. The alternative is to create a different subtype for each token; I can then use the subtype in the parent function definition.

Here's the short script I want to support.

      disk(
        size(150),
        speed(7200)
      );

Here's the target framework object.

public class Disk {
  private int size, speed;
  public Disk(int size, int speed) {
    this.size = size;
    this.speed = speed;
  }
  public int getSize() {
    return size;
  }
  public int getSpeed() {
    return speed;
  }
}

To handle size and speed I create a general integer token with subclasses for the two kinds of clause.

public class IntegerToken {
  private final int value;
  public IntegerToken(int value) {
    this.value = value;
  }
  public int getValue() {
    return value;
  }
}
public class SpeedToken extends IntegerToken {
  public SpeedToken(int value) {
    super(value);
  }
}
public class SizeToken extends IntegerToken {
  public SizeToken(int value) {
    super(value);
  }
}

I can then define static functions in a builder that define the right arguments.

class Builder...
  public static Disk disk(SizeToken size, SpeedToken speed){
    return new Disk(size.getValue(), speed.getValue());
  }
  public static SizeToken size (int arg) {
    return new SizeToken(arg);
  }
  public static SpeedToken speed (int arg) {
    return new SpeedToken(arg);
  }

With these set up the IDE will only suggest the right functions in the right places and I'll see comforting red squigglies should I do any reckless typing.

Example: Recurring Events (C#)

I used to live in the South End of Boston. There was much to like about living in a downtown area of the city, close to restaurants and other ways to pass the time and spend my money. There were irritations, however, and one of them was street cleaning. On the first and third Monday of the month between April and October they would clean the streets near my apartment and I had to be sure I didn't leave my car there. Often I forgot and I got a ticket.

The rule for my street was that the cleaning occurred on the first and third Monday of the month between April and October. I could write a DSL expression for this:

        Schedule.First(DayOfWeek.Monday)
          .And(Schedule.Third(DayOfWeek.Monday))
          .From(Month.April)
          .Till(Month.October);

This example combines Method Chaining with Nested Function. Usually when I use Nested Function I prefer to combine it with Object Scoping, but in this case the functions that I'm nesting just return a value so I don't really feel a strong need to use Object Scoping.

The Framework

Recurring events are a recurring event in software systems. You often want to schedule things on particular combinations of dates like that. The way I think of them these days is that they are a Specification of dates. We want code that can tell us if a given date is included on a schedule. We do this by defining a general specification interface, which we can make generic as specifications are useful in all sorts of situations.

  internal interface Specification<T> {
    bool Includes(T arg);
  }

When building a specification model for a particular type, I like to identify small building blocks that I can combine together. One small building block is the notion of a particular period in a year, such as between April and October.

  internal class PeriodInYear : Specification<DateTime>
  {
    private readonly int startMonth;
    private readonly int endMonth;


    public PeriodInYear(int startMonth, int endMonth) {
      this.startMonth = startMonth;
      this.endMonth = endMonth;
    }

    public bool Includes(DateTime arg) {
      return arg.Month >= startMonth && arg.Month <= endMonth;
    }

Another element is the notion of the first Monday in the month. This class is a little trickier as I have to walk through sample dates in the month to see which one is the first.

[TBD: Move index check ]
  internal class DayInMonth : Specification<DateTime> {
    private readonly int index;
    private readonly DayOfWeek dayOfWeek;

    public DayInMonth(int index, DayOfWeek dayOfWeek) {
      this.index = index;
      this.dayOfWeek = dayOfWeek;
    }

    public bool Includes(DateTime arg) {
      if (index <= 0) throw new NotSupportedException("index must be positive");
      int currentMatch = 0;
      foreach (DateTime d in new MonthEnumerator(arg.Month, arg.Year)) {
        if (d > arg) return false;
        if (d.DayOfWeek == dayOfWeek) {
          currentMatch++;
          if (currentMatch == index) return (d == arg);
        }
      }
      return false;
    }
  }
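For comparison, here's a Java sketch of the same specification that swaps the walk through the month for a bit of arithmetic: days 1 to 7 hold the first occurrence of every weekday, days 8 to 14 the second, and so on. This is a different technique from the enumerator above, not a translation of it. DayInMonthSketch is a hypothetical name, it uses java.time.LocalDate so there's no time of day to trip over, and it does the index check in the constructor.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

class DayInMonthSketch {
  private final int index;
  private final DayOfWeek dayOfWeek;

  DayInMonthSketch(int index, DayOfWeek dayOfWeek) {
    if (index <= 0) throw new IllegalArgumentException("index must be positive");
    this.index = index;
    this.dayOfWeek = dayOfWeek;
  }

  boolean includes(LocalDate date) {
    if (date.getDayOfWeek() != dayOfWeek) return false;
    // Integer division buckets the day of month into 1st, 2nd, 3rd... occurrence.
    int occurrence = (date.getDayOfMonth() - 1) / 7 + 1;
    return occurrence == index;
  }

  public static void main(String[] args) {
    DayInMonthSketch firstMonday = new DayInMonthSketch(1, DayOfWeek.MONDAY);
    System.out.println(firstMonday.includes(LocalDate.of(2008, 4, 7)));  // true
    System.out.println(firstMonday.includes(LocalDate.of(2008, 4, 14))); // false
  }
}
```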

To walk through the days in a month, this specification makes use of a special enumerator. I set the enumerator with a particular month and year.

  internal class MonthEnumerator : IEnumerator<DateTime>, IEnumerable<DateTime> {
    private int year;
    private Month month;

    public MonthEnumerator(int month, int year) {
      this.month = new Month(month);
      this.year = year;
      Reset();
    }

It implements the IEnumerator methods.

class MonthEnumerator...
    private DateTime current;
    DateTime IEnumerator<DateTime>.Current { get { return current; } }
    public object Current { get { return current; } }

    public void Reset() {
      current = new DateTime(year, month.Number, 1).AddDays(-1);
    }

    public void Dispose() {}

    public bool MoveNext() {
      current = current.AddDays(1);
      return month.Includes(current);
    }

And also implements IEnumerable to allow it to be used in a foreach loop.

class MonthEnumerator...
    IEnumerator<DateTime> IEnumerable<DateTime>.GetEnumerator() {
      return this;
    }
    public IEnumerator GetEnumerator() {
      return this;
    }

Also taking part is a very simple Month class, which also acts as a specification.

class Month...
    private readonly int number;
    public int Number { get { return number; } }
    public Month(int number) {
      this.number = number;
    }
    public bool Includes(DateTime arg) {
      return number == arg.Month;
    }

These are useful building blocks, but can't do much on their own. To really make them sing and dance I need to be able to combine them into logical expressions, which I do with a couple more specifications.

  abstract class CompositeSpecification<T> : Specification<T> {
    protected IList<Specification<T>> elements = new List<Specification<T>>();
    public CompositeSpecification(params Specification<T>[] elements) {
      this.elements = elements;
    }
    public abstract bool Includes(T arg);
  }

  internal class AndSpecification<T> : CompositeSpecification<T> {
    public AndSpecification(params Specification<T>[] elements)
      : base(elements) {}
    public override bool Includes(T arg) {
      foreach (Specification<T> s in elements)
        if (! s.Includes(arg)) return false;
      return true;
    }
  }

  internal class OrSpecification<T> : CompositeSpecification<T> {
    public OrSpecification(params Specification<T>[] elements)
      : base(elements) {}
    public override bool Includes(T arg) {
      foreach (Specification<T> s in elements)
        if (s.Includes(arg)) return true;
      return false;
    }
  } 

I trust you can figure out how to implement a NotSpecification.
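In case it helps, here's one way a negating specification might look, sketched in Java rather than C#. The names Spec and NotSpec are hypothetical. Unlike the and/or composites it wraps a single element, so it has no need of the composite superclass.

```java
// Hypothetical Java counterpart of the Specification interface.
interface Spec<T> {
  boolean includes(T arg);
}

// Inverts the answer of the specification it wraps.
class NotSpec<T> implements Spec<T> {
  private final Spec<T> wrapped;

  NotSpec(Spec<T> wrapped) {
    this.wrapped = wrapped;
  }

  @Override
  public boolean includes(T arg) {
    return !wrapped.includes(arg);
  }
}

class NotSpecDemo {
  public static void main(String[] args) {
    Spec<Integer> even = n -> n % 2 == 0; // Spec has one method, so a lambda will do
    Spec<Integer> odd = new NotSpec<>(even);
    System.out.println(odd.includes(3)); // true
    System.out.println(odd.includes(4)); // false
  }
}
```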

One thing I don't like about this framework is my usage of the DateTime class. The problem is that DateTime has sub-second precision, but I'm only working at day precision. Using over-precise temporal data types is very common, because usually libraries push us in that direction. However they can easily result in awkward bugs when you compare two DateTimes that are different below the level of precision you care about. If I were doing this on a real project I'd make a proper Date class with the correct precision.
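The same trap exists in Java, which makes for a compact illustration. In this sketch java.time.LocalDateTime plays the over-precise type and java.time.LocalDate the day-precision one. DatePrecisionDemo and sameDay are hypothetical names.

```java
import java.time.LocalDateTime;

class DatePrecisionDemo {
  // Compares at day precision by dropping the time-of-day part first.
  static boolean sameDay(LocalDateTime a, LocalDateTime b) {
    return a.toLocalDate().equals(b.toLocalDate());
  }

  public static void main(String[] args) {
    LocalDateTime morning = LocalDateTime.of(2008, 4, 7, 9, 0);
    LocalDateTime evening = LocalDateTime.of(2008, 4, 7, 18, 30);

    // Same day, but the over-precise values are not equal.
    System.out.println(morning.equals(evening));   // false
    System.out.println(sameDay(morning, evening)); // true
  }
}
```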

The DSL

Here's the DSL text for my old street cleaning schedule.

        Schedule.First(DayOfWeek.Monday)
          .And(Schedule.Third(DayOfWeek.Monday))
          .From(Month.April)
          .Till(Month.October);

Like most realistic DSLs it uses a combination of internal DSL techniques, here a mix of Method Chaining and Nested Function. I'm not going to worry too much about the Method Chaining here, instead I'll concentrate on the way that Nested Function is used. Since each Nested Function returns a simple value I don't find a strong need for Object Scoping as they won't need any Context Variables. As a result I'll use static methods. As I'm in C# this means all the static methods need to be prefixed with their class name. This reads pretty well, although it does add noise compared to an Object Scoping approach.

Two of the Nested Functions are calls to return a simple value. DayOfWeek.Monday is actually built into the .NET libraries. I added Month.April and friends myself.

class Month...
    public static readonly Month January = new Month(1);
    public static readonly Month February = new Month(2);
    // I don't need to show more do I?

The calls on Schedule are a bit different. The initial use of Schedule.First is an example of a common feature in these languages - using a bare function to create a starting object to begin the chaining. Schedule here is an Expression Builder. It's not called "builder" because I think it reads better as just "schedule".

class Schedule...
    public static Schedule First(DayOfWeek dayOfWeek) {
      return new Schedule(new DayInMonth(1, dayOfWeek));
    }

Like most Expression Builders, the schedule builds up a content, which is a specification.

class Schedule...
    private Specification<DateTime> content;
    public Specification<DateTime> Content { get { return content; } }
    public Schedule(Specification<DateTime> content) {
      this.content = content;
    }

Notice how the initial call returns a schedule that wraps the first element in the specification. The later call to Third is the same (except for the parameter). I would usually argue against writing different methods for something that would be better handled as a parameter, but this is yet another example where you have different rules of good programming when you use an Expression Builder.

It's the Method Chaining that actually builds up the composite structure. Here's the interestingly named "and" method.

class Schedule...
    public Schedule And(Schedule arg) {
      content = new OrSpecification<DateTime>(content, arg.content);
      return this;
    }

We say "first and third Monday" in our language, but in terms of the specification it's the first or third Monday that matches the boolean condition. It's an interesting example of where the DSL is opposite to the model in order for both to read naturally.

The period at the end is similarly assembled using Method Chaining calls.

class Schedule...
    public Schedule From(Month m) {
      Debug.Assert(null == periodStart);
      periodStart = m;
      return this;
    }
    public Schedule Till(Month m) {
      Debug.Assert(null != periodStart);
      PeriodInYear period = new PeriodInYear(periodStart.Number, m.Number);
      content = new AndSpecification<DateTime>(content, period);
      return this;
    } 

Here I use a Context Variable to properly build up the period.
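To pull the builder together, here's a compact Java rendering of the same idea. It's a sketch under several assumptions rather than a port of the C# code: ScheduleSketch is a hypothetical name, java.util.function.Predicate stands in for the specification classes, the day check uses day-of-month arithmetic instead of the enumerator, and exceptions take the place of Debug.Assert.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.Month;
import java.util.function.Predicate;

class ScheduleSketch {
  private Predicate<LocalDate> content;
  private Month periodStart; // Context Variable: remembered between from() and till()

  private ScheduleSketch(Predicate<LocalDate> content) {
    this.content = content;
  }

  // Bare functions that create the starting object for the chain.
  static ScheduleSketch first(DayOfWeek day) { return nth(1, day); }
  static ScheduleSketch third(DayOfWeek day) { return nth(3, day); }

  private static ScheduleSketch nth(int index, DayOfWeek day) {
    return new ScheduleSketch(
        d -> d.getDayOfWeek() == day && (d.getDayOfMonth() - 1) / 7 + 1 == index);
  }

  // Reads as "and" in the DSL, but it widens the set of dates, so it is a logical or.
  ScheduleSketch and(ScheduleSketch other) {
    content = content.or(other.content);
    return this;
  }

  ScheduleSketch from(Month start) {
    if (periodStart != null) throw new IllegalStateException("from() already called");
    periodStart = start;
    return this;
  }

  ScheduleSketch till(Month end) {
    if (periodStart == null) throw new IllegalStateException("call from() before till()");
    Month start = periodStart;
    content = content.and(
        d -> d.getMonth().compareTo(start) >= 0 && d.getMonth().compareTo(end) <= 0);
    return this;
  }

  boolean includes(LocalDate date) {
    return content.test(date);
  }

  public static void main(String[] args) {
    ScheduleSketch cleaning = ScheduleSketch.first(DayOfWeek.MONDAY)
        .and(ScheduleSketch.third(DayOfWeek.MONDAY))
        .from(Month.APRIL)
        .till(Month.OCTOBER);
    System.out.println(cleaning.includes(LocalDate.of(2008, 4, 7)));  // true
    System.out.println(cleaning.includes(LocalDate.of(2008, 11, 3))); // false
  }
}
```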

This example uses simple static methods for the Nested Functions. Would it benefit from getting rid of the class names? I think it would read better to say "Monday" rather than "DayOfWeek.Monday". Object Scoping would provide this at the cost of requiring the inheritance relationship. In Java I could use static imports. The gain isn't huge but would probably be worthwhile.
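Here's roughly what the Java static import option looks like, using the enums that ship in java.time. The describe and streetCleaning functions are hypothetical stand-ins for builder functions. The point is only that the nested values lose their class-name prefix.

```java
import static java.time.DayOfWeek.MONDAY;
import static java.time.Month.APRIL;
import static java.time.Month.OCTOBER;

import java.time.DayOfWeek;
import java.time.Month;

class StaticImportDemo {
  // Hypothetical stand-in for a builder function that takes nested values.
  static String describe(DayOfWeek day, Month from, Month till) {
    return day + " from " + from + " till " + till;
  }

  // With the static imports in place the arguments read as bare names.
  static String streetCleaning() {
    return describe(MONDAY, APRIL, OCTOBER);
  }

  public static void main(String[] args) {
    System.out.println(streetCleaning()); // MONDAY from APRIL till OCTOBER
  }
}
```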

Significant Revisions

15 Jan 08: First stub