WORK-IN-PROGRESS: - this material is still under development
Make modifier methods return the host object so that multiple modifiers can be invoked in a single expression.
computer()
  .processor()
    .cores(2)
    .i386()
  .disk()
    .size(150)
  .disk()
    .size(75)
    .speed(7200)
    .sata()
  .end();
Internal DSLs are all about providing a flowing API, which often involves a sequence of calls on a single object - usually an Expression Builder.
Method Chaining is an idiom that achieves this through a sequence of modifier calls where each call returns the host object for further modification.
[TBD: How to do interplay with nested functions]
Let's say I have a hard disk object and I wish to set its capacity, its speed, and whether it's external. With a usual object API I would do it with something like this.
//java...
HardDrive hd = new HardDrive();
hd.setCapacity(150);
hd.setExternal(true);
hd.setSpeed(7200);
I create my object, put it in a variable, and then use setters to manipulate its properties. For just a few items like this, I'd be more likely to use a constructor, but let's assume there are many of them. DSLs are often about building up configurations of objects, and doing so in constructors is often tricky. It's also usually difficult to read since constructors often allow only positional parameters.
Using Method Chaining I would do something like this:
new HardDrive().capacity(150).external().speed(7200);
The basic idea is that each setter returns the hard disk object so we can chain multiple setter calls into a single expression.
Implementing this is straightforward. In Java we usually implement a setting method like this:
public void setSpeed(int arg) {
this.speed = arg;
}
This follows a common rule of API design called Command Query Separation. This rule was popularized by Bertrand Meyer. The idea is to clearly indicate which methods of an API modify the receiving object as opposed to those without side effects. The rule is that modifier methods ('commands' as he called them) should not return a value. That way any query, which does return a value, can be assumed to not have any observable side effects. This is a very valuable property since methods without side effects can be moved around and used more widely than those that make changes.
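To make the distinction concrete, here is a minimal Java sketch of a class that follows Command Query Separation. The class name `CqsHardDrive` and the getter are assumptions made for illustration only.

```java
// Hypothetical class for illustration; not taken from a real library.
class CqsHardDrive {
    private int speed;

    // Command: changes the receiver's state and returns nothing.
    public void setSpeed(int arg) {
        this.speed = arg;
    }

    // Query: returns a value and has no observable side effects,
    // so it can be called any number of times, in any order.
    public int getSpeed() {
        return speed;
    }
}

class CqsDemo {
    public static void main(String[] args) {
        CqsHardDrive hd = new CqsHardDrive();
        hd.setSpeed(7200);
        System.out.println(hd.getSpeed()); // prints "7200"
    }
}
```

Because `getSpeed` never changes anything, a caller can repeat it or reorder it freely without altering the outcome.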
Command Query Separation is a very good API design principle, one that I urge you to follow most of the time. However, a fluent interface often reads much better if you break this rule by using Method Chaining, which modifies the setter to look like this.
public HardDrive speed(int arg) {
speed = arg;
return this;
}
There are two changes to the setter. Firstly I alter the name to something that makes sense within the context of the fluent expression. As a result the name no longer makes it clear that it's a setter - indeed the name looks more like a query method. This naming is very problematic as it will seriously confuse anyone who is expecting a regular API. Secondly the chaining setter returns the hard drive object itself, so we can chain further calls.
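Putting the pieces together, a complete chaining class might look like the following sketch. The getters are assumptions, added only so that the result of the chain can be inspected.

```java
// A sketch of a chaining HardDrive; the getters are assumed
// additions for inspecting the result.
class HardDrive {
    private int capacity;
    private int speed;
    private boolean external;

    public HardDrive capacity(int arg) {
        capacity = arg;
        return this; // return the host object so the chain can continue
    }

    public HardDrive external() {
        external = true;
        return this;
    }

    public HardDrive speed(int arg) {
        speed = arg;
        return this;
    }

    public int getCapacity() { return capacity; }
    public int getSpeed() { return speed; }
    public boolean isExternal() { return external; }
}

class HardDriveDemo {
    public static void main(String[] args) {
        HardDrive hd = new HardDrive().capacity(150).external().speed(7200);
        System.out.println(hd.getCapacity() + " " + hd.isExternal() + " " + hd.getSpeed());
        // prints "150 true 7200"
    }
}
```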
As you can see, methods defined for Method Chaining violate basic rules for API design, rules that are well understood and valuable. For this reason I strongly recommend that you confine these methods to Expression Builders. If you only use them on clearly marked Expression Builders there's less chance that you will get the dangerous confusions between API conventions.
Not only does Method Chaining change the rules for API design, it also implies a change to formatting conventions. Usually we try to keep multiple method calls on a single line; however, a long Method Chaining expression often does not look good that way, particularly if we want to suggest a hierarchy. As a result it's often better to format Method Chaining with each call on its own line.
new HardDrive()
.capacity(150)
.external()
.speed(7200);
Java and C# ignore most newlines, so this gives us a lot of flexibility in formatting. There is a general preference to have the periods at the start of the line, as this makes them more noticeable and thus emphasizes the use of chaining. Languages that use newlines as statement separators are less flexible here. Ruby, for example, can work, but in older versions you need to have the periods at the end of the line rather than the beginning (Ruby 1.9 and later also accept a leading period). Putting methods on separate lines also makes debugging easier, as error messages and debugger control are usually on a line-by-line basis. As a result it's wise to do less on each line.
If you have a hierarchic structure, you'll need to use Context Variables to keep track of what object is being modified. This will make the Expression Builder more complicated (the examples in Expression Builder illustrate this).
Method Chaining can have a problem with knowing when to stop. In some situations there is an action to be done at the end of the expression, but you don't know which method is the last one. Consider adding appointments to a calendar object like this.
// C#...
Calendar cal = new Calendar();
cal.Add("dentist")
.From(1600)
.To(1700)
.At("123 Main St");
cal.Add("birthday dinner")
.From(1800)
.To(2100);
In classic chaining, add should return an AppointmentBuilder object, with from, to, and at each chaining and returning the builder. The question, however, is what triggers adding the actual appointment to the calendar? In some cases we may be able to add an empty appointment with the add method, but what if circumstances force us to add a fully formed object at the end? In this case we only want to create the appointment on the last chained call. The trouble is: how do we tell that a call is the last one when we're in it?
Another case where the finishing problem gets nasty is when we have nested components, particularly where the components are similar. Imagine we are specifying a parts breakdown.
item("Pre-Wibbler").contains()
  .item("screw").times(4)
  .item("Fuzz Box").contains()
    .item("Furry Ear")
    .item("chewing gum").times(2)
    .done()
  .item("Dischatter").contains()
    .item("Centrifugal Nail").times(3)
    .done()
  .item("rubber band").times(3)
  ...
The issue here is that you need to know when you've finished listing the contents of the fuzz box so you can go on to other items in the pre-wibbler. If the child in the hierarchy is a different thing to the parent, then you can infer this by having the child builder implement the parent builder method - closing the child and returning to the parent. However if you have parents and children that are the same (or indeed even similar enough) then that's not going to work.
One way to do this, as above, is to add some kind of end marker, such as a done method. This adds noise to the DSL. It also means the builder needs to understand its context - enough to know that it should add the newly created appointment to the calendar at the end.
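As a rough illustration of the end-marker approach, here is a Java sketch of a parts builder in which `done` closes the innermost open container. All the class names (`Part`, `PartsBuilder`, `PartsDsl`) and the `end` method are hypothetical, and a stack of open containers is just one way to hold the context.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model class for the parts breakdown.
class Part {
    final String name;
    int quantity = 1;
    final List<Part> children = new ArrayList<>();
    Part(String name) { this.name = name; }
}

class PartsBuilder {
    private final Part root;
    private Part current;                                 // item named most recently
    private final Deque<Part> open = new ArrayDeque<>();  // containers not yet closed

    PartsBuilder(String rootName) {
        root = new Part(rootName);
        current = root;
    }

    PartsBuilder item(String name) {
        Part parent = open.peek();
        if (parent == null) throw new IllegalStateException("no open container for " + name);
        current = new Part(name);
        parent.children.add(current);
        return this;
    }

    PartsBuilder times(int n) { current.quantity = n; return this; }

    // Open the current item so that following items become its children.
    PartsBuilder contains() { open.push(current); return this; }

    // End marker: close the innermost open container.
    PartsBuilder done() { current = open.pop(); return this; }

    Part end() { return root; }
}

class PartsDsl {
    // Static entry point, kept in its own class because Java does not allow a
    // static and an instance method with the same signature in one class.
    static PartsBuilder item(String name) { return new PartsBuilder(name); }
}

class PartsDemo {
    public static void main(String[] args) {
        Part p = PartsDsl.item("Pre-Wibbler").contains()
            .item("screw").times(4)
            .item("Fuzz Box").contains()
                .item("Furry Ear")
                .item("chewing gum").times(2)
                .done()
            .item("rubber band").times(3)
            .done()
            .end();
        System.out.println(p.children.size()); // prints "3"
    }
}
```

Without the first `done`, the rubber band would end up inside the fuzz box, which is exactly the ambiguity the end marker resolves.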
If you have multiple adds within a single expression, you can include the stopping behavior with each new add. The add method looks at a Context Variable. If the Context Variable is non-null it adds the item in the Context Variable to the result. It then starts working on a new value in the Context Variable. This approach helps, but you still need a stop at the end of the sequence of adds.
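A minimal Java sketch of this idea follows. The names `Slot`, `SlotListBuilder`, and `end` are hypothetical, and the pending appointment is held in plain fields acting as Context Variables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable appointment used only for this sketch.
class Slot {
    final String description;
    final int from, to;
    Slot(String description, int from, int to) {
        this.description = description; this.from = from; this.to = to;
    }
}

class SlotListBuilder {
    private final List<Slot> result = new ArrayList<>();
    // Context Variables for the appointment in progress; a null
    // description means nothing is pending.
    private String description;
    private int from, to;

    SlotListBuilder add(String description) {
        flush();                         // finish the previous appointment, if any
        this.description = description;  // start working on a new one
        return this;
    }

    SlotListBuilder from(int arg) { from = arg; return this; }
    SlotListBuilder to(int arg) { to = arg; return this; }

    // The sequence as a whole still needs an explicit stop.
    List<Slot> end() { flush(); return result; }

    private void flush() {
        if (description == null) return;
        result.add(new Slot(description, from, to));
        description = null;
        from = 0;
        to = 0;
    }
}

class SlotDemo {
    public static void main(String[] args) {
        List<Slot> slots = new SlotListBuilder()
            .add("dentist").from(1600).to(1700)
            .add("birthday dinner").from(1800).to(2100)
            .end();
        System.out.println(slots.size()); // prints "2"
    }
}
```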
This problem usually makes Method Chaining more trouble than it's worth. Most of the time the better option is to use a Nested Function.
cal.Add(Appointment.Build("dentist")
.From(1600)
.To(1700)
.At("123 Main St"));
cal.Add(Appointment.Build("birthday dinner")
.From(1800)
.To(2100));
Method Chaining is a valuable technique, but it's best used in combination with others.
[TBD: Discuss use of chaining on builder vs chaining on value objects]
A valuable variation to the basic Method Chaining approach is to use multiple interfaces to drive a fixed sequence of method chaining calls. Let's consider building up an email message. We want the programmer to first specify who it's to, any cc's, the subject, and then the body. We can do this by presenting a sequence of interfaces to the Expression Builder. The first interface has only the to method. The to method returns an interface with only the legal next steps: to, cc, and subject. The cc method returns an interface with only cc and subject. The subject method returns an interface with only the body method.
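In Java, the sequence of interfaces just described could be sketched roughly as follows. The interface and class names here are hypothetical, and a single builder class implements every step.

```java
import java.util.ArrayList;
import java.util.List;

// Each interface exposes only the legal next steps.
interface PostBuild { PostTo to(String address); }
interface PostTo extends PostBuild {
    PostCc cc(String address);
    PostSubject subject(String text);
}
interface PostCc {
    PostCc cc(String address);
    PostSubject subject(String text);
}
interface PostSubject { EmailMessage body(String text); }

// Hypothetical message class for the sketch.
class EmailMessage {
    final List<String> to = new ArrayList<>();
    final List<String> cc = new ArrayList<>();
    String subject;
    String body;
}

class EmailBuilder implements PostTo, PostCc, PostSubject {
    private final EmailMessage content = new EmailMessage();

    private EmailBuilder() {}

    // The declared return type, not the object, restricts what can be called next.
    static PostBuild build() { return new EmailBuilder(); }

    public PostTo to(String address) { content.to.add(address); return this; }
    public PostCc cc(String address) { content.cc.add(address); return this; }
    public PostSubject subject(String text) { content.subject = text; return this; }
    public EmailMessage body(String text) { content.body = text; return content; }
}

class EmailDemo {
    public static void main(String[] args) {
        EmailMessage m = EmailBuilder.build()
            .to("fowler@acm.org")
            .cc("editor@publisher.com")
            .subject("error in book")
            .body("Sally Shipton should read Sally Sparrow");
        System.out.println(m.subject); // prints "error in book"
    }
}
```

With these types, calling `body` straight after `build()` does not compile, because `PostBuild` declares only `to`.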
[TBD: Add diagram to illustrate]
Method Chaining can add a great deal to the readability of an internal DSL and as a result has become almost a synonym for internal DSLs in some minds. Method Chaining is best, however, when it's used in conjunction with other function combinations.
Method Chaining is particularly effective with grammars like parent ::= (this | that)*. The use of different methods provides a readable way of seeing which argument is coming next. Similarly, optional arguments can be easily skipped over with Method Chaining. A list of mandatory clauses, such as parent ::= first second, doesn't work so well with the basic form, although it can be supported well by using progressive interfaces. Most of the time I'd prefer Nested Function for that case.
The biggest problem for Method Chaining is the finishing problem. While there are workarounds, usually if you run into this you're better off using a Nested Function. Nested Function is also a better choice if you are getting into a mess with Context Variables.
Here's the basic computer configuration example done with a healthy dose of Method Chaining.
computer()
  .processor()
    .cores(2)
    .i386()
  .disk()
    .size(150)
  .disk()
    .size(75)
    .speed(7200)
    .sata()
  .end();
To start an expression using Method Chaining you need some method call to initiate the chain. In this case I'm using a static method that I can reference in the DSL script by using a static import.
public static ComputerBuilder computer() {
return new ComputerBuilder();
}
I use the computer builder to define the various methods I need for chaining. It also contains the parse data.
For the processor, I store the current processor in a Context Variable and manipulate it using replacement.
class ComputerBuilder...
public ComputerBuilder processor() {
currentProcessor = new Processor(1, null);
return this;
}
private Processor currentProcessor;
public ComputerBuilder cores(int arg) {
currentProcessor = new Processor(arg, currentProcessor.getType());
return this;
}
public ComputerBuilder i386() {
currentProcessor = new Processor(currentProcessor.getCores(), Processor.Type.i386);
return this;
}
As is characteristic for Method Chaining the builder returns itself with each call in order to continue the chain.
Specifying the disks is a bit more involved since each disk has its own data. I could define more context variables on the computer builder, just as I did for processor, but in this case I'll use a separate builder to capture the attributes for the disk.
class DiskBuilder...
public DiskBuilder size(int arg) {
disk = new Disk(arg, disk.getSpeed(), disk.getIface());
return this;
}
public DiskBuilder speed(int arg) {
disk = new Disk(disk.getSize(), arg, disk.getIface());
return this;
}
public DiskBuilder sata() {
disk = new Disk(disk.getSize(), disk.getSpeed(), Disk.Interface.SATA);
return this;
}
private Disk disk = new Disk(Disk.UNKNOWN_SIZE, Disk.UNKNOWN_SIZE, null);
The tricky bit here is shuffling between the computer builder and
the disk builder and keeping the Context Variables in step. The disk clause introduces a new
disk, so the computer builder puts a new disk builder into a context
variable and passes the call to it.
class ComputerBuilder...
public DiskBuilder disk() {
if (currentDisk != null) loadedDisks.add(currentDisk.getDisk());
currentDisk = new DiskBuilder(this);
return currentDisk;
}
private DiskBuilder currentDisk;
private List<Disk> loadedDisks = new ArrayList<Disk>();
class DiskBuilder...
public DiskBuilder(ComputerBuilder parent) {
this.parent = parent;
}
private ComputerBuilder parent;
The disk clause also occurs between disks. As a result
I add the current disk to a list of loaded disks before making a new
builder. The disk builder will get the disk call if I'm in the middle
of making one, so I just forward the call to the computer builder.
class DiskBuilder...
public DiskBuilder disk() {
return parent.disk();
}
In this example, I have to deal with the finishing problem. I've done the simplest work-around here and used an end method. As with the disk clause, the end method can appear as a call to disk builder, so I forward it to the computer builder when that happens.
class DiskBuilder...
public Computer end() {
return parent.end();
}
In the computer builder I use the end method to create and return the computer that's been configured.
class ComputerBuilder...
public Computer end() {
return new Computer(currentProcessor, disks());
}
private Disk[] disks() {
List<Disk> result = new ArrayList<Disk>();
result.addAll(loadedDisks);
if (currentDisk != null) result.add(currentDisk.getDisk());
return result.toArray(new Disk[result.size()]);
}
The example illustrates quite well many of the issues in using Method Chaining, particularly compared to Nested Function. Method Chaining reads very clearly, without much of the syntactic noise that can clutter Nested Function. However, to pull it off there's a lot of fiddling around with Context Variables and coping with the finishing problem.
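For readers who want to run the example, the fragments can be assembled into one file roughly as follows. The `Processor`, `Disk`, and `Computer` classes and the `getDisk` accessor are assumptions, since the text shows only how they are called.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed value classes: the text shows only how they are used.
class Processor {
    enum Type { i386 }
    private final int cores;
    private final Type type;
    Processor(int cores, Type type) { this.cores = cores; this.type = type; }
    int getCores() { return cores; }
    Type getType() { return type; }
}

class Disk {
    enum Interface { SATA }
    static final int UNKNOWN_SIZE = -1;
    private final int size, speed;
    private final Interface iface;
    Disk(int size, int speed, Interface iface) { this.size = size; this.speed = speed; this.iface = iface; }
    int getSize() { return size; }
    int getSpeed() { return speed; }
    Interface getIface() { return iface; }
}

class Computer {
    final Processor processor;
    final Disk[] disks;
    Computer(Processor processor, Disk[] disks) { this.processor = processor; this.disks = disks; }
}

class ComputerBuilder {
    private Processor currentProcessor;   // Context Variable
    private DiskBuilder currentDisk;      // Context Variable
    private final List<Disk> loadedDisks = new ArrayList<>();

    static ComputerBuilder computer() { return new ComputerBuilder(); }

    ComputerBuilder processor() { currentProcessor = new Processor(1, null); return this; }
    ComputerBuilder cores(int arg) {
        currentProcessor = new Processor(arg, currentProcessor.getType());
        return this;
    }
    ComputerBuilder i386() {
        currentProcessor = new Processor(currentProcessor.getCores(), Processor.Type.i386);
        return this;
    }
    DiskBuilder disk() {
        // Bank the disk in progress, if any, before starting a new one.
        if (currentDisk != null) loadedDisks.add(currentDisk.getDisk());
        currentDisk = new DiskBuilder(this);
        return currentDisk;
    }
    Computer end() {
        List<Disk> result = new ArrayList<>(loadedDisks);
        if (currentDisk != null) result.add(currentDisk.getDisk());
        return new Computer(currentProcessor, result.toArray(new Disk[0]));
    }
}

class DiskBuilder {
    private final ComputerBuilder parent;
    private Disk disk = new Disk(Disk.UNKNOWN_SIZE, Disk.UNKNOWN_SIZE, null);

    DiskBuilder(ComputerBuilder parent) { this.parent = parent; }

    DiskBuilder size(int arg) { disk = new Disk(arg, disk.getSpeed(), disk.getIface()); return this; }
    DiskBuilder speed(int arg) { disk = new Disk(disk.getSize(), arg, disk.getIface()); return this; }
    DiskBuilder sata() { disk = new Disk(disk.getSize(), disk.getSpeed(), Disk.Interface.SATA); return this; }
    Disk getDisk() { return disk; }               // assumed accessor
    DiskBuilder disk() { return parent.disk(); }  // forward to the computer builder
    Computer end() { return parent.end(); }       // forward to the computer builder
}

class ComputerDemo {
    public static void main(String[] args) {
        Computer c = ComputerBuilder.computer()
            .processor().cores(2).i386()
            .disk().size(150)
            .disk().size(75).speed(7200).sata()
            .end();
        System.out.println(c.processor.getCores() + " cores, " + c.disks.length + " disks");
        // prints "2 cores, 2 disks"
    }
}
```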
C# and Java are similar languages, so many of the comments that apply to Java apply to C# too. The biggest difference is that C# has a special property syntax, rather than Java's more fumbly getters and setters. As a result the regular example would look like this:
HardDrive hd = new HardDrive();
hd.Capacity = 150;
hd.IsExternal = true;
hd.Speed = 7200;
The chaining case looks almost the same.
new HardDrive()
.Capacity(150)
.External
.Speed(7200);
The chaining modifiers for speed and capacity are identical (other than the capitalization convention). There is, however, one interesting variation in handling the external property. By using a property getter for external, I can get rid of the unnecessary and annoying parentheses. I implement the property getter like this.
public HardDrive External {
get {
_isExternal = true;
return this;
}
}
This code should make you feel distinctly uneasy. A property getter that's really acting as a setter, that returns the object itself rather than the value of the property. This violates all our expectations of how property getters should work. In almost all circumstances I would call this extremely bad code. It's only acceptable when clearly placed in a fluent context - again I would confine this abomination to a securely fenced Expression Builder.
Code completion (aka IntelliSense) is one of the joys of modern IDEs. I no longer need to remember what methods are called on a particular class, I can just hit a key combination and I get a menu right there. Since my brain filled up about fifteen years ago, I appreciate having to remember less.
Many DSLs have a definite order in which things can be built up. We can use code completion to help signal that by using progressive interfaces. Say we want to build up an email message.
message = MessageBuilder.Build()
.To("fowler@acm.org")
.Cc("editor@publisher.com")
.Subject("error in book")
.Body("Sally Shipton should read Sally Sparrow");
We want to ensure that we build up the elements of the message in a particular order: first the tos, then the ccs, then the subject and finally the body. With vanilla Method Chaining there's nothing to enforce a particular order.
The chocolate sauce in this case is to use multiple interfaces over the Expression Builder. I'll start at the beginning with Build.
public static IMessageBuilderPostBuild Build() {
return new MessageBuilder();
}
interface IMessageBuilderPostBuild {
IMessageBuilderPostTo To(String arg);
}
I return an Expression Builder just as I would normally, but the return type is a special interface that only allows the legal next step in the sequence. The Expression Builder implements that interface and now I can only make that call next. As an added bonus my code completion menus can now only show me the legal next steps (although it's not perfect as methods inherited from Object also show up). So code completion can guide me through the process.
The next step continues the story.
public IMessageBuilderPostTo To(String arg) {
Content.To.Add(new Email(arg));
return this;
}
interface IMessageBuilderPostTo : IMessageBuilderPostBuild {
IMessageBuilderPostCc Cc(String arg);
IMessageBuilderPostSubject Subject(String arg);
}
One new thing on this step is that the legal next steps after To include the legal steps after Build. I can show this, without duplicating the body of IMessageBuilderPostBuild, by using inheritance between the interfaces. It's not really that worthwhile in this example, but it's often a useful technique.
The rest of the sequence continues as you'd expect.
public IMessageBuilderPostCc Cc(String arg) {
Content.Cc.Add(new Email(arg));
return this;
}
public IMessageBuilderPostSubject Subject(String arg) {
Content.Subject = arg;
return this;
}
public Message Body(String arg) {
Content.Body = arg;
return Content;
}
interface IMessageBuilderPostCc
{
IMessageBuilderPostCc Cc(String arg);
IMessageBuilderPostSubject Subject(String arg);
}
interface IMessageBuilderPostSubject {
Message Body(String arg);
}
I have a natural stop method with Body, so I'll
have that return the message.
I'll explore the stopping problem a bit more. I stole this target DSL code while working on a tutorial with Neal Ford.
Calendar cal = new Calendar();
cal.Add("dentist")
.From(1600)
.To(1700)
.At("123 Main St");
cal.Add("birthday dinner")
.From(1800)
.To(2100);
Our appointment class has immutable properties that must be set in the constructor.
class Appointment...
public string location { get { return _location; } }
public int start { get { return _start; } }
public int end { get { return _end; } }
public string description { get { return _description; } }
public Appointment(string description, int start, int end, string location) {
this._start = start;
this._end = end;
this._description = description;
this._location = location;
}
Such a domain object is often a good design. When you can use an immutable object it's usually a good idea to do so. I also prefer setting properties in a constructor, particularly mandatory properties. However these good API practices can make it harder to use an Expression Builder. If we could create an empty appointment and populate it with setters, it would make the Expression Builder much easier in this case. But what if we don't want to weaken our domain model design?
I can cope with part of this problem by holding data in the builder until I need it.
class AppointmentBuilder...
private int? _from;
public AppointmentBuilder From(int arg) {
checkStillBuilding();
_from = arg;
return this;
}
The real problem however is when to create the underlying appointment and add it to the calendar. One solution is to add a stop method to the DSL.
Calendar cal = new Calendar();
cal.Add("dentist")
.From(1600)
.To(1700)
.At("123 Main St")
.Done();
cal.Add("birthday dinner")
.From(1800)
.To(2100)
.Done();
The full stop method (Done) gives me a spot to do
the necessary completion work. To add the new appointment into the
Calendar, I do need to know which calendar I'm dealing with, so
the calendar passes itself to the builder during construction.
class Calendar...
public AppointmentBuilder Add(String description) {
return new AppointmentBuilder(description, this);
}
class AppointmentBuilder...
private Calendar _host;
public AppointmentBuilder(string description, Calendar host) {
this._description = description;
this._host = host;
}
Then the done method can create the appointment and add it to the calendar.
class AppointmentBuilder...
public void Done() {
buildContent();
_host.AddAppointment(_content);
}
private void buildContent() {
checkNotNull(_from, "from time");
checkNotNull(_to, "to time");
checkNotNull(_description, "description");
_content = new Appointment(_description, (int) _from, (int) _to, _location);
}
private void checkNotNull(object arg, string message) {
if (null == arg) throw new InvalidOperationException(message + " should not be null");
}
This kind of builder has two logical states - building up the data and being done. I use a checking method to ensure that things are being done in the right sequence.
class AppointmentBuilder...
private void checkStillBuilding() {
if (null != _content)
throw new InvalidOperationException("Builder has already completed");
}
Using a stop method like this has two main problems: having
Done() be part of the DSL is noise, and the builder
can only be used to add appointments to calendars, it can't be
used in other contexts.
I can tackle the second problem by using a closure. Rather than have the calendar pass itself into the builder, it can pass in a delegate to tell the builder what to do at the end. To create the builder, I use this code.
class Calendar...
public ClosureAppointmentBuilder Add(String description) {
Action<ClosureAppointmentBuilder> doneAction =
delegate(ClosureAppointmentBuilder builder) { AddAppointment(builder.Content); };
return new ClosureAppointmentBuilder(description, doneAction);
}
I then use a constructor that takes the delegate and call it in the stop method.
class ClosureAppointmentBuilder...
public ClosureAppointmentBuilder(string description, Action<ClosureAppointmentBuilder> doneAction)
{
this.description = description;
this.doneAction = doneAction;
}
public void Done() {
buildContent();
doneAction.Invoke(this);
}
This way I can use the builder in any context and let its user decide on any behavior to run at the end.
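The same idea carries over to Java, where a `java.util.function.Consumer` can play the role of the delegate. The sketch below uses hypothetical names (`Meeting`, `MeetingBuilder`, `Diary`) and, as a simplification, hands the finished object to the action rather than the builder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical immutable appointment for the sketch.
class Meeting {
    final String description;
    final int start, end;
    Meeting(String description, int start, int end) {
        this.description = description; this.start = start; this.end = end;
    }
}

class MeetingBuilder {
    private final String description;
    private final Consumer<Meeting> doneAction;  // what to do at the end
    private Integer from, to;                    // null until set

    MeetingBuilder(String description, Consumer<Meeting> doneAction) {
        this.description = description;
        this.doneAction = doneAction;
    }

    MeetingBuilder from(int arg) { from = arg; return this; }
    MeetingBuilder to(int arg) { to = arg; return this; }

    void done() {
        if (from == null || to == null)
            throw new IllegalStateException("from and to times should not be null");
        doneAction.accept(new Meeting(description, from, to));
    }
}

class Diary {
    final List<Meeting> meetings = new ArrayList<>();

    MeetingBuilder add(String description) {
        // The lambda closes over this diary; the builder never sees it.
        return new MeetingBuilder(description, m -> meetings.add(m));
    }
}

class DiaryDemo {
    public static void main(String[] args) {
        Diary diary = new Diary();
        diary.add("dentist").from(1600).to(1700).done();

        // The same builder used outside any diary, with a different action.
        List<String> log = new ArrayList<>();
        new MeetingBuilder("standup", m -> log.add(m.description)).from(900).to(915).done();

        System.out.println(diary.meetings.size() + " " + log.size()); // prints "1 1"
    }
}
```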
Of course that still leaves me with the noise of the stop method in the first place. I can avoid having a stop method by using a top-level call instead.
cal.Add(Appointment.Build("dentist")
.From(1600)
.To(1700)
.At("123 Main St"));
cal.Add(Appointment.Build("birthday dinner")
.From(1800)
.To(2100));
This way the calendar's add method controls the stop point and adds the new appointment to itself.
public void Add(AppointmentBuilder builder) {
appointments.Add(builder.Content);
}
In this case I would have the
Content property calculate a new appointment each
time it's called and remove the content field. This would allow me to get rid of the checkStillBuilding
method. Since the builder is no longer doing a wider action on
completion, there's no need for it to ensure it only creates a
single appointment at the end. Overall I prefer using a top-level
function because it allows the builder to concentrate on building
its content object and leaves the responsibility of what to do
with that object to its caller.
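A Java sketch of this final arrangement might look like the following. The names `Booking`, `BookingBuilder`, and `Planner` are hypothetical; the point is that `content` builds a fresh object on every call, so the builder needs no "already completed" state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical immutable appointment for the sketch.
class Booking {
    final String description;
    final int start, end;
    final String location;
    Booking(String description, int start, int end, String location) {
        this.description = description; this.start = start;
        this.end = end; this.location = location;
    }
}

class BookingBuilder {
    private final String description;
    private Integer from, to;   // null until set
    private String location;    // optional

    private BookingBuilder(String description) { this.description = description; }

    static BookingBuilder build(String description) { return new BookingBuilder(description); }

    BookingBuilder from(int arg) { from = arg; return this; }
    BookingBuilder to(int arg) { to = arg; return this; }
    BookingBuilder at(String arg) { location = arg; return this; }

    // Calculates a new booking each time it is called.
    Booking content() {
        if (from == null || to == null)
            throw new IllegalStateException("from and to times should not be null");
        return new Booking(description, from, to, location);
    }
}

class Planner {
    private final List<Booking> bookings = new ArrayList<>();

    // The top-level call controls the stop point.
    void add(BookingBuilder builder) { bookings.add(builder.content()); }

    List<Booking> bookings() { return bookings; }
}

class PlannerDemo {
    public static void main(String[] args) {
        Planner cal = new Planner();
        cal.add(BookingBuilder.build("dentist").from(1600).to(1700).at("123 Main St"));
        cal.add(BookingBuilder.build("birthday dinner").from(1800).to(2100));
        System.out.println(cal.bookings().size()); // prints "2"
    }
}
```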