|
When I first started programming in Smalltalk one of the things I
liked right from the start were the collection classes. They allowed
you to simply do a bunch of common and powerful operations on
collection classes. When Java appeared, I missed these kinds of
methods - the Java (and C#) collections were very limited compared
to Smalltalk. The main reason for this limitation is that Java doesn't have
any convenient implementation for a Closure. The powerful
Smalltalk methods for collections all relied on closures. In recent years I've done a lot of Ruby programming. One of the
main things that got me hooked into Ruby was the fact that it has
these powerful collection methods, which Ruby can have since ruby
does have closures in its language. So what are these closure using methods that are so good for
collections? The centerpiece method is 'each' which in design patterns
terminology is an internal iterator. (In Smalltalk this method is
called 'do'.)
employees.each do |e|
e.doSomething
end
The each method takes a one argument block (Ruby and Smalltalk
both refer to closures as blocks). It then executes
the block on each element in the collection. It essentially is the
same as the foreach statement you find in many modern languages (and
recently arrived in Java with 1.5). With these languages the foreach
method is all you get, but with collections and closures the each
method is just the start. A common thing you need to do with collections is to find all the
elements of a collection that satisfy some boolean condition. For
this you write some code like this.
managers = []
for e in employees
if e.manager?
managers << e
end
endAlthough this is legal ruby (<< adds to a collection) no
decent rubyist would write this. Instead they would write the
following. managers = employees.select {|e| e.manager?}Like each, select takes a block as an argument. In this case the
block is delimited by curlies rather than do/end (both are legal but
curlies are better for one-liners). It applies the block to each
element of the list, if the block returns true it puts that element
into the result collection which it returns at the end. As you can
see having this kind of method can really simplify this common
case. Smalltalk also called this method 'select', ruby has an alias
for this method called 'find_all'. There's also a sibling method
called 'reject' which returns all the elements for which the block
returns false. The next most common closure method I use with collections is
collect. This is similar but where you need to gather the results of
a method call. Here's the traditional code:
offices = []
for e in employees
offices << e.office
end
Again the closure method allows you to use a one-liner. offices = employees.collect {|e| e.office}You can see what this does, it's similar to select but instead
puts the result of the method call into the returned
collection. Smalltalk also called this 'collect'. Lisp has a similar
function called 'map', in ruby 'map' is an alias for 'collect'. There's a concept that's come out of modern functional
programming languages that's similar to the two preceding closure
methods - it's called a list comprehension. List comprehensions have
made
their way into the python language. They provide a syntactic
approach to getting the kinds of benefits we've seen so far. Here are
the two examples again using python list comprehensions.
managers = [e for e in employees if e.isManager]
offices = [e.office for e in employees]
List comprehensions make it easy to combine the two. managersOffices = [e.office for e in employees if e.isManager] You can also do this be chaining block methods together.
managersOffices = employees.select{|e| e.manager?}.
map {|m| m.office}
List comprehensions are nice, but they really only handle
select/collect cases. Blocks allow much more, here are some more
things you can do with collections and blocks. Similar to select are tests to see if all the elements of a
collection match a condition or if any of them do.
allManagers = employees.all? {|e| e.manager?}
noManagers = ! employees.any? {|e| e.manager?}
The partition method combines select and reject. It works well
with the multiple assignment feature in ruby. managers, plebs = employees.partition{|e| e.manager?}You don't just need cases with single argument blocks. Another
one I use a lot is the sort method, which (in its classic form) takes two arguments to
sort a list. sortedEmployees = employees.sort {|a,b| a.lastname <=> b.lastname}The sort method returns a list sorted using the code that's in
the block. The <=> operator is the comparison
operator, generally known as the starship operator. It returns -1
if a is less than b, +1 if greater, and 0 if the same. Ruby 1.8 allows me to sort more easily with a single
argument block
sortedEmployees = employees.sort_by {|e| e.lastname}
Another two argument method is each_with_index which iterates through
a list in the same way as each but passes the index as well as the
element to the block. Find (aliased to the smalltalk name 'detect') looks for the first
one that matches a condition. volunteer = employees.find {|e| e.steppedForward?}Often with something like find, you want to do something if no
element matches the condition. Find will return nil if nothing
matches, so you can test the result. However you can also pass a
second block which is used if nothing matches. volunteer = employees.find(lambda{self.pickVictim}) {|e| e.steppedForward?}Ruby's syntax is nice for methods that take a single block as an
argument, but I find it rather more clunky for multiple
blocks. Smalltalk (according to my questionable memory) would look
like this.
volunteer := employees
detect: [:each| each hasSteppedForward]
ifNone: [self pickVictim]
As usual Smalltalk's keyword parameters make it much easier to
read a multi-argument method. The last closure method I'll mention is one that people often
find hard to understand: inject. Inject is good when you want a
cumulative result on a collection. Let's say we want the total of all
our employees' salaries. The traditional way would be like this:
total = 0
for e in employees
total += e.salary
end
Inject does it like this. total = employees.inject(0) {|result, e| result + e.salary}At each element in the collection inject assigns the result of
executing the block to the result variable. The result of the final
execution gets returned from inject. An important point about collection closure methods is how they
can be composed easily to create complex expressions. Imagine you
want to find your oldest programmer in a product's team based in a country.
aProduct.teams.
select {|t| t.country == aCountry}.
map{|t| t.members}.
flatten.
select{|e| e.isProgrammer}.
max{|a,b| a.age <=> b.age}
Now there's a serious question here about a method that
complex. (And I must admit I used to be wary of chaining like this.)
But it's a good illustration of how useful it is to string these
methods together. (And in a language that supports the right kind of
concurrency, it has important implications for multi-processor
performance.) My purpose here isn't to say how wonderful Ruby (and Smalltalk,
Lisp etc) are because they have these methods. My point is that the
combination of closures and collections leads to some very nice
things. Languages that don't have closures really miss out on these
things. If you do get the chance to program in a closureable language,
get used to using these kinds of methods on your collections. I find
they make a big difference. (Thanks to Masanori Kado, Rik Hemsley, Christian Neukirchen and
Stanislav Karchebny for helping me fix a couple of errors with the
first posting of this article)
|