CollectionLambda

language feature · ruby

tags:

When I first started programming in Smalltalk one of the things I liked right from the start were the collection classes. They allowed you to simply do a bunch of common and powerful operations on collection classes. When Java appeared, I missed these kinds of methods - the Java (and C#) collections were very limited compared to Smalltalk. The main reason for this limitation is that Java doesn't have any convenient implementation for a Lambda. The powerful Smalltalk methods for collections all relied on lambdas. [1]

In recent years I've done a lot of Ruby programming. One of the main things that got me hooked into Ruby was the fact that it has these powerful collection methods, which Ruby can have since ruby does have lambdas in its language.

So what are these lambda-using methods that are so good for collections? The centerpiece method is 'each' which in design patterns terminology is an internal iterator. (In Smalltalk this method is called 'do'.)

  employees.each do |e|
    e.doSomething
  end

The each method takes a one argument block (Ruby and Smalltalk both refer to lambdas as blocks). It then executes the block on each element in the collection. It essentially is the same as the foreach statement you find in many modern languages (and arrived in Java with 1.5). With these languages the foreach method is all you get, but with collections lambdas the each method is just the start.

A common thing you need to do with collections is to find all the elements of a collection that satisfy some boolean condition. For this you write some code like this.

managers = []
for e in employees
  if e.manager?
    managers << e 
  end
end

Although this is legal ruby (<< adds to a collection) no decent rubyist would write this. Instead they would write the following.

managers = employees.select {|e| e.manager?}

Like each, select takes a block as an argument. In this case the block is delimited by curlies rather than do/end (both are legal but curlies are better for one-liners). It applies the block to each element of the list, if the block returns true it puts that element into the result collection which it returns at the end. As you can see having this kind of method can really simplify this common case. Smalltalk also called this method 'select', ruby has an alias for this method called 'find_all'. There's also a sibling method called 'reject' which returns all the elements for which the block returns false.

A common feature of these collection lambdas is that they take a lambda as an argument to a method. In functional programming circles these beasts are usually referred to as Higher-Order Functions. A Higher-Order Function is a function that takes a function as an argument or returns a function. In OO circles we talk about "methods" but I've not heard anything called a higher-order method.

The next most common collection lambda I use is map. This is similar but where you need to gather the results of a method call. Here's the traditional code:

  offices = []
  for e in employees
   offices << e.office
  end
  

Again the lambda allows you to use a one-liner.

offices = employees.map {|e| e.office}

You can see what this does, it's similar to select but instead puts the result of the method call into the returned collection. Smalltalk called this method 'collect' and ruby has 'collect' as an alias for map.

There's a concept that's come out of modern functional programming languages that's similar to the two preceding collection lambdas - it's called a list comprehension. List comprehensions have made their way into the python language. They provide a syntactic approach to getting the kinds of benefits we've seen so far. Here are the two examples again using python list comprehensions.

  managers = [e for e in employees if e.isManager]
  offices = [e.office for e in employees]
  

List comprehensions make it easy to combine the two.

managersOffices = [e.office for e in employees if e.isManager]

You can also do this by chaining block methods together.

managersOffices = employees.select{|e| e.manager?}.
                            map   {|m| m.office}

List comprehensions are nice, but they really only handle select/map cases. Lambdas allow much more, here are some more things you can do with collection lambdas.

Similar to select are tests to see if all the elements of a collection match a condition or if any of them do.

   allManagers = employees.all? {|e| e.manager?}
   noManagers = ! employees.any? {|e| e.manager?}
  

The partition method combines select and reject. It works well with the multiple assignment feature in ruby.

managers, plebs = employees.partition{|e| e.manager?}

You don't just need cases with single argument blocks. Another one I use a lot is the sort method, which (in its classic form) takes two arguments to sort a list.

sortedEmployees = employees.sort {|a,b| a.lastname <=> b.lastname}

The sort method returns a list sorted using the code that's in the block. The <=> operator is the comparison operator, generally known as the starship operator. It returns -1 if a is less than b, +1 if greater, and 0 if the same.

Ruby 1.8 allows me to sort more easily with a single argument block

sortedEmployees = employees.sort_by {|e| e.lastname}

Another two argument method is each_with_index which iterates through a list in the same way as each but passes the index as well as the element to the block.

Find (aliased to the smalltalk name 'detect') looks for the first one that matches a condition.

volunteer = employees.find {|e| e.steppedForward?}

Often with something like find, you want to do something if no element matches the condition. Find will return nil if nothing matches, so you can test the result. However you can also pass a second block which is used if nothing matches.

volunteer = employees.find(lambda{self.pickVictim}) {|e| e.steppedForward?}

Ruby's syntax is nice for methods that take a single block as an argument, but I find it rather more clunky for multiple blocks. Smalltalk (according to my questionable memory) would look like this.

volunteer := employees 
               detect: [:each| each hasSteppedForward]
               ifNone: [self pickVictim]

As usual Smalltalk's keyword parameters make it much easier to read a multi-argument method.

The last lambda I'll mention is one that people often find hard to understand: inject (also known as reduce and fold). Inject is good when you want a cumulative result on a collection. Let's say we want the total of all our employees' salaries. The traditional way would be like this:

  total = 0
  for e in employees
    total += e.salary
  end

Inject does it like this.

total = employees.inject(0) {|result, e| result + e.salary}

At each element in the collection inject assigns the result of executing the block to the result variable. The result of the final execution gets returned from inject.

An important point about collection lambdas is how they can be composed easily to create complex expressions. Imagine you want to find your oldest programmer in a product's team based in a country.

aProduct.teams.
    select {|t| t.country == aCountry}.
    map{|t| t.members}.
    flatten.
    select{|e| e.isProgrammer}.
    max{|a,b| a.age <=> b.age}

Now there's a serious question here about a method that complex. (And I must admit I used to be wary of chaining like this.) But it's a good illustration of how useful it is to string these methods together. (And in a language that supports the right kind of concurrency, it has important implications for multi-processor performance.)

My purpose here isn't to say how wonderful Ruby (and Smalltalk, Lisp etc) are because they have these methods. My point is that the combination of lambdas and collections leads to some very nice things. Languages that don't have lambdas really miss out. If you do get the chance to program in a language with lambdas get used to using these kinds of methods on your collections. I find they make a big difference.

(Thanks to Masanori Kado, Rik Hemsley, Christian Neukirchen and Stanislav Karchebny for helping me fix a couple of errors with the first posting of this article)

Notes

1: When I first published this bliki entry in 2005 I used the term "Closure" to refer to lambdas. At the time "closure" was often used in this way. Since then the term "lambda" became more popular, so I changed this bliki entry to follow the usage.