This page describes an operation in the collection pipeline pattern. For more context read:
- Collection Pipeline Article: An article explaining the collection pipeline pattern
- Operation Catalog: The list of selected operations that I have these pages for.
sort
Output is sorted copy of input based on supplied comparator
The simplest application of the sort operator is a simple call with no arguments.
[3, 1, 4, 2].sort # => [1, 2, 3, 4]
(sort [3 1 4 2]) ;; => (1 2 3 4)
A sort without any arguments like this works for things that know how to be sorted, and comes up with a pre-defined sort order, in these cases ascending.
The simplest way of changing the sort order is to compose the sort with reverse.
[3, 1, 4, 2].sort.reverse # => [4, 3, 2, 1]
(reverse(sort [3 1 4 2])) ;; => (4 3 2 1)
But often there's either no standard way of sorting what you need, or you want to sort in a different way to the standard. The most basic way to do this is to pass a comparator function into the sort. A comparator function takes two arguments and indicates the ordering.
["10", "8", "300"].sort {|a,b| a.to_i <=> b.to_i} # => ["8", "10", "300"]
(sort #(< (read-string %1) (read-string %2)) ["8" "10" "300"]) ;; => ("8" "10" "300")
There's an obvious bit of duplication here. Almost always when you want to sort something, you call the same function on both arguments in order to get the sort key. In the above example this function is the string-to-number conversion. So it makes sense to specify that function just once.
["10", "8", "300"].sort_by {|a| a.to_i} # => ["8", "10", "300"]
(sort-by read-string ["8" "10" "300"]) ;; => ("8" "10" "300")
To think a bit more deeply about this, I consider there are two functions that are needed when specifying a sort. One function is the comparator that compares two values and yields an indication of the sort order, the other function is a key-extraction function that runs on each element of the collection to be sorted and extracts a value that is passed to the comparator.
There are two styles of comparator. The first is the boolean comparator, such as “<” which tells you if the first argument is less-than the second. To get an ordering, such a comparator may have to be used in both directions for each comparison so you can tell if the relationship between the tested pair of values is less-than, greater-than, or equals. (This is why you mustn't use “<=”.)
The second form of comparator returns an integer where negative means less-than, zero means equal, and positive means greater-than. The “starship” operator “<=>” used by many languages is a good example of this. This three-valued comparator only needs to be used once for every tested pair of values.
Most platforms will have a default comparator that works on various fundamental types, such as numbers and strings.
What if you want to sort something else, such a T-shirt sizes of “S”, “M”, “L”, and “XL”? You have two routes, either implement the default comparator on the T-shirt-size type, or use a key-extraction function that will turn the T-shirt-size into something that can be compared (such as a number).
When you need to sort by multiple keys, the crude way to do it is incorporate all the key comparisons into a single comparator lambda.
ruby…["J.C. Bach", "J.S Bach", "C.P.E. Bach", "T Albinoni"].sort do |a,b| first = a.split.last <=> b.split.last (0 != first) ? first : a <=> b end # => ["T Albinoni", "C.P.E. Bach", "J.C. Bach", "J.S Bach"]
But this is obviously a pain in the neck. A better approach is to pass a list of key-extraction functions to the sort method and it can try them in turn.
["J.C. Bach", "J.S Bach", "C.P.E. Bach", "T Albinoni"]. sort_by{|s| [s.split.last, s]} # => ["T Albinoni", "C.P.E. Bach", "J.C. Bach", "J.S Bach"]
(sort-by (juxt #(last (clojure.string/split % #"\s+")) identity) ["J.C. Bach" "J.S Bach" "C.P.E. Bach" "T Albinoni"]) ;; => ("T Albinoni" "C.P.E. Bach" "J.C. Bach" "J.S Bach")