The Ruby Enumerable <=> operator and #max vs. #max_by

sylviapap

Sylvia Pap

Posted on February 12, 2020


One of my favorite things about learning to code is that any time you think, "could this be shorter?", you're probably right. A nice example of this that I encountered recently is the difference between .max and .max_by when finding the maximum value(s) in a Ruby array.

Should I use .max or .max_by?


When I first saw the Ruby array methods .max and .max_by, I wondered why we needed both, or why we couldn't just always use .max, since it's shorter and I don't want to be wasting precious seconds typing out an extra 3 characters if I don't need to. Spoiler alert - in many cases, using .max_by ends up making your code overall shorter and cleaner! I will discuss the different ways to use each here, but I also want to note that the same logic applies for .min / .min_by, .minmax / .minmax_by, and .sort / .sort_by. If you want to know some more terms - all of these methods come from Ruby's Enumerable module (Array also defines its own versions of some of them), and they all rely on the <=> operator, the same one the Comparable module is built on. Again, I focus here on .max, but really all of these methods come down to the fundamental principle of sorting!


The "spaceship" operator: <=>

All of these methods (min, max, sort) use the <=> operator. The <=>, or "spaceship", operator combines the conventional comparison operators (<, <=, ==, >=, and >):



a <=> b
  if a < b then return -1
  if a == b then return  0
  if a > b then return  1
  if a and b are not comparable then return nil



and is then used in an object, such as an array, to order its elements. I like to think of an array of numbers, say [4, 2, 7, 1], and imagine the computer asking, is 4 < 2? No. Is 4 > 2? Yes. Move it up in the list. And so on with 4 < 7, etc.
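
A quick way to see those return values is to try them in irb. This is only a minimal sketch, and the variable name `numbers` is just for illustration:

```ruby
numbers = [4, 2, 7, 1]

# <=> on two integers returns -1, 0, or 1; nil means "not comparable".
p 4 <=> 2     # => 1
p 4 <=> 7     # => -1
p 4 <=> 4     # => 0
p 4 <=> "4"   # => nil

# sort with no block relies on <=>; spelling the block out gives the same result.
p numbers.sort                      # => [1, 2, 4, 7]
p numbers.sort { |a, b| a <=> b }   # => [1, 2, 4, 7]
```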

It's a bit different with strings:



"hello" <=> "world"                    #=> -1



Some of the rules with <=> and strings are intuitive. As you might expect, you cannot <=> compare a string to an integer/float. But! You can compare a string with a string of an integer.



"abcdef" <=> 1                        #=> nil
"abcdef" <=> "1"                      #=> 1
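
One practical consequence of that nil, sketched below: methods that rely on <=>, such as .max, raise an ArgumentError when an array mixes values that cannot be compared. The variable name `mixed` is only illustrative, and converting with .to_s is just one possible workaround:

```ruby
mixed = ["abcdef", 1]

begin
  mixed.max
rescue ArgumentError => e
  puts e.message   # describes the comparison that failed
end

# Converting everything to one type first makes the comparison well defined.
p mixed.map(&:to_s).max   # => "abcdef"
```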



If the strings are identical, we get 0 for equal.



"abcdef" <=> "abcdef"                   #=> 0



But then it gets a little tricky. You might be thinking, a string is 'less than' or 'equal to' another based on their lengths. And you would be partially correct.

Pretty much all that the official Ruby documentation seems to give on this point is "if the strings are of different lengths, and the strings are equal when compared up to the shortest length, then the longer string is considered greater than the shorter one."



"abcdef" <=> "abcde"                    #=> 1
"abcdef" <=> "abcdefg"                  #=> -1



But! This can be misleading, and for me it makes more sense to think in terms of alphabetical sorting:



"abcdef" <=> "ABCDEF"                   #=> 1 
"horse" <=> "apple"                     #=> 1
"a" <=> "z"                             #=> -1
"categorically" <=> "category"          #=> -1
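
To make the "partially correct" point concrete, here is a small sketch with strings I picked for illustration. Length only decides the result when one string is a prefix of the other; otherwise the first differing character decides, however long either string is:

```ruby
# One string is a prefix of the other: the longer one is greater.
p "cat" <=> "category"             # => -1

# The strings differ at some position: that character decides, length is ignored.
p "b" <=> "abcdefghij"             # => 1
p "categorically" <=> "category"   # => -1 ("i" sorts before "y")
```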



I love the idea that everything can be represented by a number. And that idea is very important here, because the <=> operator with strings is actually comparing characters in binary. So an a is 01100001, or 97 in decimal, A is 01000001, or 65, which explains why an identical, but lowercase, string would be considered 'greater than' its capitalized version. And an a is less than z because each letter increases by one throughout the alphabet - z in binary is 01111010 or 122. Or, in more human terms - they are simply sorted in alphabetical order...



"z" <=> "apple"                         #=> 1



...and methods such as .max return the last string because "maximum" is the numerical way to think of a string closest to the end of the alphabet.



a = %w(dog albatross horse)
a.sort                                 #=> ["albatross", "dog", "horse"] 
a.max                                  #=> "horse"
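
The character codes mentioned above can be checked with String#ord and String#bytes. A short sketch (the format call is only there to show the binary form with its leading zero):

```ruby
p "a".ord                    # => 97
p "A".ord                    # => 65
p "z".ord                    # => 122
p format("%08b", "a".ord)    # => "01100001"

# "dog" beats "albatross" on the very first byte: 100 > 97.
p "dog".bytes                # => [100, 111, 103]
p "albatross".bytes.first    # => 97
p "dog" <=> "albatross"      # => 1
```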



While this is all very interesting background information on how these methods work, it still seems to me that the <=> operator and Ruby's Comparable mixin are most useful with methods such as .max when we are working directly with numbers. Sure, I just said everything is a number, but finding the maximum from a list of numbers seems more common to me than finding the maximum string.

When to use .max



[5, 1, 3, 4, 2].max                    #=> 5
[5, 1, 3, 4, 2].max(3)                 #=> [5, 4, 3]



.max is useful and concise if you just want to find the maximum value(s) from a list of numbers, such as an array, or a range:



(10..20).max                           #=> 20
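
The sibling methods work the same way on ranges and arrays. A quick sketch, with `range` and `numbers` as illustrative names:

```ruby
range = (10..20)
numbers = [5, 1, 3, 4, 2]

p range.min        # => 10
p range.minmax     # => [10, 20]
p numbers.min(2)   # => [1, 2]
p numbers.minmax   # => [1, 5]
```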



So if you just have that list of numbers, and you want to know the maximum or minimum number, .max is your fastest, shortest way there. Great! But what if you need to be more specific? This is where I start to question the usefulness of .max and wonder if there is a better way:



a = %w(albatross dog horse)
a.min(2)                                #=> ["albatross", "dog"] 
a.max(2)                                #=> ["horse", "dog"] 
a.max { |a, b| a.length <=> b.length }  #=> "albatross"



Again, calling .max or .min on an array of strings compares them alphabetically (strictly speaking, by character code). When a count is given as an argument, .max(n) returns the n largest elements in descending order and .min(n) returns the n smallest in ascending order. This is useful if you just want the first or last few strings alphabetically and don't care about having or using the whole sorted array. If you need that, you could just do .sort and then .first/.last etc., without the .max or .min.
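
To spell out the ordering, here is a short sketch (the name `words` is just illustrative). With a count, .min(n) and .max(n) give the same elements you would get by sorting first and then taking from the front or the back:

```ruby
words = %w(albatross dog horse)

p words.min(2)                  # => ["albatross", "dog"]
p words.sort.first(2)           # => ["albatross", "dog"]

p words.max(2)                  # => ["horse", "dog"]
p words.sort.last(2).reverse    # => ["horse", "dog"]
```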



c = %w(mouse house cat rat bat)
c.sort!                  #=> ["bat", "cat", "house", "mouse", "rat"] 
c.first                  #=> "bat" 
c.last                   #=> "rat" 
c[2]                     #=> "house" 



Special shoutout here to .sort! - it is destructive so it modifies the original array. .sort is non-destructive so it would still sort, but it would be more like creating a separate array and so in the example above, c would still have the same order and methods such as c.first would return mouse, etc.
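
The difference is easy to see side by side. A minimal sketch, where `sorted` is an illustrative name for the new array that .sort returns:

```ruby
c = %w(mouse house cat rat bat)

sorted = c.sort    # returns a new, sorted array
p sorted.first     # => "bat"
p c.first          # => "mouse" (c itself is unchanged)

c.sort!            # sorts c in place
p c.first          # => "bat"
```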

It's more likely that we would want to return something like the longest word. Now, .max with the |a,b| block is where you might be thinking, this is useful! We can specify .length now! And you are right. Also, if we ever needed to, we could reverse the .max by reversing the order of a and b:



(10..20).max {|a,b| b <=> a}            #=> 10



but this is just a longer way to write .min. Similarly, specifying a.length <=> b.length with .max is just a longer way to write... .max_by !

.max_by



array = ["albatross", "dog", "horse", "fish", "antelope", "zzzzzzzz"]
array.max                                  #=> "zzzzzzzz"
array.max_by { |x| x.length }              #=> "albatross"



This does the exact same work as .max, you just only have to write .length once, so that's a game changer. Basically, remember that with .max you are always comparing two things, a and b, so you have to specify the attribute for both, whereas .max_by includes that comparison within the method and assumes you are comparing the same attribute.
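
Here are the two forms next to each other, plus the sibling methods that follow the same pattern. This is a sketch using `words` as an illustrative name for the same list:

```ruby
words = ["albatross", "dog", "horse", "fish", "antelope", "zzzzzzzz"]

# With .max the block receives two elements and must compare them itself.
p words.max { |a, b| a.length <=> b.length }      # => "albatross"

# With .max_by the block receives one element and returns the value to compare.
p words.max_by { |word| word.length }             # => "albatross"

# The same idea applies to the other _by methods.
p words.min_by { |word| word.length }             # => "dog"
p words.minmax_by { |word| word.length }          # => ["dog", "albatross"]
p words.sort_by { |word| word.length }.first(3)   # => ["dog", "fish", "horse"]
```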

And we're not even done yet! You can make this already short code even shorter with Ruby's Symbol#to_proc. The & tells Ruby to turn the symbol :length into a Proc that calls .length on each element, and to pass that Proc as the block:



array.max_by(&:length)                    #=> "albatross"
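
To see what the & shorthand stands for, here is a small sketch. The names `words` and `length_of` are illustrative only:

```ruby
words = %w(albatross dog horse)

# :length.to_proc builds a Proc that calls .length on whatever it is given.
length_of = :length.to_proc
p length_of.call("dog")                 # => 3

# These two calls are equivalent.
p words.max_by { |word| word.length }   # => "albatross"
p words.max_by(&:length)                # => "albatross"

# The shorthand works anywhere a block is expected.
p words.map(&:length)                   # => [9, 3, 5]
```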



Final note for potentially unique .max use

The only scenario I have been able to imagine in which you would need .max and not be able to use .max_by is one where you, for some reason, need to compare different attributes.



arr1 = [1,2,0,0]                          #length > sum
arr2 = [4,4,4]                            #sum > length
array = [arr1, arr2]                      #=> [[1, 2, 0, 0], [4, 4, 4]]  
array.max                                 #=> [4, 4, 4] 
array.max {|a,b| a.length <=> b.sum}      #=> [1, 2, 0, 0]  



At first, I thought of this example like asking, "For each array in this array of arrays, which array's length is greater than, equal to, or less than its sum?" and the result that gives us '1' for 'greater than' is returned. But! I don't think it's that simple...



[4, 4, 4].length <=> [1, 2, 0, 0].sum     #=> 0 



Without .max, and using only <=>, the idea generally works. We are comparing two different attributes on two different arrays. But how does this work for sorting?

This example is a bit number heavy but demonstrates this unusual sorting process:



a = [1, 2, 3]               # length 3, sum 6
b = [1, 0, 0]               # length 3, sum 1
c = [1, 1, 1, 1, 1, 1]      # length 6, sum 6
d = [0, 0, 0, 0, 0, 0, 0]   # length 7, sum 0
e = [4, 4]                  # length 2, sum 8
nums = [a,b,c,d,e] 
nums.sort! {|a,b| a.length <=> b.sum}
 #=> [[0, 0, 0, 0, 0, 0, 0], [1, 0, 0], [1, 2, 3], [1, 1, 1, 1, 1, 1], [4, 4]]



This process goes through each array in an array of arrays and asks "Is the length of the array we call 'a' greater than, less than, or equal to the sum of array 'b'?" So 'e' is last because its sum, 8, is greater than any of the other lengths. 'c' and 'a' have equal sums but 'c' has the greater length and so has higher rank. Vice versa for 'b' and 'a' - they have equal lengths and so higher sum gets higher rank. 'd' has the longest length but lowest sum. I can also see this example as simple sorting by sum increasing, and only sorting by length if two sums are equal.
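
One caveat worth keeping in mind: a block that compares a.length with b.sum does not define a consistent ordering (comparing an array with itself does not even have to give 0), so the result can depend on which pairs .sort happens to compare. If the goal really is "by sum, then by length", a more predictable alternative is .sort_by with an array key. This is a sketch of that alternative, not the block used above:

```ruby
a = [1, 2, 3]               # length 3, sum 6
b = [1, 0, 0]               # length 3, sum 1
c = [1, 1, 1, 1, 1, 1]      # length 6, sum 6
d = [0, 0, 0, 0, 0, 0, 0]   # length 7, sum 0
e = [4, 4]                  # length 2, sum 8
nums = [a, b, c, d, e]

# Keys are [sum, length] pairs; arrays compare element by element,
# so length only matters when two sums are equal.
p nums.sort_by { |arr| [arr.sum, arr.length] }
# => [[0, 0, 0, 0, 0, 0, 0], [1, 0, 0], [1, 2, 3], [1, 1, 1, 1, 1, 1], [4, 4]]
```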

Good news is, it's probably pretty rare that you'd ever need to use .max like this. But, it is an interesting example to gain a better understanding of the <=> behind the magic. Let me know if you have a better explanation for this one!
