Under the Hood of Ruby Array #Sort

Josiah Fordahl
7 min readApr 16, 2021

The following article will attempt to thoroughly and precisely explain how the following code snippets work:

["Aa", "AAA", "b", "aa", "Aaa"].sort[100, 77, 200, 7, 21].sort

When sorting Strings, Array#sort uses the return value of the combined comparison operator (also known as the spaceship operator) String#<=> and the ASCII (hexadecimal) value of the character being sorted.

When sorting Integers, Array#sort uses the return value of the combined comparison operator Integer#<=> and compares each integer to each-other sorting them in ascending order based on the integer’s value.

ASCII Value Table:

Above we are looking at the red Char column, and the Dec (hexadecimal value) column. On the second row upper right quadrant we can see the value given to uppercase “A” and the value given to lower case “a”.

According to the ASCII table, “A” has a hexadecimal value of 65, while “a” has a hexadecimal value of 97. This means if compare “a” <=> “A” , “A” will come before “a” as <=> uses ascending hexadecimal values.

The Combined Comparison Operator:

1.)     "A" <=> "a"   #=> -1  2.)     "A" <=> "A"   #=>  0  3.)     "a" <=> "A"   #=>  14.)     "a" <=> 1     #=> nil 
  1. When the left-side of the comparison is less than the right-side the combined comparison operator returns -1. This comparison can be thought of as comparing hexadecimal values, 65 <=> 97 . Here we see that the left side is less than the right side which returns -1.
    — In the #sort context, when the combined comparison operator returns -1, the left hand value should be before the right hand value.
  2. When the left-side of the comparison is equal to (same as) the right-side the combined comparison operator returns 0. The hexadecimal values being compared here are 65 <=> 65 . Here we see these values are equal which returns 0.
    — In the #sort context, when the combined comparison operator returns 0, the left hand and right hand values stay next to each-other.
  3. When the left-side of the comparison is greater than the right-side the combined comparison operator returns 1. The hexadecimal values being compared here are 97 <=> 65. Here we see the left side is greater than the right side which returns 1.
    — In the #sort context, when the combined comparison operator returns 1, the right hand value swaps position with the left hand value.
  4. nil is returned when two objects cannot be compared. If nil is returned to #sort it will throw an ArgumentError.

How does the return value of the comparison operator help with sorting?

Let’s look at a simple example to understand what is happening:

["b","d","a","c","e"].sort #=> ["a", "b", "c", "d", "e"]

We see the return value of #sort which uses the combined comparison operator<=> and a complex underlying algorithm (quick sort) to compare elements inside the array that #sort is being called on. It rearranged our array of single character strings into ascending alphabetical order.

Let’s dive deeper and find out how it did this and why it works this way:

["b","d","a","c","e"].sortFirst comparison:
"b" <=> "d"
#=> -1 # Starting with "b" is less than "d" - no change.
The array now appears like this:
["b","d","a","c","e"]
------------------------------------------
Second comparison:
"d" <=> "a"
#=> 1 # Here "d" is greater than "a", they swap positions.
The array now appears like this:
["b","a","d","c","e"]
------------------------------------------
Third comparison:
"b" <=> "a"
#=> 1 # Here "b" is greater than "a", they swap positions.
The array now appears like this:
["a","b","d","c","e"]
-----------------------------------------
Fourth comparison:
"d" <=> "c"
#=> 1 # Here "d" is greater than "c", they swap positions.
The array now appears like this:
["a","b","c","d","e"]
------------------------------------------
Fifth comparison:
"b" <=> "c"
#=> -1 # Here "b" is less than "c" - no change.
The array now appears like this:
["a","b","c","d","e"]
------------------------------------------
Sixth comparison:
"d" <=> "e"
#=> -1 # Here "d" is less than "e" - no change.
The array is now returned:
=>["a","b","c","d","e"]

As we can see above, the spaceship operator is doing the actual work of sorting.

Sorting Strings within an array in Ruby:

["Aa", "AAA", "b", "aa", "Aaa"].sort

If we call #sort on this array of string objects, it returns the following:

#=> ["AAA", "Aa", "Aaa", "aa", "b"]

Why?
Let’s break it down like we did above…

The comparison of multi-character strings adds another layer of complexity as you can see below.

One rule to keep in mind, if the two strings are identical up to the length of the shorter string, then the longer string will be considered greater.

["Aa", "AAA", "b", "aa", "Aaa"].sortFirst comparison:
"Aa"<=> "AAA"
#=> 1 # "Aa" is greater than "AAA" these values swap.
-------------
Why is "Aa" greater than "AAA"?
Here, multiple comparisons are happening.
The First character of each string is compared.
"Aa"<=> "AAA"
1. "A" <=> "A" #=> 0 (returns equal)
The Second character of each string is compared
2. "a" <=> "A" #=> 1 (greater than)

As explained above, "a" is greater than "A" based on ASCII table values which are used for the comparison of each character.

At this point we have come to the end of the first string, so character by character comparison ends.
-------------
The array now appears like this:
["AAA", "Aa", "b", "aa", "Aaa"]------------------------------------------
Second comparison:
"Aa"<=>"b"
#=> -1 # "Aa" is less than "b" - no change.
-------------
Why is "Aa" less than "b"?
The First character of each string is compared.
"Aa"<=> "b"
1. "A" <=> "b" #=> -1 (less than)At this point we have come to the end of the second string, so character by character comparison ends.-------------The array now appears like this: ["AAA", "Aa", "b", "aa", "Aaa"]------------------------------------------
Third comparison:
"b"<=>"aa"
#=> 1 # "b" is greater than "aa" - these values swap.
-------------
Why is "b" greater than "aa"?
The First character of each string is compared.
"b"<=> "aa"
1. "b" <=> "a" #=> 1 (greater than)At this point we have come to the end of the first string, so character by character comparison ends.-------------The array now appears like this:
["AAA", "Aa","aa","b", "Aaa"]
------------------------------------------
Fourth comparison:
"Aa"<=>"aa"
#=> -1 # "Aa" is less than "aa" - no change.
-------------
Why is "Aa" greater than "aa"?
The First character of each string is compared.
"A"<=> "a"
1. "A" <=> "a" #=> -1 (less than)This ends the comparison, no more evaluation is needed so character by character comparison ends after the first characters. -------------The array now appears like this:
["AAA", "Aa","aa","b", "Aaa"]
------------------------------------------
Fifth comparison:
"b"<=>"Aaa"
#=> 1 # "b" is greater than "Aaa" - these values swap.
-------------
Why is "b" greater than "Aaa"?
The First character of each string is compared.
"b"<=> "Aaa"
1. "b" <=> "A" #=> -1 (greater than)At this point we have come to the end of the first string, so character by character comparison ends.-------------The array now appears like this:
["AAA", "Aa","aa","Aaa","b"]
------------------------------------------
Sixth comparison:
"aa"<=>"Aaa"
#=> 1 # "aa" is greater than "Aaa" - these values swap.
-------------
Why is "aa" greater than "Aaa"?
The First character of each string is compared.
"aa"<=> "Aaa"
1. "a" <=> "A" #=> 1 (greater than)This ends the comparison, no more evaluation is needed so character by character comparison ends after the first character.-------------The array now appears like this:
["AAA", "Aa","Aaa","aa","b"]
------------------------------------------
Seventh comparison:
"Aa"<=>"Aaa"
#=> -1 # "Aa" is less than "Aaa" - no change.
-------------
Why is "Aa" less than "Aaa"?
The First character of each string is compared.
"A"<=> "Aaa"
1. "A" <=> "A" #=> 0 (equal to)Second character Compared:
2. "a" <=> "a" #=> 0 (equal to)
At this point we have come to the end of the first string, so character by character comparison ends. As noted above, if the two strings are identical up to the length of the shorter string, then the longer string will be considered greater.-------------The array is now returned:
=> ["AAA", "Aa", "Aaa", "aa", "b"]

Sorting integers within an array in Ruby:

[100, 77, 200, 7, 21].sort

If we call #sort on this array of integer objects, it returns the following:

[7, 21, 77, 100, 200]

Why?

[100, 77, 200, 7, 21].sortFirst comparison:
100 <=> 77
#=> 1 # 100 is greater than 77 - these values swap.
The array now appears like this:[77,100, 200, 7, 21]------------------------------------------
Second comparison:
100 <=> 200
#=> -1 # 100 is less than 200 - no change.
The array now appears like this:[77,100, 200, 7, 21]------------------------------------------
Third comparison:
200 <=> 7
#=> 1 # 200 is greater than 7 - these values swap.
The array now appears like this:
[77,100, 7, 200, 21]
------------------------------------------
Fourth comparison:
100 <=> 7
#=> 1 # 100 is greater than 7 - these values swap.
The array now appears like this:
[77, 7, 100, 200, 21]
------------------------------------------
Fifth comparison:
77 <=> 7
#=> 1 # 77 is greater than 7 - these values swap.
The array now appears like this:
[7, 77, 100, 200, 21]
------------------------------------------
Sixth comparison:
200 <=> 21
#=> 1 # 200 is greater than 21 - these values swap.
The array now appears like this:
[7, 77, 100, 21, 200]
------------------------------------------
Seventh comparison:
100 <=> 21
#=> 1 # 100 is greater than 21 - these values swap.
The array now appears like this:
[7, 77, 21, 100, 200]
------------------------------------------
Eighth comparison:
77 <=> 21
#=> 1 # 77 is greater than 21 - these values swap.
The array now appears like this:
[7, 21, 77, 100, 200]
------------------------------------------
Ninth comparison:
7 <=> 21
#=> -1 # 7 is less than 21 - no change.
The array is now returned:
=> [7, 21, 77, 100, 200]

How to figure out how Ruby is sorting your array:

To figure out how Ruby is sorting any array, two separate block arguments are needed. Hop into IRB and use the following:

  1. To see which values are being evaluated:
irb(main):002:0> [100, 77, 200, 7, 21].sort { |x,y| p [x,y]; x<=>y }[100, 77][100, 200][200, 7][100, 7][77, 7][200, 21][100, 21][77, 21][7, 21]=> [7, 21, 77, 100, 200]

2. To see what these values are evaluating to:

irb(main):003:0> [100, 77, 200, 7, 21].sort { |x,y| p x<=>y }1-1111111-1=> [7, 21, 77, 100, 200]

--

--