Counting Sort algorithm

yevheniipazii

Yevhenii Pazii

Posted on April 1, 2023

Counting Sort algorithm

This article explains how to implement and use the Counting Sort algorithm. Since I am a Java developer, the examples would be using Java, but it is applicable to any other programing language. If you want to have a summary, just scroll to a summary section.

Suppose there is an array given [1, 3, 1, 5] which needs to be sorted in ascending order. Pretty simple isn't it?

Counting Sort

Here is a pseudocode from wiki that demonstrates provides a basic algorithm:

function CountingSort(input, hi)
    count ← array of hi + 1 zeros //1 

    for i = 0 to length(input) - 1 do //2
        j = key(input[i])
        count[j] = count[j] + 1

    for i = 1 to hi do //3
        count[i] = count[i] + count[i - 1]

    output ← array of same length as input //4

    for i = length(input) - 1 down to 0 do //5
        j = key(input[i])
        count[j] = count[j] - 1
        output[count[j]] = input[i]

    return output
Enter fullscreen mode Exit fullscreen mode

Don't worry there will be an explanation for each individual step, but first, let's state some terms:
input - is a given array.
output - is a resulting sorted array.
count- is an array that would be used for counting.
hi - is the biggest element in the input.

First part. Create a count array

Create the count array filled with 0 (zeros) of hi + 1 length. It will store the count of each element placed at the number as an index. For instance, 3 from the input array, the count of it's occurrences will be stored at the count[3] cell.
Preparation of the count array
Now, it is obvious why we need to know the hi element from the input array and why an extra cell is needed for 5th index.

int[] countingSort(int[] input, int hi) {
    var count = new int[hi + 1]; //1

    //Rest of the code 
}
Enter fullscreen mode Exit fullscreen mode

Java by default fills an array with zeros, but for other languages, there might be some handling to prepare the count array.

Second part. Count number of elements

Next, using the count array, all elements need to be counted by looping over the input array, it will help us with the following steps. With [1, 3, 1, 5] as the input array, the result would be [0, 2, 0, 1, 0, 1].

Count elements of input

Here is an example with Java:

int[] countingSort(int[] input, int hi) {
    var count = new int[hi + 1]; //1

    for (var n : input) { //2
        count[n]++;
    }

    //Rest of code 
}
Enter fullscreen mode Exit fullscreen mode

Third Part. Calculate indexes

This time loop over the count array to calculate indexes, where count[i] = count[i] + count[i - 1], except for the 0 indexed element, which stays unchanged. By doing this we would know where each element needs to be placed (see the following steps). For instance, for the input array [1, 3, 1, 5] with the counts [0, 2, 0, 1, 0, 5] the indexes array would be [0, 2, 2, 3, 3, 4].

Calculating indexes

int[] countingSort(int[] input, int hi) {
    var count = new int[hi + 1]; //1

    for (var n : input) { //2
        count[n]++;
    }

    for (var i = 1; i < count.length; i++) { //3
        count[i] += count[i - 1];
    }

    //Rest of code
}
Enter fullscreen mode Exit fullscreen mode

Fourth part. Create output array

Create the output array, with the same length as original one. Nothing special here, but it will be important later on.

int[] countingSort(int[] input, int hi) {
    var count = new int[hi + 1]; //1

    for (var n : input) { //2
        count[n]++;
    }

    for (var i = 1; i < count.length; i++) { //3
        count[i] += count[i - 1];
    }

    var output = new int[input.length]; //4

    //Rest of code

    return output;
}
Enter fullscreen mode Exit fullscreen mode

Fifth part. Fill output array

The [0, 2, 2, 3, 3, 4] indexes array from the previous steps means the following: each element can be expressed as a range with the element itself as an exclusive end and the previous one as an inclusive beginning (zero-indexed element takes 0 as inclusive lo range):

  • Element 0 is in a range [0, 0), zero repetitions
  • Element 1 is in a range [0, 2), two repetitions
  • Element 2 is in a range [2, 2), zero repetitions
  • Element 3 is in a range [2, 3), one repetition
  • Element 4 is in a range [3, 3), zero repetitions
  • Element 5 is in a range [3, 4), one repetition

At this point everything to form the output array is available. Loop over the input array inserting the elements from it into the output array.

Image description

int[] countingSort(int[] input, int hi) {
    var count = new int[hi + 1]; //1

    for (var n : input) { //2
        count[n]++;
    }

    for (var i = 1; i < count.length; i++) { //3
        count[i] += count[i - 1];
    }

    var output = new int[input.length]; //4

    for (var i = input.length - 1; i >= 0; i--) { //5
        output[--count[input[i]]] = input[i];
    }

    return output;
}
Enter fullscreen mode Exit fullscreen mode

This might be a little tricky, but nothing too special. At first, input[i] returns the current element in the input array. count[input[i]] returns an exclusive end range for the element. For instance, for the number 5 it would be 4. Then -- decrease it to 3 -> --count[input[i]] with is inclusive now. Finally, set the element to a position of the output array output[--count[input[i]]] = input[i]. Looping backward helps to preserve it stable. There will be a paragraph about this term.

Possible improvements

OK, it is a good enough implementation, but there are a couple of improvements that could be made.

Check input

What if the input is null or empty, or contains only one element? In such a case, there is nothing we could or should do.

int[] countingSortImproved(int[] input) {
    if (input == null || input.length < 2) return input;
    //The rest of code
}
Enter fullscreen mode Exit fullscreen mode

Unknown max element

What if we don't know the hi element in the array? Then the length of the count array is unknown as well. Let's find it on our own.

int[] countingSortImproved(int[] input) {
    if (input == null || input.length < 2) return input;

    //safe to do, since we know that the length is at least 2
    var hi = input[0]; 

    for (var i = 1; i < input.length; i++) {
        hi = Math.max(hi, input[i]);
    }

    var count = new int[hi + 1]; //1
    //The rest of code
}
Enter fullscreen mode Exit fullscreen mode

Already sorted input

What if the input array is already sorted? It would be good to check it first to avoid a lot of manipulations.

int[] countingSortImproved(int[] input) {
    if (input == null || input.length < 2) return input;

    var hi = input[0]; 
    var sorted = true;
    for (var i = 1; i < input.length; i++) {
        hi = Math.max(hi, input[i]);
        sorted &= input[i - 1] <= input[i];
    }
    if (sorted) return input;

    var count = new int[hi  + 1]; //1
    //The rest of code
}
Enter fullscreen mode Exit fullscreen mode

Together with finding hi a check can be performed to examine the current element if it is bigger or equal to the previous. &= accumulates all measurements to sorted. Only if all checks returned true then the input could be returned back as a result.

Big and/or negative numbers

So far so good, but what if there are negative numbers [-3, 2, 1]? Or big numbers [1030-1, 1030-2, 1030]? The algorithm will fail for the first case, as there is no -3 index in the count array. In the second case, it will work but the count array would be 1030 long.
To solve both issues a shift can be introduced. Let's take our lo element and agree that it will be our 0 index.
The first case: -3 -> 0; 2 -> 5; 1 -> 4.
The second case: 1030-2 -> 0; 1030-1 -> 1; 1030-> 2

The updated code would look like this:

int[] countingSortImproved(int[] input) {
    if (input == null || input.length < 2) return input;

    var hi = input[0];
    var lo = input[0];
    var sorted = true;
    for (var i = 1; i < input.length; i++) {
        hi = Math.max(hi, input[i]);
        lo = Math.min(lo, input[i]);
        sorted &= input[i - 1] <= input[i];
    }
    if (sorted) return input;

    //now the count would fit in the [lo, hi] range  
    var count = new int[hi - lo + 1]; //1

    for (var n : input) { //2
        count[n - lo]++; //transform the index, -3 or 10^30-2 to 0;
    }

    for (var i = 1; i < count.length; i++) { //3
        count[i] += count[i - 1];
    }

    var output = new int[input.length]; //4

    for (var i = input.length - 1; i >= 0; i--) { //5
        //count index now input[i] - lo
        output[--count[input[i] - lo]] = input[i];
    }

    return output;
}
Enter fullscreen mode Exit fullscreen mode

The updated steps are 1, 3, and 5.

Combine step 4 and 5

OK, let's consider the following input and count arrays: [3, 2, 3, 5] and [1, 2, 0, 1], lo = 2. It is possible to loop over the count array and add as many numbers as count for it's number. So even the original array is not needed anymore.

  1. 2 one time (index 0)
  2. 3 two times, (index 1)
  3. 4 zero times, (index 2)
  4. 5 one time, (index 3)

[2, 3, 3, 5] as a result. But be aware that this change makes the algorithm not stable. It is not important for the ints, but if some Objects are being sorted and it is critical to have original ones, this improvement will not work.

int[] countingSortImprovedAndCombined(int[] input) {
    if (input == null || input.length < 2) return input;

    var hi = input[0];
    var lo = input[0];
    var sotred = true;
    for (var i = 1; i < input.length; i++) {
        hi = Math.max(hi, input[i]);
        lo = Math.min(lo, input[i]);
        sorted &= input[i - 1] <= input[i];
    }
    if (sorted) return input;

    var count = new int[hi - lo + 1]; //1 

    for (var n : input) { //2
        count[n - lo]++;
    }

    var output = new int[input.length]; //4

    var index = 0;
    for (var i = 0; i < count.length; i++) { // 3 & 5 combined
        for (var j = 0; j < count[i]; j++) {
            output[index++] = i + lo;
        }
    }

    return output;
}
Enter fullscreen mode Exit fullscreen mode

Problems can be solved

Now let's consider some problems other than initial ascending order sorting with the counting algorithm.

Sort descending order

To sort in descending order, all that needs to be done is alter step 3 calculating indexes.

Original:

        for (var i = 1; i < count.length; i++) { //3
            count[i] += count[i - 1];
        }
Enter fullscreen mode Exit fullscreen mode

Changed:

        for (var i = count.length - 2; i >= 0 ; i--) { //3
            count[i] += count[i + 1];
        }
Enter fullscreen mode Exit fullscreen mode

Just counting indexes from the end of the count array.

Finding max occurrence

The main point here is to find the number with max occurrence. In such a case all that we need is the count array, making the algorithm even simpler.

    int countingSortImprovedMaxOccurrence(int[] input) {
        if (input.length < 2) return input[0];

        var hi = input[0];
        var lo = input[0];
        for (var i = 1; i < input.length; i++) {
            hi = Math.max(hi, input[i]);
            lo = Math.min(lo, input[i]);
        }

        var count = new int[hi - lo + 1]; //1

        var maxOccurrence = input[0];
        for (var n : input) { //2
            if (++count[n - lo] > count[maxOccurrence - lo]) {
                maxOccurrence = n;
            }
        }

        return maxOccurrence;
    }
Enter fullscreen mode Exit fullscreen mode

The first steps are the same and the 3rd, 4th, and 5th are not needed anymore. The small trick here is to find the maximum occurrence element during calculating counts and avoid an additional loop.

Removing duplicates

The task is simple, together with sorting required removing duplicates. The count array can be modified to store boolean as the count for an element always would be 1 if it appears at least once. In addition, the elements that appear more than once needs to be counted so that the output array is created with the appropriate size.

    int[] countingSortImprovedDistinct(int[] input) {
        if (input == null || input.length < 2) return input;

        var hi = input[0];
        var lo = hi;
        var sorted = true;
        for (var i = 1; i < input.length; i++) {
            hi = Math.max(hi, input[i]);
            lo = Math.min(lo, input[i]);
            sorted &= input[i - 1] <= input[i];
        }
        if (sorted) return input;

        var count = new boolean[hi - lo + 1]; //1
        var removed = 0;
        for (var n : input) { //2
            if (count[n - lo]) {
                removed++;
            } else {
                count[n - lo] = true;
            }
        }

        var output = new int[input.length - removed]; //4
        var outputIndex = 0;
        for (var i = 0; i < count.length; i++) { //5
            if (count[i]) {
                output[outputIndex++] = i + lo;  
            }
        }

        return output;
    }
Enter fullscreen mode Exit fullscreen mode

There are many other problems that can be solved with an adaptation of or modification of the algorithm.

Complexity and Characteristics

One of the biggest limitations is distribution of numbers. We have solved couple of issues related to it but in the following case using the counting sort is inefficient. [2147483647, -2147483648] (Java's max and min numbers), although it is obvious that sorting is pretty simple but the count array would be extremally huge and even will not fit into int range. So that previous implementation would work only if the hi - lo is in [0, 2147483647) range.

The algorithm is stable unless steps 4 and 5 are combined. Stable in the context of the sorting algorithms means that the elements that originally were in the correct position will not be moved from that place during sorting. This could be important when mowing is an expensive operation. For instance, if books on a bookshelf are being sorted, it would be great not to move books already in their place.

It is not in-place by default, but if modified to combine steps 4 and 5 it will be in-place. In-place means that the original array will be modified to supply the result. Since for the result a new array is being created, it is not fit. If take the bookshelf analogy, a new bookshelf would be built and filled with books from the old one.

The Space Complexity of the algorithm is O(n+k), where k is the size of the count array and n is the size of the original array. The implementation requires having the count array and the result array making the major space waisted on. If steps 4 and 5 are combined, then O(k), since no more need for the result array.

The Time Complexity is again O(n+k). First, there is a loop over the count array to create it and set 0 to each element - k operations. Then loop over the input array to count elements - n operations. Next loop over the count array, to calculate indexes - k operations. n operations to create the result array. And finally, loop over the result array to fill it with sorted values - n operations. As a result there are k + n + k + n + n = 3n + 2k and simplified to O(n+k).

Summary

The Counting Sort is one of the best sorting algorithms with O(n+k) time complexity and O(n+k) space complexity. It consists of steps: create count array, count elements in the input, loop over count array to calculate indexes count[i] += count[i-1], look over input array in backward to fill output array decreasing index.
It is stable, but not in-place. Could be modified to be in-place, but not stable. Inefficient if hi - lo is significantly bigger than number of elements.

Links

https://en.wikipedia.org/wiki/Counting_sort
https://www.bigocheatsheet.com/
https://en.wikipedia.org/wiki/Category:Stable_sorts
https://en.wikipedia.org/wiki/In-place_algorithm
https://www.amazon.com/Algorithms-Java-Parts-1-4-Pts-1-4/dp/0201361205
https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X/ref=sr_1_1

💖 💪 🙅 🚩
yevheniipazii
Yevhenii Pazii

Posted on April 1, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Counting Sort algorithm
algorithms Counting Sort algorithm

April 1, 2023