Performing basic stats in Java 8

habeebcycle

Habeeb Okunade

Posted on February 22, 2020

The Stream interface was introduced in Java 8 and supports parallel execution. It supports the sorted, map, filter, reduce pattern and executes lazily, forming the basis (along with lambdas) for functional-style programming in Java. There are also corresponding primitive streams (IntStream, DoubleStream, and LongStream) for performance reasons.
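As a rough illustration of the filter, map, reduce pattern on a primitive stream, here is a minimal sketch. The class and method names (StreamPipelineSketch, sumOfEvenSquares) are made up for this example and are not part of the JDK.

```java
import java.util.stream.IntStream;

class StreamPipelineSketch {

    // Sum of the squares of the even numbers in [1, upTo],
    // computed with filter -> map -> reduce on an IntStream.
    static int sumOfEvenSquares(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .filter(n -> n % 2 == 0)   // keep even numbers only
                .map(n -> n * n)           // square each remaining number
                .reduce(0, Integer::sum);  // fold everything into one sum
    }

    // Same pipeline, but the stream is asked to run in parallel.
    static int sumOfEvenSquaresParallel(int upTo) {
        return IntStream.rangeClosed(1, upTo)
                .parallel()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumOfEvenSquares(10));         // 4 + 16 + 36 + 64 + 100 = 220
        System.out.println(sumOfEvenSquaresParallel(10)); // same result: 220
    }
}
```

For a range this small the parallel version is unlikely to be faster; it is shown only to illustrate that the pipeline itself does not change.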

In this write-up, we will look at how to compute basic statistics such as the maximum, minimum, mean (average) and number of occurrences, for data presentation purposes, using the Stream API and Collections. Our approach is to create the values with one of the classes in the Stream API (java.util.stream) and summarise them with DoubleSummaryStatistics, a statistics utility in the java.util package. (You can also explore IntSummaryStatistics and LongSummaryStatistics, which do the same for int and long values.)
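For integer data, the same idea can be sketched with IntSummaryStatistics. The class name IntStatsSketch, the summarize helper and the sample values are invented for this illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

class IntStatsSketch {

    // Hypothetical helper: summarises any number of int values.
    static IntSummaryStatistics summarize(int... values) {
        return IntStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(4, 8, 15, 16, 23, 42);

        System.out.println(stats.getCount());   // 6
        System.out.println(stats.getSum());     // 108
        System.out.println(stats.getMin());     // 4
        System.out.println(stats.getMax());     // 42
        System.out.println(stats.getAverage()); // 18.0
    }
}
```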

Let's assume we have a set of data. We will use DoubleStream (from the Stream API) to collate, sort or filter our data, and its summaryStatistics method to get the maximum, minimum, average, count and sum of our data.
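Filtering or sorting before summarising could look roughly like the sketch below. The helper names (statsAtLeast, sortedCopy) and the threshold of 20 are assumptions chosen for the example; the sample values are the first six of the dataset used later in this post.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class FilteredStatsSketch {

    // Statistics over only those values that are >= threshold.
    static DoubleSummaryStatistics statsAtLeast(double[] data, double threshold) {
        return DoubleStream.of(data)
                .filter(v -> v >= threshold)
                .summaryStatistics();
    }

    // Returns a sorted copy; the input array is left untouched.
    static double[] sortedCopy(double[] data) {
        return DoubleStream.of(data).sorted().toArray();
    }

    public static void main(String[] args) {
        double[] sample = {32, 23, 54, 15.2, 26.3, 7.1};

        DoubleSummaryStatistics stats = statsAtLeast(sample, 20);
        System.out.println(stats.getCount()); // 4 (32, 23, 54 and 26.3)
        System.out.println(stats.getMin());   // 23.0
        System.out.println(stats.getMax());   // 54.0

        // [7.1, 15.2, 23.0, 26.3, 32.0, 54.0]
        System.out.println(Arrays.toString(sortedCopy(sample)));
    }
}
```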

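We will then count the number of occurrences of each value in the dataset, which can form the basis for presenting the data as a histogram. The frequency method of Collections does this for a single value; for the whole dataset, we will use Collectors.groupingBy with Collectors.counting.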
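For a single value, a minimal sketch with Collections.frequency might look like this. The class name FrequencySketch, the occurrences helper and the short sample list are made up for the illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FrequencySketch {

    // Hypothetical helper: how many times does target occur in values?
    // Collections.frequency scans the whole collection on every call.
    static int occurrences(List<Double> values, double target) {
        return Collections.frequency(values, target);
    }

    public static void main(String[] args) {
        // Decimal literals so that the list elements are Doubles, not Integers
        List<Double> values = Arrays.asList(23.0, 15.2, 23.0, 7.1, 15.2, 23.0);

        System.out.println(occurrences(values, 23.0)); // 3
        System.out.println(occurrences(values, 15.2)); // 2
        System.out.println(occurrences(values, 99.0)); // 0
    }
}
```

Because every call walks the full list, calling it once per distinct value gets slow on large datasets, which is one reason to prefer a single grouping pass.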

Let's start coding to see this in action.

package com.habeebcycle;

import java.util.stream.*;
import java.util.*;

public class BasicStats {
    public static void main(String[] args) {

        // Dataset as follows
        double[] dataset = {32, 23, 54, 15.2, 26.3, 7.1, 18.7, 14.2, 23,
                25, 21.7, 12.4, 21, 24, 42, 55, 23, 14.5, 21.3, 26.3, 53, 23,
                15.2, 7.1, 15.4, 23, 15.2, 14.2, 14.2, 25, 18.7, 15.2, 14.5};

        // Get the basic statistics
        DoubleSummaryStatistics stats =
                DoubleStream.of(dataset).summaryStatistics();

        // The stats variable now holds our basic stats

        // Let's get the number of values in our dataset
        System.out.println(stats.getCount()); // Gives 33

        // Let's get the sum
        System.out.println(stats.getSum()); // Gives about 753.4

        // Let's get the mean (average)
        System.out.println(stats.getAverage()); // Gives about 22.83

        // Let's get the maximum
        System.out.println(stats.getMax()); // Gives 55.0

        // Let's get the minimum
        System.out.println(stats.getMin()); // Gives 7.1
    }
}

We can also merge the results of another DoubleSummaryStatistics object (here called otherStats) into ours, so that stats covers both datasets, by using

   stats.combine(otherStats);
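A small sketch of combine in action, assuming two separate arrays of values. The class name CombineStatsSketch, the merged helper and the sample arrays are invented for this example.

```java
import java.util.DoubleSummaryStatistics;
import java.util.stream.DoubleStream;

class CombineStatsSketch {

    // Hypothetical helper: summarises two arrays separately, then merges them.
    static DoubleSummaryStatistics merged(double[] first, double[] second) {
        DoubleSummaryStatistics stats = DoubleStream.of(first).summaryStatistics();
        DoubleSummaryStatistics otherStats = DoubleStream.of(second).summaryStatistics();

        // combine returns void and updates stats in place; otherStats is unchanged
        stats.combine(otherStats);
        return stats;
    }

    public static void main(String[] args) {
        DoubleSummaryStatistics stats =
                merged(new double[] {32, 23, 54}, new double[] {15.2, 7.1});

        System.out.println(stats.getCount()); // 5
        System.out.println(stats.getMax());   // 54.0
        System.out.println(stats.getMin());   // 7.1
    }
}
```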

To present our dataset as a histogram, we first call the boxed() method of the stream. This converts the DoubleStream into a Stream<Double>, whose collect method works with the Collectors class.

The Collectors class has a method called groupingBy that groups the dataset by distinct value. Combined with Collectors.counting(), it counts the number of occurrences of each value in the dataset.

Map<Double, Long> histogram =
 DoubleStream.of(dataset)
  .boxed()
  .collect(Collectors.groupingBy(
       e -> e, 
       Collectors.counting()
   ));
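The map produced above makes no promise about key order. If the histogram should come out sorted by value, one option is the three-argument groupingBy overload with a TreeMap as the map factory. This is a sketch of an alternative, not what the rest of this post uses; the class name SortedHistogramSketch and the sample data are made up.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.DoubleStream;

class SortedHistogramSketch {

    // Counts occurrences of each value, with keys kept in ascending order.
    static Map<Double, Long> sortedHistogram(double[] data) {
        return DoubleStream.of(data)
                .boxed()
                .collect(Collectors.groupingBy(
                        Function.identity(),     // group by the value itself
                        TreeMap::new,            // sorted map instead of the default
                        Collectors.counting())); // count the members of each group
    }

    public static void main(String[] args) {
        double[] sample = {23, 15.2, 23, 7.1, 15.2, 23};

        // {7.1=1, 15.2=2, 23.0=3}
        System.out.println(sortedHistogram(sample));
    }
}
```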

histogram now maps each distinct value to its number of occurrences. We can use a custom method to print the histogram using stars.

for (Map.Entry<Double, Long> entry : histogram.entrySet()) {
   System.out.println(
     entry.getKey() + " : " + entry.getValue() + " : " + getStars(entry.getValue())
   );
}

The getStars method is defined as follows:

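public static String getStars(long number) {
   // StringBuilder avoids creating a new String on every iteration
   StringBuilder output = new StringBuilder();
   for (long i = 1; i <= number; i++) {
      output.append(" * ");
   }
   return output.toString();
}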

The output presents our data as a histogram:

32.0 : 1 :  * 
18.7 : 2 :  *  * 
15.4 : 1 :  * 
42.0 : 1 :  * 
12.4 : 1 :  * 
21.7 : 1 :  * 
15.2 : 4 :  *  *  *  * 
53.0 : 1 :  * 
14.2 : 3 :  *  *  * 
55.0 : 1 :  * 
54.0 : 1 :  * 
14.5 : 2 :  *  * 
21.0 : 1 :  * 
26.3 : 2 :  *  * 
23.0 : 5 :  *  *  *  *  * 
21.3 : 1 :  * 
24.0 : 1 :  * 
25.0 : 2 :  *  * 
7.1 : 2 :  *  * 

The Stream API, combined with the Collections utilities, is a powerful yet simple set of tools for processing datasets. It allows us to cut a large amount of boilerplate code, write more readable programs and improve productivity when used properly.
