Process a large csv file with parallel processing #eg39

esproc_spl

Judy

Posted on September 11, 2024


A CSV file stores a large amount of order data.


Use Java to process this file: find orders whose amounts are at least 3,000 and less than 5,000, group them by customer (Client), and for each customer compute the total order amount and the number of orders.
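Before turning to SPL, it may help to see the same computation as a plain, single-threaded Java program. This is a minimal sketch under assumptions, not part of the original solution: it assumes the file is UTF-8, has a header row containing columns named Client and Amount (the full column layout is not given here), and has no quoted fields with embedded commas. The class SerialOrderStats and its Stats helper are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; assumes a header row with Client and Amount columns
// and no quoted fields containing commas.
public class SerialOrderStats {
    // Per-client running totals (hypothetical helper type).
    static final class Stats { double amt; long cnt; }

    // Reads the CSV line by line, so the whole file never has to fit in memory.
    static Map<String, Stats> aggregate(BufferedReader in) throws IOException {
        String headerLine = in.readLine();
        if (headerLine == null) return new TreeMap<>();
        // Locate the Client and Amount columns by name in the header row.
        List<String> header = Arrays.asList(headerLine.split(","));
        int clientIdx = header.indexOf("Client");
        int amountIdx = header.indexOf("Amount");
        if (clientIdx < 0 || amountIdx < 0) {
            throw new IllegalArgumentException("missing Client/Amount column");
        }
        Map<String, Stats> result = new TreeMap<>();
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) continue;
            String[] f = line.split(",", -1);
            double amount = Double.parseDouble(f[amountIdx].trim());
            if (amount >= 3000 && amount < 5000) {   // same range as select() in the SPL statement
                Stats s = result.computeIfAbsent(f[clientIdx], k -> new Stats());
                s.amt += amount;
                s.cnt++;
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args.length > 0 ? args[0] : "d:/OrdersBig.csv");
        try (BufferedReader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            aggregate(in).forEach((client, s) ->
                System.out.println(client + "," + s.amt + "," + s.cnt));
        }
    }
}
```

Because it reads one line at a time, this version copes with files larger than memory, but it runs on a single thread, which corresponds to the default serial mode of cursor().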

Write the following SPL statement:

=file("d:/OrdersBig.csv").cursor@mtc(;8).select(Amount>=3000 && Amount<5000).groups(Client;sum(Amount):amt,count(1):cnt)

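The cursor() function handles a large file that cannot fit into memory; by default, it computes serially. The @m option enables multithreaded data retrieval, and 8 is the number of parallel threads; the @t option imports the first line as column titles; and the @c option uses a comma as the separator. select() then keeps the orders with 3000 <= Amount < 5000, and groups() groups them by Client, computing the total amount (amt) and the number of orders (cnt) for each client.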
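To make the idea of multithreaded retrieval over a single large file concrete, here is a minimal Java sketch. It is an illustration under stated assumptions, not esProc's internal implementation (the post does not describe that): the data area after the header is cut into equal byte ranges (8 in main, mirroring the 8 in cursor@mtc(;8)), each thread skips the partial line at the start of its range and processes every line that begins inside the range, each thread builds its own Client map, and the partial maps are merged at the end. It assumes UTF-8 text, a header row with columns named Client and Amount, and no quoted fields containing commas or line breaks; ParallelOrderStats, scanRange, and accumulate are hypothetical names.

```java
import java.io.*;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.concurrent.*;

// Illustrative sketch, not esProc's implementation. Assumes UTF-8, a header row
// with Client and Amount columns, and no quoted fields with commas or newlines.
public class ParallelOrderStats {
    // Per-client running totals (hypothetical helper type).
    static final class Stats { double amt; long cnt; }

    // Cuts the data area into 'threads' byte ranges, scans each range on its own
    // thread into a private map, then merges the partial maps by Client.
    static Map<String, Stats> aggregate(Path path, int threads) throws Exception {
        long size = Files.size(path);
        ByteArrayOutputStream hdr = new ByteArrayOutputStream();
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            int b;
            while ((b = in.read()) != -1 && b != '\n') hdr.write(b);
        }
        List<String> header = Arrays.asList(
            new String(hdr.toByteArray(), StandardCharsets.UTF_8).trim().split(","));
        int clientIdx = header.indexOf("Client");
        int amountIdx = header.indexOf("Amount");
        if (clientIdx < 0 || amountIdx < 0) throw new IllegalArgumentException("missing Client/Amount column");
        long dataStart = hdr.size() + 1;          // first byte after the header's '\n'
        Map<String, Stats> total = new TreeMap<>();
        if (dataStart >= size) return total;      // header only: nothing to aggregate
        long dataLen = size - dataStart;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Stats>>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final long start = dataStart + dataLen * i / threads;
                final long end = dataStart + dataLen * (i + 1) / threads;
                parts.add(pool.submit(() -> scanRange(path, start, end, clientIdx, amountIdx)));
            }
            for (Future<Map<String, Stats>> part : parts) {
                part.get().forEach((client, s) -> {
                    Stats t = total.computeIfAbsent(client, k -> new Stats());
                    t.amt += s.amt;
                    t.cnt += s.cnt;
                });
            }
        } finally {
            pool.shutdown();
        }
        return total;
    }

    // Handles every line whose first byte lies in [start, end). A line that straddles
    // 'start' belongs to the previous range, so it is skipped here.
    static Map<String, Stats> scanRange(Path path, long start, long end,
                                        int clientIdx, int amountIdx) throws IOException {
        Map<String, Stats> partial = new HashMap<>();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            ch.position(start - 1);               // start >= 1 because the header precedes the data
            InputStream in = new BufferedInputStream(Channels.newInputStream(ch), 1 << 16);
            long pos = start - 1;
            int b;
            do { b = in.read(); pos++; } while (b != -1 && b != '\n'); // move to a line start
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            while (b != -1 && pos < end) {
                line.reset();
                while ((b = in.read()) != -1 && b != '\n') line.write(b);
                pos += line.size() + (b == '\n' ? 1 : 0);
                accumulate(partial, new String(line.toByteArray(), StandardCharsets.UTF_8),
                           clientIdx, amountIdx);
            }
        }
        return partial;
    }

    // Applies the filter 3000 <= Amount < 5000 and adds the row to its client's totals.
    static void accumulate(Map<String, Stats> acc, String line, int clientIdx, int amountIdx) {
        String row = line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
        if (row.isEmpty()) return;
        String[] f = row.split(",", -1);
        double amount = Double.parseDouble(f[amountIdx].trim());
        if (amount >= 3000 && amount < 5000) {
            Stats s = acc.computeIfAbsent(f[clientIdx], k -> new Stats());
            s.amt += amount;
            s.cnt++;
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Path.of(args.length > 0 ? args[0] : "d:/OrdersBig.csv");
        aggregate(path, 8).forEach((client, s) ->
            System.out.println(client + "," + s.amt + "," + s.cnt));
    }
}
```

Giving each thread a private map means the scan itself needs no locking; the only shared step is the final merge, which touches one entry per client rather than one per order. Splitting on the byte '\n' is safe for UTF-8 because that byte never occurs inside a multi-byte character.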

Read "How to Call a SPL Script in Java" to learn how to integrate SPL into a Java application.

This problem comes from Stack Overflow. Follow the link to see that the conventional solution is quite complicated, while the SPL approach is simple and efficient.

SPL open source address
