Kotlin DataFrame ❤️ Arrow
Florian Bernard
Posted on October 10, 2024
Kotlin DataFrame v0.14 comes with improvements for reading the Apache Arrow format, in particular the ability to load a DataFrame from any ArrowReader.
This makes it easy to load results from analytical databases (such as DuckDB or ClickHouse) directly into a Kotlin DataFrame.
Here are two examples of integrations that allow for smooth data import into Kotlin DataFrames using Apache Arrow.
DuckDB
DuckDB is an analytical database that can be embedded for use in a Kotlin notebook. DuckDB can expose query results as an Arrow stream, which loads straight into a Kotlin DataFrame.
Here is a basic notebook that uses DuckDB to query data from a remote Parquet file and then imports the results into a Kotlin DataFrame through an Arrow stream.
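As a rough sketch of what such a notebook cell can look like, the function below runs a query on an in-memory DuckDB instance, exports the result as an Arrow stream, and hands the resulting ArrowReader to `DataFrame.readArrow`. It assumes the `dataframe-arrow`, DuckDB JDBC, and Apache Arrow artifacts are on the classpath. The function name `queryToDataFrame`, the batch size of 256, and the Parquet URL are illustrative placeholders, not taken from the original notebook.

```kotlin
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.ipc.ArrowReader
import org.duckdb.DuckDBResultSet
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readArrow
import java.sql.DriverManager

// Hypothetical helper: runs `sql` on an in-memory DuckDB and returns the result as a DataFrame.
fun queryToDataFrame(sql: String): AnyFrame =
    DriverManager.getConnection("jdbc:duckdb:").use { connection ->
        connection.createStatement().use { statement ->
            RootAllocator().use { allocator ->
                // DuckDB's JDBC result set can export its rows as an Arrow stream.
                val resultSet = statement.executeQuery(sql) as DuckDBResultSet
                // 256L is an arbitrary Arrow batch size chosen for this sketch.
                (resultSet.arrowExportStream(allocator, 256L) as ArrowReader).use { reader ->
                    // Available since Kotlin DataFrame v0.14: load from any ArrowReader.
                    DataFrame.readArrow(reader)
                }
            }
        }
    }

fun main() {
    // Placeholder URL (assumption): replace it with a real remote Parquet file.
    val df = queryToDataFrame(
        "SELECT * FROM read_parquet('https://example.com/data.parquet') LIMIT 100"
    )
    println(df)
}
```

Reading over HTTP relies on DuckDB's `httpfs` extension, which may need to be installed and loaded explicitly depending on the DuckDB version. The nested `use` blocks close the reader before the allocator, so Arrow buffers are released before the allocator checks for leaks.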
ClickHouse
ClickHouse is a high-performance, column-oriented SQL database management system (DBMS) designed for online analytical processing (OLAP).
ClickHouse can return query results in the ArrowStream output format.
The next notebook uses the ClickHouse client to query data in Arrow stream format and loads it into a Kotlin DataFrame.
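The same idea can be sketched without the ClickHouse client. The example below swaps in the JDK's built-in `HttpClient` and ClickHouse's HTTP interface, then wraps the response body in Arrow's `ArrowStreamReader`. It is not the notebook's code. The endpoint `http://localhost:8123/` (a local server with default credentials) and the helper names `withArrowStreamFormat` and `clickHouseToDataFrame` are assumptions made for illustration.

```kotlin
import org.apache.arrow.memory.RootAllocator
import org.apache.arrow.vector.ipc.ArrowStreamReader
import org.jetbrains.kotlinx.dataframe.AnyFrame
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readArrow
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical helper: asks ClickHouse to answer in its ArrowStream output format.
fun withArrowStreamFormat(sql: String): String =
    "${sql.trim().trimEnd(';')} FORMAT ArrowStream"

// Hypothetical helper: sends the query over HTTP and reads the Arrow stream into a DataFrame.
// The default endpoint assumes a local ClickHouse server with default credentials.
fun clickHouseToDataFrame(sql: String, endpoint: String = "http://localhost:8123/"): AnyFrame {
    val request = HttpRequest.newBuilder(URI.create(endpoint))
        .POST(HttpRequest.BodyPublishers.ofString(withArrowStreamFormat(sql)))
        .build()
    val response = HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofInputStream())
    return response.body().use { body ->
        check(response.statusCode() == 200) { "ClickHouse returned HTTP ${response.statusCode()}" }
        RootAllocator().use { allocator ->
            // ArrowStreamReader is an ArrowReader, so readArrow accepts it directly.
            ArrowStreamReader(body, allocator).use { reader ->
                DataFrame.readArrow(reader)
            }
        }
    }
}

fun main() {
    val df = clickHouseToDataFrame("SELECT number, number * 2 AS doubled FROM system.numbers LIMIT 10")
    println(df)
}
```

Because the response body is consumed as a stream, ClickHouse's output is decoded batch by batch instead of being buffered as text first.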
Conclusion
Loading Arrow data into Kotlin DataFrame is both straightforward and efficient. Any source that can produce an ArrowReader, such as DuckDB or ClickHouse, can feed a DataFrame directly, which makes Kotlin DataFrame a practical tool for handling and analyzing large datasets.