resurfacelabs
Posted on June 15, 2021
Too many data frameworks built for large scale have unacceptable complexity at small scale. But with a few tweaks, Trino scales down to run nicely on small single-container configurations.
(Trino is the new brand for PrestoSQL, an open source distributed query engine.)
Official docker image is large
The docker image provided by the Trino team (trinodb/trino
) is 1.32 GB when extracted. This includes a full CentOS distribution, which is a safe and comfortable choice. But this is pretty large for cases where Trino is embedded into another application, like we're doing with Resurface.
Picking a smaller base image
Much of the weight from the official Trino container is from the base CentOS image.
FROM azul/zulu-openjdk-centos:11
Switching to an Alpine-based distribution like adoptopenjdk cuts the download size dramatically.
FROM adoptopenjdk/openjdk11:jdk-11.0.10_9-alpine-slim
⚠️Pick your Alpine distribution carefully! We've seen significant performance degradations for Java applications when using Alpine distributions that don't include glibc. The adoptopenjdk containers have good performance while still being relatively small.
Reducing the number of connectors
The next step is optional, but has a big impact on container size. Trino ships with many pre-installed connectors, each of which requires supporting libraries.
However, these connectors aren't all strictly required. For our single-container distributions, we strip out all the optional connectors except for our own Resurface connector.
rm -rf /opt/trino/plugin/accumulo &&\
rm -rf /opt/trino/plugin/atop &&\
rm -rf /opt/trino/plugin/bigquery &&\
rm -rf /opt/trino/plugin/blackhole &&\
rm -rf /opt/trino/plugin/cassandra &&\
rm -rf /opt/trino/plugin/clickhouse &&\
rm -rf /opt/trino/plugin/druid &&\
rm -rf /opt/trino/plugin/elasticsearch &&\
rm -rf /opt/trino/plugin/example-http &&\
rm -rf /opt/trino/plugin/geospatial &&\
rm -rf /opt/trino/plugin/google-sheets &&\
rm -rf /opt/trino/plugin/hive-hadoop2 &&\
rm -rf /opt/trino/plugin/iceberg &&\
rm -rf /opt/trino/plugin/jmx &&\
rm -rf /opt/trino/plugin/kafka &&\
rm -rf /opt/trino/plugin/kinesis &&\
rm -rf /opt/trino/plugin/kudu &&\
rm -rf /opt/trino/plugin/local-file &&\
rm -rf /opt/trino/plugin/memsql &&\
rm -rf /opt/trino/plugin/ml &&\
rm -rf /opt/trino/plugin/mongodb &&\
rm -rf /opt/trino/plugin/mysql &&\
rm -rf /opt/trino/plugin/oracle &&\
rm -rf /opt/trino/plugin/phoenix &&\
rm -rf /opt/trino/plugin/phoenix5 &&\
rm -rf /opt/trino/plugin/pinot &&\
rm -rf /opt/trino/plugin/postgresql &&\
rm -rf /opt/trino/plugin/prometheus &&\
rm -rf /opt/trino/plugin/raptor-legacy &&\
rm -rf /opt/trino/plugin/redis &&\
rm -rf /opt/trino/plugin/redshift &&\
rm -rf /opt/trino/plugin/sqlserver &&\
rm -rf /opt/trino/plugin/teradata-functions &&\
rm -rf /opt/trino/plugin/thrift &&\
rm -rf /opt/trino/plugin/tpcds &&\
rm -rf /opt/trino/plugin/tpch
Tuning memory parameters
Trino is very tunable when it comes to memory usage. But beyond that, the Trino team doesn't discourage small configurations. When I had the chance to ask Martin Traverso about this, his reaction was that they expect Trino to pass all tests when running on a small laptop-sized configuration, just the same as on a large configuration. The fact that Martin reacted this way gave us renewed confidence to experiment with smaller configurations.
For our smallest containers, we limit Trino to 1GB of memory using these standard parameters.
query.max-length=1000000
query.max-memory=1000MB
query.max-memory-per-node=1000MB
query.max-total-memory=1000MB
query.max-total-memory-per-node=1000MB
If you're still seeing out-of-memory conditions, you may also want to reduce the memory used by the query cache. This is especially important if your SQL statements are large, or if your transaction rates are relatively high so that a lot of query history data is being cached.
query.max-history=20
query.min-expire-age=1s
Final results
Following these steps yields a stable and high-performing Trino configuration that is 391 MB. That's just 30% of the download size of the standard Trino container! This doesn't come without tradeoffs, but is great to have this range in flexibility.
If you're looking for a minimal Trino container image, you can use ours as a base. (The version tag corresponds to the Trino version)
FROM resurfaceio/trino-minimal:358
Or you can inspect this Dockerfile for ideas on how to build your own lightweight Trino image.
https://github.com/resurfaceio/containers/blob/master/trino/trino-minimal.dockerfile
Posted on June 15, 2021
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.