Setting up memory for Flink - Configuration
kination
Posted on November 22, 2024
Flink offers various ways to setup memory, which is needed to run application efficiently on top of JVM. You can just offer outline and let application use it in proper way, or can assign in detail for each features.
What is "MemorySize"
Before that, we need to see the rule for configuration.
Here's the list to define memory settings.
Now we can think, what is the 'type' of variable this configs can accept?
In this chart, type has been defined as 'MemorySize', which is not a number. And API docs guide as
- 1b or 1bytes (bytes)
- 1k or 1kb or 1kibibytes (interpreted as kibibytes = 1024 bytes)
- 1m or 1mb or 1mebibytes (interpreted as mebibytes = 1024 kibibytes)
- 1g or 1gb or 1gibibytes (interpreted as gibibytes = 1024 mebibytes)
- 1t or 1tb or 1tebibytes (interpreted as tebibytes = 1024 gibibytes)
If you go more in code,
You can see that
- Split
number
andstring
(maybe you've found this logic in coding test...) - For
string
part, force switch to lower case - Check
number
doesn't have problem in format - Find appropriate 'unit' for
string
, and get multiplier with this - Calculate number and multiplier, and check whether result cause overflow
So,
- Use appropriate number and unit
- 'case' of unit doesn't matter
How memory effect on performance
Here's a breakdown of the key memory types in Flink:
and here's the detail for each features.
JVM Heap Memory
This is the standard heap memory managed by the Java Virtual Machine (JVM). It is used for objects that are created during the execution of a Flink application.
Flink uses JVM heap memory for various runtime operations, including task execution, state management, and buffering. You can configure the heap memory size using the job(task)manager.memory.task.heap.size
parameter in flink-conf.yaml
.
Properly sizing the JVM heap is crucial to avoid frequent garbage collection pauses, which can impact performance.
Off-Heap Memory
Off-heap memory is memory allocated outside of the JVM heap. It is managed manually by the application or through native libraries.
Flink uses off-heap memory for certain operations to reduce garbage collection overhead and improve performance. This is particularly useful for large state backends and network buffers.
Off-heap memory can be configured using the job(task)manager.memory.task.off-heap.size
parameter.
While off-heap memory can reduce garbage collection pressure, it requires careful management to avoid out-of-memory errors.
Additional config for Taskmanager
The settings mentioned above are commonly used for both JobManager and TaskManager. In addition, TaskManager offers extra configuration options that allow for more fine-tuned application performance.
Of course, this is not a mandatory setting. If no value is specified, the application will automatically assign an appropriate value at range of off-heap memory.
Network Memory
Network memory is used for buffering data during network communication between TaskManagers.
It is crucial for efficient data exchange in distributed processing, especially for shuffling operations and data streaming between tasks.
Network memory is configured using the taskmanager.memory.network.fraction
, taskmanager.memory.network.min
, and taskmanager.memory.network.max
parameters.
- Considerations: Adequate network memory is essential for high-throughput applications. Insufficient network memory can lead to backpressure and degraded performance.
Managed Memory
Managed memory is a portion of memory that Flink manages for specific operations like sorting, joining, and caching state.
It is used by Flink's internal algorithms and state backends, such as RocksDB, to efficiently manage large datasets.
Managed memory can be configured using the taskmanager.memory.managed.size
or taskmanager.memory.managed.fraction
parameters.
Properly configuring managed memory is important for operations that require significant memory, such as large joins or aggregations. It helps in optimizing performance by reducing disk I/O.
Reference
https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/memory/mem_setup/
Posted on November 22, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.