Setting up memory for Flink - Configuration

kination

kination

Posted on November 22, 2024

Setting up memory for Flink - Configuration

Flink offers various ways to setup memory, which is needed to run application efficiently on top of JVM. You can just offer outline and let application use it in proper way, or can assign in detail for each features.

What is "MemorySize"

Before that, we need to see the rule for configuration.
Here's the list to define memory settings.

https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/config/#memory-configuration

Now we can think, what is the 'type' of variable this configs can accept?

In this chart, type has been defined as 'MemorySize', which is not a number. And API docs guide as

- 1b or 1bytes (bytes)
- 1k or 1kb or 1kibibytes (interpreted as kibibytes = 1024 bytes)
- 1m or 1mb or 1mebibytes (interpreted as mebibytes = 1024 kibibytes)
- 1g or 1gb or 1gibibytes (interpreted as gibibytes = 1024 mebibytes)
- 1t or 1tb or 1tebibytes (interpreted as tebibytes = 1024 gibibytes)
Enter fullscreen mode Exit fullscreen mode

If you go more in code,

https://github.com/apache/flink/blob/release-1.19/flink-core/src/main/java/org/apache/flink/configuration/MemorySize.java#L272-L302

You can see that

  1. Split number and string (maybe you've found this logic in coding test...)
  2. For string part, force switch to lower case
  3. Check number doesn't have problem in format
  4. Find appropriate 'unit' for string, and get multiplier with this
  5. Calculate number and multiplier, and check whether result cause overflow

So,

  • Use appropriate number and unit
  • 'case' of unit doesn't matter

How memory effect on performance

Here's a breakdown of the key memory types in Flink:

Image description

and here's the detail for each features.

JVM Heap Memory

This is the standard heap memory managed by the Java Virtual Machine (JVM). It is used for objects that are created during the execution of a Flink application.

Flink uses JVM heap memory for various runtime operations, including task execution, state management, and buffering. You can configure the heap memory size using the job(task)manager.memory.task.heap.size parameter in flink-conf.yaml.

Properly sizing the JVM heap is crucial to avoid frequent garbage collection pauses, which can impact performance.

Off-Heap Memory

Off-heap memory is memory allocated outside of the JVM heap. It is managed manually by the application or through native libraries.
Flink uses off-heap memory for certain operations to reduce garbage collection overhead and improve performance. This is particularly useful for large state backends and network buffers.
Off-heap memory can be configured using the job(task)manager.memory.task.off-heap.size parameter.
While off-heap memory can reduce garbage collection pressure, it requires careful management to avoid out-of-memory errors.

Additional config for Taskmanager

The settings mentioned above are commonly used for both JobManager and TaskManager. In addition, TaskManager offers extra configuration options that allow for more fine-tuned application performance.

Of course, this is not a mandatory setting. If no value is specified, the application will automatically assign an appropriate value at range of off-heap memory.

Image description

Network Memory

Network memory is used for buffering data during network communication between TaskManagers.
It is crucial for efficient data exchange in distributed processing, especially for shuffling operations and data streaming between tasks.

Network memory is configured using the taskmanager.memory.network.fraction, taskmanager.memory.network.min, and taskmanager.memory.network.max parameters.

  • Considerations: Adequate network memory is essential for high-throughput applications. Insufficient network memory can lead to backpressure and degraded performance.

Managed Memory

Managed memory is a portion of memory that Flink manages for specific operations like sorting, joining, and caching state.
It is used by Flink's internal algorithms and state backends, such as RocksDB, to efficiently manage large datasets.
Managed memory can be configured using the taskmanager.memory.managed.size or taskmanager.memory.managed.fraction parameters.

Properly configuring managed memory is important for operations that require significant memory, such as large joins or aggregations. It helps in optimizing performance by reducing disk I/O.

Reference

https://nightlies.apache.org/flink/flink-docs-release-1.19/docs/deployment/memory/mem_setup/

💖 💪 🙅 🚩
kination
kination

Posted on November 22, 2024

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related

Setting up memory for Flink - Configuration
dataengineering Setting up memory for Flink - Configuration

November 22, 2024