Did the user really ask for Exactly Once? Fault Tolerance

tspannhw

Timothy Spann. 🇺🇦

Posted on August 12, 2020


Exactly Once Requirements

Exactly-once is tricky to get right and can degrade performance. If your user can live with at-least-once, always go with that. Data sinks like Kudu, where you can do an upsert, make exactly-once less necessary, because a replayed record simply overwrites the row it already wrote.
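To illustrate why an upsert-capable sink takes the pressure off, here is a minimal sketch in plain Java. It does not use the Kudu client; `UpsertSinkSketch`, `upsertAll` and `appendAll` are hypothetical names, and a map keyed by primary key stands in for the table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a keyed sink such as a Kudu table (not the Kudu client API).
class UpsertSinkSketch {

    // Upsert: rows are keyed by primary key, so writing the same key again overwrites the row.
    static Map<String, Integer> upsertAll(List<String[]> records) {
        Map<String, Integer> table = new LinkedHashMap<>();
        for (String[] record : records) {
            table.put(record[0], Integer.parseInt(record[1]));
        }
        return table;
    }

    // Append-only: every delivery becomes a new row, so replayed records show up as duplicates.
    static List<String> appendAll(List<String[]> records) {
        List<String> rows = new ArrayList<>();
        for (String[] record : records) {
            rows.add(record[0] + "=" + record[1]);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> delivered = new ArrayList<>();
        delivered.add(new String[] {"order-1", "10"});
        delivered.add(new String[] {"order-2", "25"});
        // At-least-once replay after a failure: order-2 is delivered a second time.
        delivered.add(new String[] {"order-2", "25"});

        System.out.println("upsert rows: " + upsertAll(delivered).size());
        System.out.println("append rows: " + appendAll(delivered).size());
    }
}
```

With the replayed record, the upsert table still holds two rows while the append-only sink holds three, which is the situation where at-least-once plus upsert is usually good enough.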

https://docs.cloudera.com/csa/1.2.0/datastream-connectors/topics/csa-kafka.html

Apache Flink, Apache NiFi Stateless and Apache Kafka can all participate in exactly-once delivery.

For CDF Stream Processing and Analytics with Apache Flink 1.10 streaming:

Both Kafka sources and sinks can be used with exactly-once processing guarantees when checkpointing is enabled.

End-to-End Guaranteed Exactly-Once Record Delivery

Both the Data Source and the Data Sink need to support exactly-once state semantics and take part in checkpointing.
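As a rough picture of what "taking part in checkpointing" means for a sink, the sketch below buffers writes in an open transaction and only makes them visible when a checkpoint completes. It is an in-memory illustration under that assumption, not Flink's sink API; `TransactionalSinkSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sink; illustrates the idea only, not Flink's sink API.
class TransactionalSinkSketch {
    private final List<String> committed = new ArrayList<>(); // visible to downstream readers
    private List<String> pending = new ArrayList<>();         // current open transaction

    // Records written between checkpoints stay in the open transaction.
    void write(String record) {
        pending.add(record);
    }

    // Checkpoint completed: publish everything buffered since the previous checkpoint.
    void onCheckpointComplete() {
        committed.addAll(pending);
        pending = new ArrayList<>();
    }

    // Failure: discard uncommitted records; the source will replay them from the last checkpoint.
    void onFailure() {
        pending = new ArrayList<>();
    }

    List<String> committed() {
        return committed;
    }

    public static void main(String[] args) {
        TransactionalSinkSketch sink = new TransactionalSinkSketch();
        sink.write("a");
        sink.write("b");
        sink.onCheckpointComplete(); // checkpoint 1 covers a and b
        sink.write("c");
        sink.onFailure();            // crash before checkpoint 2, c is discarded
        sink.write("c");             // source rewinds and replays c
        sink.onCheckpointComplete();
        System.out.println(sink.committed()); // prints [a, b, c]
    }
}
```

The replayed record appears once in the committed output because the first attempt was never made visible. A source that cannot rewind to the checkpointed position, or a sink that cannot hold back uncommitted writes, breaks this guarantee.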

Data Sources

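  • Apache Kafka - must have exactly-once selected, transactions enabled, and a Kafka connector and broker version that supports transactions.

Select: Semantic.EXACTLY_ONCE (in Flink this is a FlinkKafkaProducer.Semantic value, set when you create the Kafka producer)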

Data Sinks

  • HDFS BucketingSink
  • Apache Kafka

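For Kafka, check that the producer's transaction timeout lines up with your checkpointing: a transaction has to stay open across a full checkpoint, and across a restart if the job fails. See https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/connectors/kafka.html#kafka-producers-and-fault-tolerance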
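A minimal sketch of that timeout check, using only `java.util.Properties`. `transaction.timeout.ms` is the real Kafka producer setting and `transaction.max.timeout.ms` is the broker-side cap; the class name, the method, the sizing rule (checkpoint interval plus worst-case restart time) and the `localhost:9092` address are assumptions for illustration.

```java
import java.time.Duration;
import java.util.Properties;

// Hypothetical helper: sizes the producer transaction timeout against checkpointing.
class KafkaTimeoutSketch {

    static Properties producerProps(Duration checkpointInterval,
                                    Duration maxRestartTime,
                                    Duration brokerMaxTxnTimeout) {
        // Assumed rule of thumb: a transaction must survive one checkpoint plus a job restart.
        Duration txnTimeout = checkpointInterval.plus(maxRestartTime);

        // The broker rejects producers asking for more than transaction.max.timeout.ms.
        if (txnTimeout.compareTo(brokerMaxTxnTimeout) > 0) {
            throw new IllegalArgumentException(
                "transaction.timeout.ms would exceed the broker's transaction.max.timeout.ms");
        }

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("transaction.timeout.ms", Long.toString(txnTimeout.toMillis()));
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps(
            Duration.ofMinutes(1),   // checkpoint interval
            Duration.ofMinutes(9),   // worst-case restart time
            Duration.ofMinutes(15)); // broker transaction.max.timeout.ms
        System.out.println(props.getProperty("transaction.timeout.ms")); // prints 600000
    }
}
```

The resulting `Properties` object is what you would hand to the Kafka producer. If the check throws, either raise `transaction.max.timeout.ms` on the brokers or shorten the checkpoint interval.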

