Home > other >  How to set HDFS as statebackend for flink
How to set HDFS as statebackend for flink

Time:11-02

I want to store flink store in HDFS so that after crash I can recover the flink state from HDFS. I am planning to write state to HDFS every 60 second. How Can I achieve this ? Is this the config I need to follow ? https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html#setting-default-state-backend

And where do I specify the check point interval ? Any link or sample code would be helpful

CodePudding user response:

Choosing where checkpoints are stored (e.g., HDFS) is separate from deciding which state backend to use for managing your working state (which can be on-heap, or in local files managed by the RocksDB library).

These two concepts were cleanly separated in Flink 1.12. In early versions of Flink, the two appeared to be more strongly related than they actually are because the filesystem and rocksdb state backend constructors took a file URI as a parameter, specifying where the checkpoints should be stored.

The best way to manage all of this is to leave this out of your code, and to specify the configuration you want in flink-conf.yaml, e.g.,

state.backend: filesystem
state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints
execution.checkpointing.interval: 10s

CodePudding user response:

Information about checkpointing and savepointing can be found at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/

On how to configure HDFS as a filesystem, you should check https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/

  • Related