I want to read data from HDFS with Flink in Python. I found that this is possible with Java or Scala: https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/dataset/formats/hadoop/
Indeed, the Flink HDFS connector provides a sink that writes partitioned files to any filesystem supported by the Hadoop FileSystem.
I know I need to use an InputFormat to specify this, but I cannot find a good guide for doing it in Python, and there is no support for it in Python (PyFlink).
Any help would be appreciated!
Answer:
I solved this myself: I just needed to configure the Hadoop classpath and create a Flink SQL table whose definition ends with:
) WITH ( 'connector' = 'filesystem', 'path' = 'hdfs://namenode:9000/directory/', 'format' = 'json' )
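Here is a minimal PyFlink sketch of those two steps. It makes several assumptions: PyFlink 1.15, a `hadoop` CLI available on PATH, and local (mini-cluster) execution in which PyFlink's JVM launcher picks up `HADOOP_CLASSPATH` from the environment. The table name `hdfs_source`, the columns in `EXAMPLE_COLUMNS`, and the helper functions are hypothetical placeholders, because the answer does not show the full schema. Only the `WITH` options come from the answer. A more conventional alternative to setting the variable from Python is to export it in the shell, using the output of `hadoop classpath`, before starting Python. On a remote cluster, the Hadoop classpath must be configured on the cluster itself.

```python
import os
import subprocess

# Hypothetical schema: the answer omits the column list, so replace these
# placeholder names and types with the fields in your JSON files.
EXAMPLE_COLUMNS = [("id", "BIGINT"), ("name", "STRING")]


def build_hdfs_table_ddl(table_name, columns, path, fmt="json"):
    """Return a Flink SQL CREATE TABLE statement using the filesystem connector."""
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table_name} (\n  {cols}\n) WITH (\n"
        f"  'connector' = 'filesystem',\n"
        f"  'path' = '{path}',\n"
        f"  'format' = '{fmt}'\n)"
    )


def ensure_hadoop_classpath():
    """Set HADOOP_CLASSPATH from `hadoop classpath` if it is not already set.

    Assumption: the `hadoop` CLI is on PATH. This has to run before PyFlink
    starts its JVM so the Hadoop FileSystem classes for hdfs:// are visible.
    """
    if not os.environ.get("HADOOP_CLASSPATH"):
        os.environ["HADOOP_CLASSPATH"] = subprocess.check_output(
            ["hadoop", "classpath"], text=True
        ).strip()


def main():
    ensure_hadoop_classpath()
    # Imported here so the DDL helper can be used without PyFlink installed.
    from pyflink.table import EnvironmentSettings, TableEnvironment

    t_env = TableEnvironment.create(EnvironmentSettings.in_batch_mode())
    t_env.execute_sql(build_hdfs_table_ddl(
        "hdfs_source", EXAMPLE_COLUMNS, "hdfs://namenode:9000/directory/"))
    # Read everything under the HDFS directory and print the rows.
    t_env.execute_sql("SELECT * FROM hdfs_source").print()


if __name__ == "__main__":
    main()
```

Keeping the DDL in a separate function puts the schema in one place and allows it to be checked without starting a JVM or contacting HDFS.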