I think the title pretty much sums up my requirement. I would appreciate it if someone could explain how many types of HDFS clusters there are (Kerberos, etc.) and which library is best for connecting to each type of cluster from Python.
Thank you
Answer:
There's only one type of HDFS: the one distributed by the Apache Hadoop project. There are, however, several Hadoop-compatible filesystems, such as Amazon S3 or GlusterFS.
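In Hadoop, the filesystem a path refers to is selected by its URI scheme, so "which type of cluster" usually comes down to which scheme you are talking to. The sketch below maps a few well-known schemes to the Hadoop `FileSystem` classes that implement them. The table is hand-written and deliberately partial (for example, it leaves out GlusterFS), and the helper name `filesystem_for` is hypothetical; a real cluster resolves schemes through its own configuration and classpath.

```python
from urllib.parse import urlparse

# Partial, hand-written table of Hadoop URI schemes and their FileSystem
# implementations (illustrative only; a real cluster resolves these through
# its fs.<scheme>.impl settings and the classpath).
HADOOP_SCHEMES = {
    "hdfs": "org.apache.hadoop.hdfs.DistributedFileSystem",     # HDFS over native RPC
    "webhdfs": "org.apache.hadoop.hdfs.web.WebHdfsFileSystem",  # the same HDFS, over HTTP
    "s3a": "org.apache.hadoop.fs.s3a.S3AFileSystem",            # Amazon S3 connector
    "file": "org.apache.hadoop.fs.LocalFileSystem",             # local disk
}


def filesystem_for(uri: str) -> str:
    """Return the Hadoop FileSystem class name for a URI's scheme (hypothetical helper)."""
    scheme = urlparse(uri).scheme.lower()
    try:
        return HADOOP_SCHEMES[scheme]
    except KeyError:
        raise ValueError(f"no entry for scheme {scheme!r} in this sketch") from None
```

Note that `hdfs://` and `webhdfs://` point at the same HDFS; they are two access protocols, not two kinds of cluster.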
Kerberos is an authentication protocol, not a type of Hadoop filesystem: the same HDFS can run with Kerberos enabled ("secure mode") or without it.
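Whether a cluster requires Kerberos is a configuration setting on the same HDFS, not a different filesystem. As a rough sketch, a client could read the `hadoop.security.authentication` property from the cluster's `core-site.xml`: it defaults to `simple` (no Kerberos) and is set to `kerberos` on a secured cluster. The function name and the idea of parsing the XML yourself are illustrative assumptions; client libraries normally take care of this.

```python
import xml.etree.ElementTree as ET


def hadoop_auth_mode(core_site_xml: str) -> str:
    """Return hadoop.security.authentication from core-site.xml text (hypothetical helper).

    Hadoop treats a missing property as "simple" (no Kerberos);
    "kerberos" means clients must present Kerberos credentials.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "hadoop.security.authentication":
            # A <property> without a <value> falls back to the default, "simple".
            return prop.findtext("value", "simple").strip().lower()
    return "simple"
```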
If you want robust Hadoop communication from Python, PySpark would be ideal. Otherwise, you can talk to HDFS through its WebHDFS REST API, either directly over HTTP or with one of several Python client libraries that a quick search will turn up.
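WebHDFS is a plain HTTP/REST interface served by the NameNode (paths live under `/webhdfs/v1`, and the operation goes in the `op` query parameter), so a dependency-free sketch needs only the standard library. It assumes a cluster using `simple` authentication, where the caller names itself with the `user.name` parameter, and a NameNode HTTP address such as `nn:9870` (the Hadoop 3 default port; Hadoop 2 used 50070). The helper names are hypothetical. A Kerberos-secured cluster additionally requires SPNEGO over HTTP, which is where a library such as HdfsCLI (`pip install hdfs`, with its `hdfs.ext.kerberos` extension) or PySpark is the more practical choice.

```python
from __future__ import annotations

import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(namenode: str, path: str, op: str, user: str | None = None) -> str:
    """Build a WebHDFS REST URL, e.g. http://nn:9870/webhdfs/v1/tmp?op=LISTSTATUS."""
    params = {"op": op}
    if user is not None:
        # Only meaningful with "simple" auth; Kerberos clusters use SPNEGO instead.
        params["user.name"] = user
    return f"http://{namenode}/webhdfs/v1{quote(path)}?{urlencode(params)}"


def list_names(liststatus_json: str) -> list[str]:
    """Extract entry names from a LISTSTATUS JSON response body."""
    statuses = json.loads(liststatus_json)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


def list_dir(namenode: str, path: str, user: str) -> list[str]:
    """List a directory on a non-Kerberos cluster (makes a real HTTP request)."""
    with urlopen(webhdfs_url(namenode, path, "LISTSTATUS", user), timeout=10) as resp:
        return list_names(resp.read().decode("utf-8"))
```

For example, `list_dir("nn:9870", "/user/alice", "alice")` would return the names of the entries in `/user/alice`, assuming such a cluster is reachable.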