How many types of HDFS Clusters are there and what is the best way to connect to HDFS Cluster using

Time:02-11

I think the title pretty much sums up my requirement. I would appreciate it if someone could explain how many types of HDFS clusters there are (Kerberos, etc.) and which library is best for connecting to each type of cluster using Python.

Thank you

CodePudding user response:

There's only one type of HDFS, the one distributed by the Apache Hadoop project. There are several Hadoop-compatible file systems, such as Amazon S3 or GlusterFS.

Kerberos is an authentication protocol, not a type of Hadoop filesystem.

If you want robust Hadoop communication from Python, PySpark would be ideal. Otherwise, you can interface with the WebHDFS REST API using one of several Python libraries, which you can find with a simple search.
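
To give a rough idea of what talking to WebHDFS looks like without any third-party library, here is a minimal sketch using only the Python standard library. The helper names `webhdfs_url` and `list_directory`, the host `namenode.example.com`, and the user `alice` are made up for illustration. Port 9870 is the default NameNode HTTP port in Hadoop 3 and may differ on your cluster. The sketch also assumes an unsecured cluster that accepts the `user.name` query parameter; a Kerberized cluster would need SPNEGO authentication instead, which this code does not handle.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL for an absolute HDFS path (hypothetical helper)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # WebHDFS expects the operation and its arguments as query parameters.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


def list_directory(host, port, path, user):
    """Return the entry names under `path` using the LISTSTATUS operation.

    Assumes simple (non-Kerberos) authentication via the user.name parameter.
    """
    url = webhdfs_url(host, port, path, "LISTSTATUS", **{"user.name": user})
    with urlopen(url) as response:
        statuses = json.load(response)["FileStatuses"]["FileStatus"]
    return [status["pathSuffix"] for status in statuses]


if __name__ == "__main__":
    # Placeholder host and user; replace with values for your own cluster.
    print(webhdfs_url("namenode.example.com", 9870, "/data/logs",
                      "LISTSTATUS", **{"user.name": "alice"}))
```

Only `list_directory` touches the network, so the URL builder can be checked on its own before pointing the code at a real NameNode.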
