Dataproc HDFS file URIs

Time:09-28

How do I get the path/URL of a file stored in Dataproc HDFS? I want to run a MapReduce job on a file that is located in Dataproc HDFS.

Answer:

The following are all valid HDFS URIs for a file in a Dataproc cluster:

  1. hdfs://<master-hostname>:8020/<path-to-file>
  2. hdfs://<master-hostname>/<path-to-file>
  3. hdfs:///<path-to-file>

The 2nd one works because 8020 is the default NameNode port, so it can be omitted. The 3rd one works because, by default, every node of a Dataproc cluster sets the fs.defaultFS property to hdfs://<master-hostname> in /etc/hadoop/conf/core-site.xml:

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<master-hostname></value>
    <description>
      The name of the default file system. A URI whose scheme and authority
      determine the FileSystem implementation. The uri's scheme determines
      the config property (fs.SCHEME.impl) naming the FileSystem
      implementation class. The uri's authority is used to determine the
      host, port, etc. for a filesystem.
    </description>
  </property>
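To make the resolution concrete, here is a minimal Python sketch, assuming a hypothetical master hostname cluster-m and a hypothetical file /user/alice/input.txt. It reads fs.defaultFS from a core-site.xml excerpt and fills in whatever the URI leaves out: the host comes from fs.defaultFS and the port falls back to 8020. The helper names default_fs and qualify are made up for illustration; Hadoop performs this qualification internally, and its real logic covers more cases than this sketch.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit

# Default NameNode port, used when neither the URI nor fs.defaultFS names one.
NAMENODE_DEFAULT_PORT = 8020

# Hypothetical core-site.xml excerpt; "cluster-m" stands in for <master-hostname>.
CORE_SITE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cluster-m</value>
  </property>
</configuration>"""


def default_fs(core_site_xml):
    """Return the fs.defaultFS value from a core-site.xml document."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == "fs.defaultFS":
            return prop.findtext("value").strip()
    raise KeyError("fs.defaultFS is not set")


def qualify(uri, default):
    """Fill in a missing host from fs.defaultFS and a missing port with 8020.

    Simplified model: the default host is only borrowed when the URI's
    scheme matches the default file system's scheme.
    """
    parts = urlsplit(uri)
    dflt = urlsplit(default)
    if not parts.hostname and parts.scheme != dflt.scheme:
        raise ValueError(f"no host in {uri!r} and scheme differs from {default!r}")
    host = parts.hostname or dflt.hostname
    port = parts.port or dflt.port or NAMENODE_DEFAULT_PORT
    return urlunsplit((parts.scheme, f"{host}:{port}", parts.path, "", ""))


if __name__ == "__main__":
    fs = default_fs(CORE_SITE)
    for uri in (
        "hdfs://cluster-m:8020/user/alice/input.txt",
        "hdfs://cluster-m/user/alice/input.txt",
        "hdfs:///user/alice/input.txt",
    ):
        # Each iteration prints hdfs://cluster-m:8020/user/alice/input.txt
        print(qualify(uri, fs))
```

All three forms end up naming the same file, which is why any of them can be passed to a job or to hadoop fs.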

You can run hadoop fs -ls <uri> on any node to list the files.
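Since the goal is to run a MapReduce job on the file, here is a minimal Python sketch that builds the listing command and a word-count job command from those URIs. It assumes the Hadoop examples jar is at /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar (a common location on Dataproc images, but check yours); the helper names ls_cmd, wordcount_cmd and run, and the /user/alice paths, are hypothetical.

```python
import shlex
import subprocess

# Assumed location of the MapReduce examples jar; verify it on your cluster image.
EXAMPLES_JAR = "/usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar"


def ls_cmd(uri):
    """Build `hadoop fs -ls <uri>` as an argument list."""
    return ["hadoop", "fs", "-ls", uri]


def wordcount_cmd(input_uri, output_uri):
    """Build a word-count job that reads input_uri and writes to output_uri."""
    return ["hadoop", "jar", EXAMPLES_JAR, "wordcount", input_uri, output_uri]


def run(cmd):
    """Run a command on a cluster node where the hadoop CLI is on PATH."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the commands; call run(...) on a cluster node to execute them.
    print(shlex.join(ls_cmd("hdfs:///user/alice/input.txt")))
    print(shlex.join(wordcount_cmd("hdfs:///user/alice/input.txt",
                                   "hdfs:///user/alice/wordcount-out")))
```

The output directory must not exist before the job starts; MapReduce refuses to overwrite an existing output path.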
