I have a question about how to get the path/URL to a file located in Dataproc HDFS. I want to run a MapReduce job based on a file that is located in Dataproc HDFS.
CodePudding user response:
The following are all valid HDFS URIs in a Dataproc cluster:
hdfs://<master-hostname>:8020/<path-to-file>
hdfs://<master-hostname>/<path-to-file>
hdfs:///<path-to-file>
The third one works because, by default, on every node of a Dataproc cluster the fs.defaultFS property in /etc/hadoop/conf/core-site.xml is configured as hdfs://<master-hostname>. And 8020 is the default NameNode port, which is why it can be omitted in the second form.
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://<master-hostname></value>
  <description>
    The name of the default file system. A URI whose scheme and authority
    determine the FileSystem implementation. The uri's scheme determines
    the config property (fs.SCHEME.impl) naming the FileSystem
    implementation class. The uri's authority is used to determine the
    host, port, etc. for a filesystem.
  </description>
</property>
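If you want to resolve this programmatically instead of reading the config file, here is a minimal Java sketch (the path /user/myuser/input.txt is just a placeholder, not from your cluster). FileSystem.get(conf) picks up fs.defaultFS from core-site.xml on the node, so scheme-less or hdfs:/// paths resolve against the master's NameNode:

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListHdfsFile {
        public static void main(String[] args) throws IOException {
            // Loads core-site.xml / hdfs-site.xml from /etc/hadoop/conf on the cluster,
            // including fs.defaultFS = hdfs://<master-hostname>
            Configuration conf = new Configuration();

            // The default FileSystem, i.e. HDFS backed by the master's NameNode
            FileSystem fs = FileSystem.get(conf);
            System.out.println("fs.defaultFS resolves to: " + fs.getUri());

            // Placeholder path -- replace with your own file; all three URI forms
            // listed above would point at the same file
            Path file = new Path("hdfs:///user/myuser/input.txt");
            for (FileStatus status : fs.listStatus(file)) {
                System.out.println(status.getPath() + " (" + status.getLen() + " bytes)");
            }
        }
    }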
You can run hadoop fs -ls <uri> on any node to list the files.
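And for actually wiring such a file into a MapReduce job, a minimal driver sketch could look like the following (class name, paths, and the omitted mapper/reducer setup are placeholders, not something specific to Dataproc); any of the URI forms listed above can be passed as the input path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MyJobDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "my-mr-job");
            job.setJarByClass(MyJobDriver.class);
            // job.setMapperClass(...); job.setReducerClass(...); etc.

            // Any of the URI forms above works here; hdfs:/// resolves via fs.defaultFS
            FileInputFormat.addInputPath(job, new Path("hdfs:///user/myuser/input.txt"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs:///user/myuser/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }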