Python's official site shows other ways to read an HDFS file, but why does reading it with spark.read.csv("csv path") not work?
Attached is the complete error message:
WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pyspark/sql/readwriter.py", line 476, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.5/dist-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o40.csv.
: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///agriculture/historyClimate/59855.csv
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:618)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
CodePudding user response:
Problem solved: when assigning csv_path, include the NameNode IP and port as well, e.g. csv_path='hdfs://X.X.X.X:8020/a/b/c.csv'. Thank you!
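The fix works because `hdfs:///path` has an empty authority (no host) in the URI, which is exactly what Hadoop rejects with "Incomplete HDFS URI, no host". A minimal sketch below checks for this before calling Spark; the NameNode address `192.168.0.1:8020` is a made-up placeholder, and the `spark.read.csv` call is shown commented out since it needs a live Spark session and HDFS cluster:

```python
from urllib.parse import urlparse

def hdfs_uri_has_host(uri: str) -> bool:
    """Return True if the hdfs:// URI carries a host (its netloc is non-empty).

    "hdfs:///a/b.csv" parses with an empty netloc, which is the situation
    that triggers Hadoop's "Incomplete HDFS URI, no host" IOException.
    """
    return bool(urlparse(uri).netloc)

# Hypothetical NameNode host and port; substitute your cluster's values.
csv_path = "hdfs://192.168.0.1:8020/agriculture/historyClimate/59855.csv"

if not hdfs_uri_has_host(csv_path):
    raise ValueError("HDFS URI is missing host:port, e.g. hdfs://namenode:8020/...")

# With a running SparkSession named `spark`, the read then succeeds:
# df = spark.read.csv(csv_path, header=True, inferSchema=True)
```

Alternatively, `hdfs:///path` (no host) does work when `fs.defaultFS` is set to the NameNode address in the cluster's core-site.xml, since Hadoop then fills in the default filesystem; spelling out `hdfs://host:port/` just avoids depending on that configuration.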