In Python, can spark.read.csv("CSV path") read a CSV file stored on HDFS?

Time:10-13

In a Python 3 interactive session, after importing the pyspark package and creating a SparkSession, I can use spark.read.csv("CSV path") to read a CSV file. However, when the CSV file is on HDFS, the call fails with an error: py4j.protocol.Py4JJavaError: An error occurred while calling o40.csv.
According to Python's official website there are other ways to read an HDFS file, so why doesn't spark.read.csv("CSV path") work here?

Attached: the complete error message:
WARN FileStreamSink: Error while looking for metadata directory.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pyspark/sql/readwriter.py", line 476, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/local/lib/python3.5/dist-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/local/lib/python3.5/dist-packages/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/local/lib/python3.5/dist-packages/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o40.csv.
: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///agriculture/historyClimate/59855.csv
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:547)
    at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary$1.apply(DataSource.scala:545)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:355)
    at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:545)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:359)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:618)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)

CodePudding user response:

This problem is solved. When assigning csv_path, include the NameNode IP address and port number in the URI, for example: csv_path = 'hdfs://X.X.X.X:8020/a/b/c.csv'. Thank you!
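
To make the fix concrete, here is a minimal sketch of filling in the missing host and port before handing the path to Spark. The helper names qualify_hdfs_uri and read_hdfs_csv, and the host "namenode.example", are hypothetical and used only for illustration; 8020 is the port quoted in the answer above and may differ on another cluster.

```python
from urllib.parse import urlsplit, urlunsplit


def qualify_hdfs_uri(uri, host, port=8020):
    """Return `uri` with a NameNode host:port filled in when it has none.

    `host` and `port` are placeholders for your own NameNode address.
    """
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError("expected an hdfs:// URI, got: " + uri)
    if parts.netloc:
        # The URI already names a host, so leave it untouched.
        return uri
    netloc = "{}:{}".format(host, port)
    return urlunsplit(("hdfs", netloc, parts.path, parts.query, parts.fragment))


def read_hdfs_csv(spark, uri, host, port=8020):
    """Read a CSV from HDFS through an existing SparkSession (hypothetical wrapper)."""
    return spark.read.csv(qualify_hdfs_uri(uri, host, port))


# The path from the error message, completed with an assumed host name.
print(qualify_hdfs_uri("hdfs:///agriculture/historyClimate/59855.csv", "namenode.example"))
```

With a live session this would be called as read_hdfs_csv(spark, "hdfs:///agriculture/historyClimate/59855.csv", "X.X.X.X"). The host-less form hdfs:///... normally only works when Spark can see a Hadoop configuration whose fs.defaultFS points at the NameNode; that was presumably not the case in this session, which is why spelling out the host and port resolved the error.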