How to write in CSV file without creating folder in pyspark?

Time:11-11

When writing a CSV file in PySpark, a folder is created automatically and the CSV inside it gets a cryptic part-file name. How can I write the CSV with a specific name, without the folder being created, in PySpark (not pandas)?

CodePudding user response:

That's just how Spark's parallelizing mechanism works. A Spark application is meant to have one or more workers read your data and write it to a location. When you write a CSV file, a directory containing multiple files is what lets multiple workers write at the same time.

If you're using HDFS, you can consider writing a separate bash script (e.g. with `hdfs dfs -mv`) to move or reorganize the files the way you want.
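On a local filesystem, that post-processing step can be done directly in Python instead of bash. Below is a minimal sketch: it assumes the job was written with a single partition (e.g. via `coalesce(1)`), finds the lone `part-*` file inside Spark's output directory, moves it to the name you actually want, and deletes the directory. The helper name `promote_single_csv` is illustrative, not a Spark API; on HDFS the same idea applies via `hdfs dfs -mv`.

```python
import glob
import os
import shutil


def promote_single_csv(spark_output_dir: str, target_path: str) -> None:
    """Move the lone part file out of a Spark CSV output directory,
    rename it to `target_path`, and remove the directory.

    Assumes the job wrote exactly one part file (e.g. via coalesce(1)).
    """
    part_files = glob.glob(os.path.join(spark_output_dir, "part-*"))
    if len(part_files) != 1:
        raise ValueError(f"expected exactly one part file, found {len(part_files)}")
    shutil.move(part_files[0], target_path)
    # The directory also holds _SUCCESS and .crc sidecar files; drop it all.
    shutil.rmtree(spark_output_dir)
```

Usage: after `df.coalesce(1).write.csv("out_dir")`, call `promote_single_csv("out_dir", "report.csv")` to end up with a single, predictably named file.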

If you're using Databricks, you can use the `dbutils.fs` utilities (e.g. `dbutils.fs.ls`, `dbutils.fs.mv`) to interact with DBFS files in the same way.

CodePudding user response:

This is the way Spark is designed: writing out multiple files in parallel is faster for big datasets. But you can still get a single output file by reducing the data to one partition first, e.g. `df.coalesce(1).write.csv(path)` on the DataFrame API, or `rdd.coalesce(1, shuffle=True).saveAsTextFile(path)` on the RDD API. Note that `coalesce(1)` forces all the data through a single task, so it can be slow or run out of memory on very large datasets.
