{DataFrameWriter CSV to HDFS file system} write data without partitioning

Time:04-29

Here, df is the DataFrame that holds our output. I'm using DataFrameWriter to write the whole output to a directory, but the data is being split into multiple part files, as shown below:

$ hdfs dfs -ls /path to hdfs directory..

Found 4 items
-rw-r--r--   3 xxxxxx xxxxxxx          0 2022-04-28 23:19 path to hdfs directory../_SUCCESS
-rw-r--r--   3 xxxxxx xxxxxx        238 2022-04-28 23:19 path to hdfs directory../part-00000-4bc48c17-5c85-44be-bf34-3645d2b2e085-c000.csv
-rw-r--r--   3 xxxxxxx xxxxxxx    6204498 2022-04-28 23:19 path to hdfs directory../part-00043-4bc48c17-5c85-44be-bf34-3645d2b2e085-c000.csv
-rw-r--r--   3 xxxxxxx xxxxxxx    5875627 2022-04-28 23:19 path to hdfs directory../part-00191-4bc48c17-5c85-44be-bf34-3645d2b2e085-c000.csv

I want all the data in one single CSV file. Is there another option I can add to the code below?

df.write.mode("overwrite").csv('path to hdfs directory', header=True, sep=',')

The df contains about 55k rows.

CodePudding user response:

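You can use coalesce(1) to reduce the DataFrame to a single partition before writing, so Spark produces a single CSV part file. For about 55k rows this is fine. Note that the output path is still a directory: it will contain one part-*.csv file plus _SUCCESS, not a file with a name of your choosing.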

df.coalesce(1).write.mode("overwrite").csv('path to hdfs directory', header=True, sep=',')
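
If the job has already run and you are left with several part files, another option is to merge them afterwards. Since the files were written with header = True, each part file starts with its own header row, so a plain concatenation (for example with hdfs dfs -getmerge) would repeat the header in the middle of the data. Below is a minimal sketch, not part of the original answer, that merges the part files after they have been copied to a local directory. The function name merge_part_files and both of its parameters are hypothetical, and it assumes every part file has the same columns.

```python
import csv
import glob
import os


def merge_part_files(part_dir, merged_path):
    """Concatenate Spark part-*.csv files from a local directory into one CSV.

    Assumes each part file begins with the same header row (header=True);
    only the first header is kept. Returns the number of part files read.
    """
    part_paths = sorted(glob.glob(os.path.join(part_dir, "part-*.csv")))
    header_written = False
    with open(merged_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in part_paths:
            with open(path, newline="") as src:
                reader = csv.reader(src)
                header = next(reader, None)
                if header is None:
                    continue  # completely empty part file, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                # Copy the remaining data rows; later headers are dropped.
                writer.writerows(reader)
    return len(part_paths)
```

Usage would look like merge_part_files("local copy of the output directory", "output.csv"). For a DataFrame of this size, coalesce(1) before writing is simpler. The merge step is mainly useful when re-running the Spark job is not convenient.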