Can anyone please help me save a PySpark DataFrame as a CSV file with a multi-character delimiter, using pandas/Python?
I did some research and found that to_csv in PySpark/pandas accepts only a single-character delimiter; there is no option to provide a multi-character separator.
dataframe.to_csv("file.csv", sep="@@") Error: delimiter must be 1-character string
Link - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
Please let me know if anyone has implemented this kind of scenario.
CodePudding user response:
It appears that the pandas to_csv function only allows single-character delimiters/separators.
So, use numpy.savetxt instead:
np.savetxt("file.dat", chunk_data.values, delimiter='~|', fmt='%s', encoding='utf-8')
Then you can rename or convert the file to CSV format.
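To make the savetxt approach above concrete, here is a minimal runnable sketch (the DataFrame, filename, and "@@" delimiter are illustrative; numpy's savetxt accepts any string as delimiter, unlike pandas to_csv):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["Ash", "Bob", "Cat"], "age": [25, 30, 20]})

# np.savetxt takes an arbitrary string delimiter; fmt='%s' stringifies
# each cell, and header/comments='' writes a plain header row without
# the default '# ' prefix.
np.savetxt(
    "out.csv",
    df.values,
    delimiter="@@",
    fmt="%s",
    header="@@".join(df.columns),
    comments="",
    encoding="utf-8",
)
```

The resulting out.csv contains a `name@@age` header line followed by one `@@`-separated row per record.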
CodePudding user response:
In PySpark, you can use option("delimiter", "@@") for a multi-character delimiter:
df = spark.createDataFrame([('Ash', 25), ('Bob', 30), ('Cat', 20)], ['name', 'age'])
[Out]:
+----+---+
|name|age|
+----+---+
| Ash| 25|
| Bob| 30|
| Cat| 20|
+----+---+
df.repartition(1).write.mode("overwrite").option("header",True).option("delimiter", "@@").csv("/content/sample_data/test.csv")
[Out]:
name@@age
Ash@@25
Bob@@30
Cat@@20
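As a side note, while pandas cannot write a multi-character delimiter, it can read one when engine="python" is used (the separator is then treated as a regular expression), so output like the above can be loaded back into pandas. A small sketch, using an in-memory string standing in for the written file:

```python
from io import StringIO

import pandas as pd

# Sample content matching the Spark output above
data = "name@@age\nAsh@@25\nBob@@30\nCat@@20\n"

# With engine="python", read_csv accepts a multi-character separator
# (interpreted as a regular expression; "@@" matches itself literally).
df = pd.read_csv(StringIO(data), sep="@@", engine="python")
```

Replace the StringIO with the real file path to read the Spark-written CSV part file directly.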