I'm trying to write a PySpark dataframe out as Avro files to the HDFS path /my/path/, partitioned by the column 'partition', so that under /my/path/ there should be the following subdirectory structure:
partition=20230101
partition=20230102
....
Under these subdirectories there should be the Avro files. I'm trying to use
df1.select("partition","name","id").write.partitionBy("partition").format("avro").save("/my/path/")
It succeeded the first time, but when I tried to write another df with a new partition, it failed with the error: path /my/path/ already exists. How should I achieve this? Many thanks for your help. The df format is as below:
partition  name  id
20230101   aa    10   ---this row is the content in the first df
20230102   bb    20   ---this row is the content in the second df
CodePudding user response:
You should change the SaveMode. The default save mode is ErrorIfExists, which is why you are getting that error. Change it to append mode:
df1.select("partition","name","id") \
.write.mode("append").format("avro")\
.partitionBy("partition").save("/my/path/")