I have a question: I want to sequentially write many DataFrames in Avro format, and I use the code below in a for loop.
df.repartition(<number-of-partition>)
  .write
  .mode(<write-mode>)
  .avro(<file-path>)
The problem is that when I run my Spark job, I see only one task executing at a time (so only one DataFrame is being written). Also, when I check the number of active executors in the Spark UI, I see only one executor being used.
Is it possible to write DataFrames sequentially in Spark? If yes, am I doing it the right way?
CodePudding user response:
To run multiple jobs in parallel, you need to submit them from separate threads. From the Spark documentation on job scheduling:
Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action.
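A minimal sketch of that pattern, assuming a sequence of DataFrames dfs with matching output paths and a numPartitions value (all hypothetical names), and Spark 2.4+'s built-in Avro source (.format("avro") in place of the older .avro(...) shortcut):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Hypothetical inputs: the DataFrames and the paths to write them to.
val writes: Seq[Future[Unit]] = dfs.zip(paths).map { case (df, path) =>
  Future {
    // Each save is a separate Spark action (i.e. a job); wrapping it in a
    // Future submits it from its own thread, so the jobs can run concurrently.
    df.repartition(numPartitions)
      .write
      .mode("overwrite")
      .format("avro")
      .save(path)
  }
}

// Block until every write has finished.
Await.result(Future.sequence(writes), Duration.Inf)

Note that the global ExecutionContext caps concurrency at the number of driver cores; a dedicated thread pool (e.g. via ExecutionContext.fromExecutorService) gives explicit control over how many jobs are submitted at once. By default Spark schedules concurrent jobs FIFO; the same documentation page describes the FAIR scheduler if you want the jobs to share cluster resources more evenly.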