Can I write many DataFrames sequentially in Spark?


I have a question: I want to write many DataFrames sequentially in Avro format, and I use the code below inside a for loop.

df
  .repartition(<number-of-partitions>)
  .write
  .mode(<write-mode>)
  .avro(<file-path>)
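
In context, the full loop looks something like this (a simplified sketch; dataFrames, paths, and the partition count stand in for my actual inputs, and the .avro writer method comes from the Databricks spark-avro package):

import com.databricks.spark.avro._           // provides the .avro writer method
import org.apache.spark.sql.{DataFrame, SaveMode}

val dataFrames: Seq[DataFrame] = ???         // the DataFrames to write
val paths: Seq[String] = ???                 // one output path per DataFrame

for ((df, path) <- dataFrames.zip(paths)) {
  df.repartition(8)                          // example partition count
    .write
    .mode(SaveMode.Overwrite)                // example write mode
    .avro(path)
}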

The problem is that when I run my Spark job, I see only one task executing at a time (so only one DataFrame is being written at a time). When I check the number of active executors in the Spark UI, I also see that only one executor is being used.

Is it possible to write DataFrames sequentially in Spark? If so, am I doing it the right way?

CodePudding user response:

To run multiple parallel jobs, you need to submit them from separate threads:

Inside a given Spark application (SparkContext instance), multiple parallel jobs can run simultaneously if they were submitted from separate threads. By “job”, in this section, we mean a Spark action (e.g. save, collect) and any tasks that need to run to evaluate that action.
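
Here is a minimal sketch of that in Scala, using Futures to submit each write from its own thread. Names like dfsWithPaths and the partition count are placeholders, and the format("avro") writer assumes the spark-avro module (built in since Spark 2.4) is on the classpath:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

import org.apache.spark.sql.DataFrame

val dfsWithPaths: Seq[(DataFrame, String)] = ???   // the DataFrames and their target paths

// Each save() is a Spark action. Wrapping the actions in Futures submits
// them from separate threads, so the scheduler can run the jobs
// concurrently instead of one after another.
val jobs: Seq[Future[Unit]] = dfsWithPaths.map { case (df, path) =>
  Future {
    df.repartition(8)              // example partition count
      .write
      .mode("overwrite")           // example write mode
      .format("avro")              // spark-avro module required on the classpath
      .save(path)
  }
}

// Block until every write finishes; a failure in any job surfaces here.
Await.result(Future.sequence(jobs), Duration.Inf)

Note that the concurrent jobs still share the application's executors and cores, so the writes only actually overlap if there is spare capacity. Jobs are scheduled FIFO by default; setting spark.scheduler.mode to FAIR makes Spark share resources between them more evenly.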

You can also check the example "Comparison of sequential and parallel writing".
