Does coalesce(1) bring all the data to the driver?

Time:07-28

The difference between coalesce and repartition is fairly straightforward. If I were to coalesce a DataFrame to 1 partition and write it to a storage service (Azure Blob Storage, AWS S3, etc.), would the entire DataFrame be sent to the driver and then on to the storage service, or would an executor write it directly?

CodePudding user response:

The Spark official documentation describes it as follows:

If you’re doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1).

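From the above it can be inferred that the data is not routed through the driver. After coalesce(1) the single remaining partition is processed by one task on one executor, and that executor writes to the storage service directly. The driver only schedules the task; rows are pulled to the driver only by actions such as collect().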
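
As a rough illustration, here is a minimal PySpark sketch of that write path. The helper names (write_single_file, demo), the output path and the local[2] master are assumptions made up for this example, not something from the question; the Spark calls themselves (coalesce, write.mode(...).csv, rdd.getNumPartitions) are the standard DataFrame API.

```python
def write_single_file(df, path):
    """Coalesce df to one partition and write it out as CSV.

    The write is executed by a single task on one executor; nothing in
    this function collects rows to the driver.
    """
    single = df.coalesce(1)  # narrow dependency, no shuffle
    single.write.mode("overwrite").csv(path, header=True)


def demo():
    # Imported here so the helper above can be read without a Spark install.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[2]")            # assumption: local test run
        .appName("coalesce-demo")
        .getOrCreate()
    )
    df = spark.range(0, 1000).repartition(8)
    print(df.rdd.getNumPartitions())              # partitions before coalesce
    print(df.coalesce(1).rdd.getNumPartitions())  # partitions after coalesce
    write_single_file(df, "/tmp/coalesce_demo_out")  # hypothetical path
    spark.stop()
```

Calling demo() on a machine with pyspark installed should leave a single part file under the output directory. Because coalesce avoids a shuffle, the upstream stages may also collapse into that one task, which is the slowdown the documentation warns about; if that hurts, repartition(1) adds a shuffle so the upstream work stays parallel and only the final write runs as one task.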
