Does coalesce(1) bring all the data to the driver?

The difference between coalesce and repartition is fairly straightforward. If I coalesce a DataFrame to 1 partition and write it to a storage service (Azure Blob Storage, AWS S3, etc.), is the entire DataFrame sent to the driver and then to the storage service, or does an executor send it directly?

CodePudding user response:

The official Spark documentation describes coalesce as follows:

If you’re doing a drastic coalesce, e.g. to numPartitions = 1, this may result in your computation taking place on fewer nodes than you like (e.g. one node in the case of numPartitions = 1).

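From this it can be inferred that the data is not sent to the driver: after coalesce(1), a single task on one executor processes the whole DataFrame and writes it directly to the storage service. The driver only schedules the job and coordinates the write.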