Home > OS >  Why BroadcastExchange needs more driver memory?
Why BroadcastExchange needs more driver memory?

Time:10-21

When broadcasting, Spark can fail with the error org.apache.spark.sql.errors.QueryExecutionErrors#notEnoughMemoryToBuildAndBroadcastTableError (Spark 3.2.1):

enter image description here

Why BroadcastExchange needs more driver memory? Isn't broadcast sending data to all drivers? Why driver memory is a bottleneck?

Thanks.

CodePudding user response:

Unfortunately executor side broadcast joins are not yet supported in Spark (see SPARK-17556). Currently all data of the broadcasted dataset is collected in the driver first to build an in-memory hash table which is then distributed to workers. This can result in high memory pressure on the driver.

  • Related