Apache Spark memory configuration with PySpark


I am working on an Apache Spark application with PySpark. I have looked through many resources but still could not understand a couple of things regarding memory allocation.

from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession \
    .builder \
    .master("local[4]")\
    .appName("q1 Tutorial") \
    .getOrCreate()

I need to configure the memory, too. The application will run locally and in client deploy mode. Some sources say that in this case I should not set the driver memory and only configure the executor memory, while other sources say that in PySpark I should not configure driver memory or executor memory at all.

Could you please give me some information about memory configuration in PySpark, or point me to some reliable resources?

Thanks in advance!

CodePudding user response:

Driver memory can be configured via spark.driver.memory.
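For example, building on the session from the question (a minimal sketch; the "2g" value is purely illustrative, and when launching with spark-submit the same setting is usually passed as --driver-memory, since the driver JVM may already be running before the application code executes):

from pyspark.sql import SparkSession

# Minimal sketch; "2g" is illustrative, not a recommendation.
spark = SparkSession \
    .builder \
    .master("local[4]") \
    .appName("q1 Tutorial") \
    .config("spark.driver.memory", "2g") \
    .getOrCreate()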

Executor memory can be configured with a combination of spark.executor.memory, which sets the total heap available to each executor, and spark.memory.fraction, which determines how much of that heap (after the reserved portion) is used for Spark's unified execution and storage memory; the remainder is left for user data structures and internal metadata. (The split between execution and storage within that unified region is controlled by spark.memory.storageFraction.)

Note that 300 MB of executor memory is automatically reserved to safeguard against out-of-memory errors.
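Putting those settings together might look like the sketch below (values are illustrative only; note that in local mode the driver and executor share one JVM, so the executor settings mainly matter when running on a real cluster):

from pyspark.sql import SparkSession

# Illustrative values only -- tune them to your workload.
spark = SparkSession \
    .builder \
    .master("local[4]") \
    .appName("q1 Tutorial") \
    .config("spark.executor.memory", "4g") \
    .config("spark.memory.fraction", "0.6") \
    .getOrCreate()

# Rough back-of-the-envelope for the unified (execution + storage) region:
#   usable ≈ (spark.executor.memory - 300 MB reserved) * spark.memory.fraction
#          ≈ (4096 MB - 300 MB) * 0.6 ≈ 2278 MB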

CodePudding user response:

Most of the computational work is performed on Spark executors, but when you run operations like collect() or take(), the results are transferred to the Spark driver.

It is generally recommended to call collect() and take() sparingly, and only on small amounts of data, so that they do not overload the driver. But if you have a requirement to bring a large amount of data to the driver with collect() or take(), then you have to increase the driver memory to avoid an OOM exception. A short sketch of this advice follows (the DataFrame and the "4g" driver memory are hypothetical placeholders, not recommended values):
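from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .master("local[4]") \
    .appName("q1 Tutorial") \
    .config("spark.driver.memory", "4g") \
    .getOrCreate()

# Hypothetical DataFrame standing in for real data.
df = spark.range(10_000_000)

# Pull only a small sample back to the driver...
preview = df.take(20)

# ...and avoid collect() on large results; write them out instead.
df.write.mode("overwrite").parquet("/tmp/q1_output")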

Ref: Spark Driver Memory calculation
