PySpark set local dir to avoid java.io.IOException: No space left on device

Time:10-01

Spark is throwing java.io.IOException: No space left on device, which I have traced to /tmp filling up: that is where Spark creates its temporary files. I want to point Spark at another location with more free space. I am using PySpark 3.1.2 on Ubuntu 20.04. So far I have tried the following, without success (Spark still writes to /tmp):

from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
spark.conf.set('spark.local.dirs', '/home/tmp')

I have also tried this instead:

spark.conf.set('spark.local.dir', '/home/tmp')

In both cases Spark ignores the setting and keeps writing to the default directory (/tmp), where there is not enough space, instead of /home/tmp.

CodePudding user response:

You can't change it from inside an existing Spark session: by the time the session has been created, the local dir has already been set (and used). Pass it as a config option when building the session instead:

spark = SparkSession.builder.config('spark.local.dir', '/home/tmp').getOrCreate()
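For a slightly fuller picture, here is a minimal sketch of the same idea. The helper names free_gib and build_session are made up for illustration, and /home/tmp is simply the path from the question. Note that getOrCreate() returns an already running session if there is one, in which case spark.local.dir will most likely not take effect, so stop any existing session (spark.stop()) or start from a fresh Python process first.

```python
import os
import shutil


def free_gib(path):
    """Return the free space on the filesystem holding `path`, in GiB."""
    return shutil.disk_usage(path).free / 2**30


def build_session(local_dir):
    """Create a SparkSession whose scratch files go to `local_dir`.

    Hypothetical helper: it only has the intended effect if no
    SparkSession/SparkContext is running yet in this process.
    """
    # Imported lazily so the rest of this module works without pyspark.
    from pyspark.sql import SparkSession

    os.makedirs(local_dir, exist_ok=True)
    return (
        SparkSession.builder
        .config("spark.local.dir", local_dir)
        .getOrCreate()
    )
```

You could then compare free_gib('/tmp') with free_gib('/home') to confirm the new location really has more room, call spark = build_session('/home/tmp'), and check what the context picked up with spark.sparkContext.getConf().get('spark.local.dir').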

CodePudding user response:

Set the desired temporary directory in the SPARK_LOCAL_DIRS system environment variable before starting Spark.
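If you would rather do this from Python than from the shell, something like the sketch below should work, on the assumption that the JVM PySpark launches inherits the environment of the Python process. The helper name use_local_dirs is made up; SPARK_LOCAL_DIRS takes a comma-separated list of directories. It has to run before the first SparkSession is created.

```python
import os


def use_local_dirs(*dirs):
    """Create each directory and export them as SPARK_LOCAL_DIRS.

    Hypothetical helper: call it before any SparkSession exists,
    otherwise the already running JVM will not see the change.
    """
    for d in dirs:
        os.makedirs(d, exist_ok=True)
    # Spark expects a comma-separated list of directories.
    os.environ["SPARK_LOCAL_DIRS"] = ",".join(dirs)
    return os.environ["SPARK_LOCAL_DIRS"]
```

For the setup in the question that would be use_local_dirs('/home/tmp'), followed by the usual SparkSession.builder.getOrCreate().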
