Spark is throwing the error `java.io.IOException: No space left on device`, which I have traced to the overflow of the `/tmp` directory, where Spark creates its temporary files. I want to manually specify another location for these files where more space is available. I am currently using PySpark 3.1.2 on Ubuntu 20.04. I have already tried the following without success (Spark still writes to `/tmp`):
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
spark.conf.set('spark.local.dirs', '/home/tmp')
```

I have also tried instead:

```python
spark.conf.set('spark.local.dir', '/home/tmp')
```
In both cases Spark ignores the configuration change and, as if no change had been made, keeps writing to the default directory (`/tmp`), where there is not sufficient space, instead of the new one (`/home/tmp`).
CodePudding user response:
You can't do it from inside the Spark session: by the time `spark.conf.set` runs, the session has already been created, so the local dir has already been set (and used). Pass it to the builder as a config parameter when starting the session instead:

```python
spark = SparkSession.builder.config('spark.local.dir', '/home/tmp').getOrCreate()
```
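One caveat, especially in a notebook: `getOrCreate()` returns any session that already exists in the process, in which case the new config is silently ignored. A minimal sketch of the full pattern, assuming `/home/tmp` is the scratch directory from the question:

```python
from pyspark.sql import SparkSession

# If a session is already active (common in notebooks), stop it first;
# otherwise getOrCreate() returns it and the new config has no effect.
active = SparkSession.getActiveSession()
if active is not None:
    active.stop()

spark = (
    SparkSession.builder
    .config('spark.local.dir', '/home/tmp')  # directory from the question
    .getOrCreate()
)
```

Restarting the interpreter (or the notebook kernel) achieves the same thing more reliably, since a fresh process is guaranteed to have no pre-existing session.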
CodePudding user response:
Set the desired temporary directory in the system environment variable `SPARK_LOCAL_DIRS`. Note that it must be set before Spark starts; on a cluster it is typically set in `conf/spark-env.sh`, and it overrides `spark.local.dir`.
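For a local PySpark script, this can be done from Python as long as the variable is set before any SparkSession is created in the process. A minimal sketch, again assuming `/home/tmp` is the desired scratch directory:

```python
import os

# Must run before SparkSession/SparkContext creation, because the
# environment is read when the JVM starts. '/home/tmp' is the path
# from the question; substitute your own directory with free space.
os.environ['SPARK_LOCAL_DIRS'] = '/home/tmp'
```

Equivalently, export it in the shell before launching the application: `export SPARK_LOCAL_DIRS=/home/tmp`.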