Similar to this question, I want to add a column to my PySpark DataFrame containing nothing but an empty map. If I use the suggested answer from that question, however, the type of the map is <null,null>, unlike in the answer posted there.
from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map()).printSchema()
root
 |-- test: map (nullable = false)
 |    |-- key: null
 |    |-- value: null (valueContainsNull = false)
I need an empty <string,string> map. I can do it in Scala like so:
import org.apache.spark.sql.functions.typedLit
spark.range(1).withColumn("test", typedLit(Map[String, String]())).printSchema()
root
 |-- test: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)
How can I do it in PySpark? I am using Spark 3.0.1 with underlying Scala 2.12 on Databricks Runtime 7.3 LTS. I need the <string,string> map because otherwise I can't save my DataFrame to Parquet:
AnalysisException: Parquet data source does not support map<null,null> data type.;
CodePudding user response:
You can create the empty map with create_map and then cast it to the appropriate type:
from pyspark.sql.functions import create_map
spark.range(1).withColumn("test", create_map().cast("map<string,string>")).printSchema()
root
 |-- id: long (nullable = false)
 |-- test: map (nullable = false)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)