I want to initialize a DataFrame where some rows have a None/null value, in Spark Scala (version 3.2.1). How can I do this?
val df = spark.createDataFrame(
Seq((0, "a", true), (1, "b", true), (2, "c", false), (3, "a", false), (4, "a", None), (5, "c", false))
).toDF("id", "category1", "category2")
df.show()
I get this error:
UnsupportedOperationException: Schema for type Any is not supported
CodePudding user response:
That's because the nearest common supertype of Boolean and Option[Nothing] (the type of None) is Any, and Spark does not support schema inference for Any. The only thing you need to do to make your code work is wrap the booleans in Option/Some; there is no need to define struct types, Spark can figure the schema out on its own. This would work:
import spark.implicits._

val df = Seq((0, "a", Some(true)), (1, "b", Some(true)), (2, "c", Some(false)),
    (3, "a", Some(false)), (4, "a", None), (5, "c", Some(false)))
  .toDF("id", "category1", "category2")
df.show()
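To see the type inference behind the error, here is a minimal plain-Scala sketch (no Spark needed); the variable names are illustrative only:

```scala
// Mixing a bare Boolean with None widens the tuple's third element to Any,
// which is what triggers "Schema for type Any is not supported":
val mixed: Seq[(Int, String, Any)] =
  Seq((4, "a", None), (5, "c", false))

// Wrapping every Boolean in Some keeps the element type at Option[Boolean],
// which Spark can encode as a nullable BooleanType column:
val wrapped: Seq[(Int, String, Option[Boolean])] =
  Seq((4, "a", None), (5, "c", Some(false)))
```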
CodePudding user response:
I was able to achieve the required output using the following code:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BooleanType, StructField, StructType}

// Rows may contain null wherever the schema marks the column nullable
val data = Seq(Row(true), Row(null))
val schema = List(StructField("boolColName", BooleanType, true))
val df = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schema))
df.show()
The true supplied as the third argument of StructField specifies that the column is nullable.
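The same explicit-schema approach can be extended to the full three-column example from the question. This is a sketch, assuming a SparkSession named spark is already in scope (as in the snippets above) and reusing the question's column names:

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

// Explicit schema: the third argument of StructField marks the column nullable
val schema = StructType(List(
  StructField("id", IntegerType, nullable = false),
  StructField("category1", StringType, nullable = true),
  StructField("category2", BooleanType, nullable = true)
))

// Rows may carry null in any column the schema declares nullable
val data = Seq(
  Row(0, "a", true), Row(1, "b", true), Row(2, "c", false),
  Row(3, "a", false), Row(4, "a", null), Row(5, "c", false)
)

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.show()
```

This avoids wrapping values in Option at the cost of writing the schema by hand, which is useful when you want tight control over nullability per column.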