How to create spark dataframe with none/null value?

Time:10-21

I want to initialize a DataFrame in which some rows contain a None/null value, using Spark with Scala (version 3.2.1). How can I do this?

val df = spark.createDataFrame(
  Seq((0, "a", true), (1, "b", true), (2, "c", false), (3, "a", false), (4, "a", None), (5, "c", false))
).toDF("id", "category1", "category2")
df.show()

I get this error:

UnsupportedOperationException: Schema for type Any is not supported

CodePudding user response:

That's because the nearest common supertype of Boolean and None (whose type, None.type, is a subtype of Option[Nothing]) is Any, so the third tuple element is inferred as Any, and Spark cannot derive a schema for Any. To make your code work, you only need to wrap the booleans in Some, so that the column type becomes Option[Boolean]. There is no need to define struct types; Spark can infer the schema. This works:

import spark.implicits._  // enables .toDF on a Seq; already in scope in spark-shell
Seq((0, "a", Some(true)), (1, "b", Some(true)), (2, "c", Some(false)), (3, "a", Some(false)), (4, "a", None), (5, "c", Some(false)))
  .toDF("id", "category1", "category2")
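
To check that the Option approach really yields a nullable column, here is a minimal sketch. It assumes a local SparkSession built just for illustration (in spark-shell, `spark` already exists), and the names `rows` and `df` are made up for the example. Annotating the element type is optional, but it makes the intended column type explicit.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-option-example")
  .getOrCreate()
import spark.implicits._

// The third field is declared as Option[Boolean], so None is a valid value.
val rows: Seq[(Int, String, Option[Boolean])] = Seq(
  (0, "a", Some(true)),
  (4, "a", None)
)
val df = rows.toDF("id", "category1", "category2")

df.printSchema()                        // category2 should appear as a nullable boolean
df.filter($"category2".isNull).show()   // keeps only the row built from None
```

The None value is stored as null in the DataFrame, so the usual isNull / isNotNull column functions apply to it.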

CodePudding user response:

I was able to achieve your required output using following code:

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, BooleanType}

val data = Seq(Row(true), Row(null))
val schema = List(StructField("boolColName", BooleanType, true))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), StructType(schema))
df.show()

The third argument to StructField (true here) specifies whether the column is nullable.
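
The same explicit-schema approach can be extended to the three columns from the question. This is a sketch under assumptions: the local SparkSession is created only for illustration, and marking "id" as non-nullable is my choice, not something the question specifies.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{BooleanType, IntegerType, StringType, StructField, StructType}

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nullable-schema-example")
  .getOrCreate()

// With Row, a missing value is written as a plain null.
val data = Seq(
  Row(0, "a", true),
  Row(4, "a", null),
  Row(5, "c", false)
)

// The schema states each column's type and whether it may hold null.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("category1", StringType, nullable = true),
  StructField("category2", BooleanType, nullable = true)
))

val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
df.show()
```

Compared with the Option approach, this is more verbose, but it gives full control over column types and nullability instead of relying on inference.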
