Significance of nullable in Spark dataframe StructField

Time:10-26

What is the significance of nullable?

case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    metadata: Metadata = Metadata.empty)

From the documentation:

StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of this field can have null values.

Is it only an indication? I can't see it enforcing non-null values (or am I missing something?).

Program:

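// Imports needed by the snippet below (Row, StructField, StructType and the data types)
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Column spec format: name:type:nullable
val cols = "firstName:String:false,middlename:String:true,lastName:String:false,zipCode:String:false,sex:String:false,salary:Int:true"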
def inferType(field: String): StructField = {
  val splits = field.split(":")
  val colName = splits(0)
  val nullable = splits(2).toBoolean
  val dataType = splits(1).toUpperCase() match {
    case "INT" => IntegerType
    case "DOUBLE" => DoubleType
    case "STRING" => StringType
    case _ => StringType
  }
  StructField(colName, dataType, nullable)
}

val schema: StructType = StructType(cols
  .split(",")
  .map(col => inferType(col)))

val simpleData = Seq(
  Row("Soumya","","Kole","36636","M",-1),
  Row("Foo","Bar","","","",9000)
)
val rdd = spark.sparkContext.parallelize(simpleData)
val df = spark.createDataFrame(rdd, schema)
df.printSchema()
df.show()

Output:

root
 |-- firstName: string (nullable = false)
 |-- middlename: string (nullable = true)
 |-- lastName: string (nullable = false)
 |-- zipCode: string (nullable = false)
 |-- sex: string (nullable = false)
 |-- salary: integer (nullable = true)

+---------+----------+--------+-------+---+------+
|firstName|middlename|lastName|zipCode|sex|salary|
+---------+----------+--------+-------+---+------+
|   Soumya|          |    Kole|  36636|  M|    -1|
|      Foo|       Bar|        |       |   |  9000|
+---------+----------+--------+-------+---+------+

CodePudding user response:

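The blanks in your output are empty strings (""), not NULLs. They are different values, so none of your rows actually puts a null into a column declared with nullable = false.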
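
To see the difference yourself, here is a minimal sketch, assuming a local SparkSession (the object name NullVsEmpty and the helpers counts and nullInNonNullable are made up for illustration). It counts empty strings and nulls separately in the nullable column, then puts a real null into the non-nullable column and forces evaluation. Depending on the Spark version, that second step may fail at runtime when the action runs or the null may pass through, so the result is only printed, not assumed.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}
import scala.util.Try

object NullVsEmpty {
  // Assumption: a throwaway local session, for illustration only.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[1]")
    .appName("null-vs-empty")
    .getOrCreate()

  val schema: StructType = StructType(Seq(
    StructField("firstName", StringType, nullable = false),
    StructField("middlename", StringType, nullable = true)))

  // Returns (rows where middlename is "", rows where middlename is null).
  def counts(): (Long, Long) = {
    val rows = Seq(Row("Foo", ""), Row("Bar", null))
    val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    val empties = df.filter(col("middlename") === "").count()
    val nulls = df.filter(col("middlename").isNull).count()
    (empties, nulls)
  }

  // Puts a real null into the non-nullable column and forces every
  // column to be materialized with collect().
  def nullInNonNullable(): Try[Int] = Try {
    val rows = Seq(Row(null, "x"))
    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
      .collect()
      .length
  }

  def main(args: Array[String]): Unit = {
    println(s"(empty strings, nulls) in middlename: ${counts()}")
    println(s"null in non-nullable firstName: ${nullInNonNullable()}")
    spark.stop()
  }
}
```

The first line printed shows that "" and null are matched by different filters. The second shows how your Spark version reacts when the schema's nullable = false is actually contradicted by the data.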
