What is the significance of nullable?
case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    metadata: Metadata = Metadata.empty) {
From the documentation:
StructField(name, dataType, nullable): Represents a field in a StructType. The name of a field is indicated by name. The data type of a field is indicated by dataType. nullable is used to indicate if values of these fields can have null values.
Is it only an indication? I can't see it enforcing non-null values (or am I missing something?).
Program:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

// Assumes an existing SparkSession named `spark` (e.g. in spark-shell).
// Each column is described as name:type:nullable.
val cols = "firstName:String:false,middlename:String:true,lastName:String:false,zipCode:String:false,sex:String:false,salary:Int:true"

// Build a StructField from one name:type:nullable description.
def inferType(field: String): StructField = {
  val splits = field.split(":")
  val colName = splits(0)
  val nullable = splits(2).toBoolean
  val dataType = splits(1).toUpperCase() match {
    case "INT" => IntegerType
    case "DOUBLE" => DoubleType
    case "STRING" => StringType
    case _ => StringType // fall back to string for unknown types
  }
  StructField(colName, dataType, nullable)
}

val schema: StructType = StructType(cols
  .split(",")
  .map(col => inferType(col)))

val simpleData = Seq(
  Row("Soumya", "", "Kole", "36636", "M", -1),
  Row("Foo", "Bar", "", "", "", 9000)
)

val rdd = spark.sparkContext.parallelize(simpleData)
val df = spark.createDataFrame(rdd, schema)
df.printSchema()
df.show()
Output:
root
 |-- firstName: string (nullable = false)
 |-- middlename: string (nullable = true)
 |-- lastName: string (nullable = false)
 |-- zipCode: string (nullable = false)
 |-- sex: string (nullable = false)
 |-- salary: integer (nullable = true)

+---------+----------+--------+-------+---+------+
|firstName|middlename|lastName|zipCode|sex|salary|
+---------+----------+--------+-------+---+------+
|   Soumya|          |    Kole|  36636|  M|    -1|
|      Foo|       Bar|        |       |   |  9000|
+---------+----------+--------+-------+---+------+
CodePudding user response:
The blanks are empty strings, not NULLs. They are different: an empty string is a valid non-null value, so it does not violate nullable = false.
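One way to check this on data like the question's is to count NULLs and empty strings separately. The following is a minimal sketch, not taken from the original post: it assumes Spark is on the classpath, uses a cut-down two-column schema, and the session settings (`local[1]`, the app name) and the names `nullCount` and `emptyCount` are illustrative.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local session for illustration only; in spark-shell, getOrCreate() reuses the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("nullable-demo")
  .getOrCreate()

// Both columns are declared non-nullable, as firstName and lastName are in the question.
val schema = StructType(Seq(
  StructField("firstName", StringType, nullable = false),
  StructField("lastName", StringType, nullable = false)
))

// The second row has an empty lastName, mirroring the question's data.
val rows = Seq(Row("Soumya", "Kole"), Row("Foo", ""))
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)

val nullCount = df.filter(col("lastName").isNull).count()   // rows where lastName is NULL
val emptyCount = df.filter(col("lastName") === "").count()  // rows where lastName is ""

println(s"null: $nullCount, empty: $emptyCount")
```

The empty string is counted by the `=== ""` filter and not by `isNull`, which is why the non-nullable columns accept it. Putting an actual `null` into a non-nullable column is a separate case; whether it is rejected appears to depend on the Spark version and on how the DataFrame is built, so it is worth testing on your own setup.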