I'm trying to create a Dataset from a DataFrame using a case class.
case class test(language: String, users_count: String = "100")
+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|      20000|
|  Python|     100000|
|   Scala|       3000|
+--------+-----------+
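For context, a minimal sketch of how a DataFrame with this shape could be built (assuming a SparkSession named spark is already in scope, as in spark-shell; the name df matches the snippets below):

import spark.implicits._

val df = Seq(
  ("Java", "20000"),
  ("Python", "100000"),
  ("Scala", "3000")
).toDF("language", "users_count")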
df.as[test]
How do I handle the scenario where a column is missing from the DataFrame? The expectation is that the Dataset populates the default value provided in the case class.
If the DataFrame only has one column, it throws an exception:

org.apache.spark.sql.AnalysisException: cannot resolve 'users_count' given input columns: [language];
Expected result:

+--------+
|language|
+--------+
|    Java|
|  Python|
|   Scala|
+--------+
df.as[test].collect()(0)
test("Java", "100") // where "100" is the default value
CodePudding user response:
You could use the map function and explicitly call the test constructor, like this:
import spark.implicits._ // supplies the Encoder for the case class

df
  .map(row => test(row.getAs[String]("language")))
  .show()
+--------+-----------+
|language|users_count|
+--------+-----------+
|    Java|        100|
|  Python|        100|
|   Scala|        100|
+--------+-----------+
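Since map over the DataFrame returns a Dataset[test], collecting the first element gives exactly the default-populated value the question asks for. A quick sketch, reusing the df and spark assumed above:

val ds = df.map(row => test(row.getAs[String]("language")))
ds.collect()(0) // test("Java", "100"), users_count falls back to the default

If you would rather keep the df.as[test] call itself, one alternative (a sketch, not part of the original answer) is to add any missing column with its default before converting:

import org.apache.spark.sql.functions.lit

val withDefault =
  if (df.columns.contains("users_count")) df
  else df.withColumn("users_count", lit("100")) // "100" mirrors the case-class default

withDefault.as[test].show()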