I'm trying to convert all empty string values in my Spark DataFrame to null using:

df = df.withColumn(colname, when(df.col(colname).equalTo(""), null)
    .otherwise(df.col(colname)));

It works, but I'd have to repeat this for every column. Is there another way in Java Spark to check all the columns of the DataFrame and replace the empty values with null?
CodePudding user response:
If you want to apply that transformation to all your columns, you can use df.columns()
to list them and apply the same construct to each one, either with a for loop or with a stream
like the one below:
List<Column> list = Arrays
    .stream(df.columns())
    .map(colname -> functions
        .when(df.col(colname).equalTo(""), functions.lit(null))
        .otherwise(df.col(colname))
        .as(colname)) // keep the original column name instead of the generated CASE WHEN label
    .collect(Collectors.toList());
df = df.select(list.toArray(new Column[0])); // select returns a new Dataset; keep the result
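The stream pipeline itself is plain Java; as a minimal Spark-free sketch of the same map-and-collect shape (the `columns` array and the expression strings below are stand-ins for `df.columns()` and the real `when`/`otherwise` `Column` objects, which need a running Spark session):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnMapSketch {
    public static void main(String[] args) {
        // Stand-in for df.columns(): the dataframe's column names.
        String[] columns = {"name", "city", "zip"};

        // Map each name to a derived expression, mirroring the step that
        // builds one Column per column name, then collect into a List.
        List<String> exprs = Arrays.stream(columns)
            .map(c -> "when(" + c + " = '') null otherwise " + c + " as " + c)
            .collect(Collectors.toList());

        exprs.forEach(System.out::println);
    }
}
```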
CodePudding user response:
You can loop over the dataframe columns and replace empty strings with null. Note that withColumn returns a new Dataset rather than mutating the existing one, so you must reassign the result on each iteration; a forEach that discards the return value leaves the original dataframe unchanged:

Dataset<Row> ds = //Input dataframe;
for (String c : ds.columns()) {
    ds = ds.withColumn(c, when(col(c).equalTo(""), lit(null)).otherwise(col(c)));
}
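The reassignment matters because Spark Datasets are immutable, and the same pitfall exists with any immutable Java value. A minimal Spark-free sketch using String as a stand-in for the immutable Dataset:

```java
public class ImmutableLoopSketch {
    public static void main(String[] args) {
        // Broken pattern: replace() returns a new String and the result is
        // discarded, just like calling ds.withColumn(...) inside a forEach.
        String broken = "a||b";
        broken.replace("||", "|null|");
        System.out.println(broken);  // still "a||b"

        // Working pattern: reassign the returned value,
        // like ds = ds.withColumn(...) in the loop above.
        String fixed = "a||b";
        fixed = fixed.replace("||", "|null|");
        System.out.println(fixed);   // "a|null|b"
    }
}
```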