Java Spark convert Empty values in Dataframe to null


I'm trying to convert all empty string values in my Spark DataFrame to null using:

// replace empty strings with null in a single column
df = df.withColumn(colname, when(df.col(colname).equalTo(""), null)
                                .otherwise(df.col(colname)));

This works, but I would have to repeat it for every column. Is there another way in Java Spark to go over all the columns of the DataFrame and replace the empty values with null?

CodePudding user response:

If you want to apply that transformation to all your columns, you can use df.columns() to list them and build the same expression for each one, either with a for loop or with a stream as below:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.spark.sql.Column;
import org.apache.spark.sql.functions;

// One "replace empty string with null" expression per column; alias() keeps the original name.
List<Column> list = Arrays
    .stream(df.columns())
    .map(colname -> functions
        .when(df.col(colname).equalTo(""), null)
        .otherwise(df.col(colname))
        .alias(colname))
    .collect(Collectors.toList());

df = df.select(list.toArray(new Column[0])); // select() returns a new Dataset, so reassign
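
The alias(colname) call keeps the original column names; without it, Spark would name each output column after the generated CASE WHEN expression. As with withColumn(), select() does not change df in place, so keep the Dataset it returns.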

CodePudding user response:

You can loop over the DataFrame columns and replace the empty strings with null. Keep in mind that withColumn() returns a new Dataset instead of modifying the existing one, so the result has to be reassigned on every iteration:

Dataset<Row> ds = ...; // input DataFrame

// assumes static imports of org.apache.spark.sql.functions.when and functions.col
for (String c : ds.columns()) {
    ds = ds.withColumn(c, when(col(c).equalTo(""), null).otherwise(col(c)));
}
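
If you need this in more than one place, the same loop can be wrapped in a small helper method. Below is a minimal sketch; the method name emptyStringsToNull is just for illustration:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.when;

// Returns a new Dataset with every empty string replaced by null; the input Dataset is left unchanged.
public static Dataset<Row> emptyStringsToNull(Dataset<Row> ds) {
    for (String c : ds.columns()) {
        ds = ds.withColumn(c, when(col(c).equalTo(""), null).otherwise(col(c)));
    }
    return ds;
}

You would then call it as Dataset<Row> cleaned = emptyStringsToNull(df);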