Spark: Change NaN in an array in a DataFrame column

Time:10-03

I need to replace NaN with 0 in an array that is stored in a DataFrame column. The arrays always have the same size. Here is an example:

Id Array column
1  [1,2,3]
2  [nan,4,nan]

should be:

Id Array column
1  [1,2,3]
2  [0,4,0]

Thanks for helping.

CodePudding user response:

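You can use the fillna function, for example df_new = df_old.fillna(0). Note that fillna only fills top-level columns of a matching type, so it will not replace NaN values stored inside an array column.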

CodePudding user response:

You can use the transform higher-order function inside a SQL expression passed to expr. It applies a lambda to every element of the array:

import pyspark.sql.functions as F

......
# Replace each NaN element with 0 and leave all other elements unchanged
df = df.withColumn('array_col', F.expr('transform(array_col, x -> if(isnan(x), 0, x))'))
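
To make the per-element logic of the lambda above easier to follow, here is a minimal plain-Python sketch of the same replacement. The function name replace_nan, the default parameter, and the sample rows are hypothetical and only mirror the example from the question; this is not part of the Spark API.

```python
import math


def replace_nan(values, default=0.0):
    """Return a copy of `values` with every NaN element replaced by `default`."""
    if values is None:
        # Keep a missing array as it is instead of failing on it
        return None
    return [
        default if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


# Hypothetical sample data mirroring the table in the question
rows = [(1, [1.0, 2.0, 3.0]), (2, [float("nan"), 4.0, float("nan")])]
cleaned = [(row_id, replace_nan(arr)) for row_id, arr in rows]
print(cleaned)  # [(1, [1.0, 2.0, 3.0]), (2, [0.0, 4.0, 0.0])]
```

If the SQL transform function is not available (it was added in Spark 2.4), a function like this could probably be wrapped as a UDF, for example F.udf(replace_nan, ArrayType(DoubleType())) with the types imported from pyspark.sql.types. That is an assumption about your setup, and a Python UDF will generally be slower than the built-in expression.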