I need to replace NaN with 0 in arrays that are stored in a column. The arrays always have the same size. Here is an example:
Id Array column
1 [1,2,3]
2 [nan,4,nan]
It should become:
Id Array column
1 [1,2,3]
2 [0,4,0]
Thanks for helping.
CodePudding user response:
You can use the fillna function, so it would look something like this: df_new = df_old.fillna(0)
CodePudding user response:
You can use the transform function in a SQL expr.
import pyspark.sql.functions as F
......
# transform applies the lambda to every element of the array; NaN elements become 0
df = df.withColumn('array_col', F.expr('transform(array_col, x -> if(isnan(x), 0, x))'))