Home > Blockchain >  Filling null and not null values as 0 and 1, respectively to only selected columns in pyspark datafr
Filling null and not null values as 0 and 1, respectively to only selected columns in pyspark datafr

Time:06-23

For a sample spark df:

sdf = ({'id_col': [25, 13, 15],
                   'x2': ['b', None, '2'],
                   'x3': [None, '0', '3'] })

which I would to have:

sdf = ({'id_col': [25, 13, 15],
                   'x2': [1, 0, 1],
                   'x3': [0, 1, 1] })

replace all null with 0, and notnull with 1.

Following the syntax from: Filling not null values as 1 in pyspark dataframe. it went trough all the columns of the sdf. For my case, I need to keep the values of id_col as is but replace with 1 and 0 on cols x2 and x3.

I initially tried the following:

I put the x2 and x3 cols in a list:

list = ['x2','x3']

then applied this:

col_selection = [when(col(c).isNull(),0).otherwise(1).alias(c) for c in sdf[[list]].columns]

I get the intended 0s and 1s, but I only get the x2 and x3 columns. When I called the sdf, the changes did not happen. If I have 17 columns to go through in a 25-column sdf, can you guide me on how to apply the for loop on only the 17 columns?

Thanks

CodePudding user response:

If possible to do one by one

from pyspark.sql import functions as F
import pandas as pd
sdf = pd.DataFrame({'id_col': [25, 13, 15],
                   'x2': ['b', None, '2'],
                   'x3': [None, '0', '3'] })
sdf=spark.createDataFrame(sdf)
sdf.withColumns({"x2":F.when(F.col("x2").isNull(),0).otherwise(1),"x3":F.when(F.col("x3").isNull(),0).otherwise(1)}).show()

#output
 ------ --- --- 
|id_col| x2| x3|
 ------ --- --- 
|    25|  1|  0|
|    13|  0|  1|
|    15|  1|  1|
 ------ --- --- 

Note-withColumns is only available in spark version >= 3.3.0

To do in loop

lis=['x2','x3']
for a in lis:
  sdf=sdf.withColumn(f"{a}",F.when(F.col(f"{a}").isNull(),0).otherwise(1))
sdf.show()

#output
 ------ --- --- 
|id_col| x2| x3|
 ------ --- --- 
|    25|  1|  0|
|    13|  0|  1|
|    15|  1|  1|
 ------ --- --- 
  • Related