PySpark conditional statement


Below is the input dataframe.

+-----------+---+------+----+----+
|       DATE| ID|   sal| vat|flag|
+-----------+---+------+----+----+
|10-may-2022|  1|1000.0|12.0|   1|
|12-may-2022|  2|  50.0| 6.0|   1|
+-----------+---+------+----+----+

I want to perform the following based on the flag column.

If the flag column is 1, I will do the following:

df = srcdf.withColumn("sum",col("sal")*2)
display(df)

If the flag column is 2, I will do the following:

df = srcdf.withColumn("sum",col("sal")*4)
display(df)

Below is the code I'm using.

flag = srcdf.select(col("flag"))

if flag == 1:
    df = srcdf.withColumn("sum", col("sal") * 2)
    display(df)
else:
    df = srcdf.withColumn("sum", col("sal") * 4)
    display(df)

When I use the above, I am getting a syntax error. Is there any other way I can achieve this using PySpark conditional statements?

Thank you.

CodePudding user response:

Possible duplicate of this question.

You need to use when (with or without otherwise) from pyspark.sql.functions.

from pyspark.sql.functions import when, col

# multiply sal by 2 when flag is 1, and by 4 when flag is 2
df = srcdf.withColumn(
    "sum",
    when(col("flag") == 1, col("sal") * 2)
    .when(col("flag") == 2, col("sal") * 4)
)
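Note that with only when and no otherwise, any row whose flag is neither 1 nor 2 ends up with a null in the sum column; add an otherwise (as in the variant below) if you need a default.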

OR

from pyspark.sql.functions import when, col

# multiply sal by 2 when flag is 1, and by 4 for every other flag value
df = srcdf.withColumn(
    "sum",
    when(col("flag") == 1, col("sal") * 2)
    .otherwise(col("sal") * 4)
)
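
As for why your snippet fails: srcdf.select(col("flag")) returns a DataFrame, not a Python value, so comparing it to 1 in an if statement cannot do what you expect. If the flag really is the same for every row and you prefer to branch on the driver side, a minimal sketch (assuming srcdf is non-empty and flag is constant across rows) would be to pull the value out first:

from pyspark.sql.functions import col

# first() returns a Row; indexing it by column name gives a plain Python value
flag_value = srcdf.select("flag").first()["flag"]

if flag_value == 1:
    df = srcdf.withColumn("sum", col("sal") * 2)
else:
    df = srcdf.withColumn("sum", col("sal") * 4)

display(df)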