How to update two columns in PySpark satisfying the same condition?


I have a table with four columns: "ID", "FLAG_A", "FLAG_B", and "FLAG_C". I want to translate the following SQL query into PySpark; it updates both "FLAG_A" and "FLAG_B" when two conditions are met. How can I do this in PySpark?

UPDATE STATUS_TABLE SET STATUS_TABLE.[FLAG_A] = "JAVA", 
STATUS_TABLE.FLAG_B = "PYTHON"
WHERE (((STATUS_TABLE.[FLAG_A])="PROFESSIONAL_CODERS") AND 
((STATUS_TABLE.FLAG_C) Is Null)); 

Is it possible to do this in a single statement, applying both conditions and updating both the "FLAG_A" and "FLAG_B" columns in PySpark?

CodePudding user response:

I can't think of a way to rewrite this as the single statement you have in mind. I tried running the UPDATE query through Spark, but it seems UPDATE is not supported:

: java.lang.UnsupportedOperationException: UPDATE TABLE is not supported temporarily.

The following does exactly the same as your UPDATE query:

Input:

from pyspark.sql import functions as F
df = spark.createDataFrame(
    [(1, 'PROFESSIONAL_CODERS', 'X', None),
     (2, 'KEEP', 'KEEP', 'KEEP')],
    ['ID', 'FLAG_A', 'FLAG_B', 'FLAG_C'])

Script:

cond = (F.col('FLAG_A') == 'PROFESSIONAL_CODERS') & F.isnull('FLAG_C')
df = df.withColumn('FLAG_B', F.when(cond, 'PYTHON').otherwise(F.col('FLAG_B')))
df = df.withColumn('FLAG_A', F.when(cond, 'JAVA').otherwise(F.col('FLAG_A')))

df.show()
# +---+------+------+------+
# | ID|FLAG_A|FLAG_B|FLAG_C|
# +---+------+------+------+
# |  1|  JAVA|PYTHON|  null|
# |  2|  KEEP|  KEEP|  KEEP|
# +---+------+------+------+
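For intuition, the two `withColumn` calls apply the same row-wise rule: when the condition holds, both flags are replaced; otherwise the row is kept as-is. A minimal pure-Python sketch of that when/otherwise logic (the `update_flags` helper name is mine, not part of any API):

```python
def update_flags(row):
    # Row-wise equivalent of the two when/otherwise expressions:
    # if FLAG_A == 'PROFESSIONAL_CODERS' and FLAG_C is null,
    # set FLAG_A to 'JAVA' and FLAG_B to 'PYTHON'; otherwise keep the row.
    cond = row['FLAG_A'] == 'PROFESSIONAL_CODERS' and row['FLAG_C'] is None
    if cond:
        return {**row, 'FLAG_A': 'JAVA', 'FLAG_B': 'PYTHON'}
    return row

rows = [
    {'ID': 1, 'FLAG_A': 'PROFESSIONAL_CODERS', 'FLAG_B': 'X', 'FLAG_C': None},
    {'ID': 2, 'FLAG_A': 'KEEP', 'FLAG_B': 'KEEP', 'FLAG_C': 'KEEP'},
]
updated = [update_flags(r) for r in rows]
print(updated[0]['FLAG_A'], updated[0]['FLAG_B'])  # JAVA PYTHON
print(updated[1]['FLAG_A'], updated[1]['FLAG_B'])  # KEEP KEEP
```

As a side note, if you are on Spark 3.3 or later, `DataFrame.withColumns` accepts a dict of column expressions, so both assignments can be issued in one call while reusing the same `cond`, which gets close to the single-statement form you asked about.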