Apache Spark (PySpark): how to replace a value in a column of a row with another value from the same column in a different row
df1.filter(F.col('LAST_NAME') == 'Maltster').withColumn("ANNUAL_HOUSEHOLD_INCOME", df1.filter(F.col('LAST_NAME') == 'Attiwill').select(F.col('ANNUAL_HOUSEHOLD_INCOME'))[0]).show()
I am trying to replace the 'ANNUAL_HOUSEHOLD_INCOME' value in the row with LAST_NAME=Maltster with the 'ANNUAL_HOUSEHOLD_INCOME' value from the row with LAST_NAME=Attiwill.
For ex:
Before running the code, the table looks like:
+---------+-----------------------+
|LAST_NAME|ANNUAL_HOUSEHOLD_INCOME|
+---------+-----------------------+
|Maltster |                  20000|
|Attiwill |                 100000|
+---------+-----------------------+
After running the code, the table should look like:
+---------+-----------------------+
|LAST_NAME|ANNUAL_HOUSEHOLD_INCOME|
+---------+-----------------------+
|Maltster |                 100000|
|Attiwill |                 100000|
+---------+-----------------------+
But when I run the above code, the value is not overwritten: `select` returns a DataFrame, not a value, so it cannot be passed to `withColumn` this way.
CodePudding user response:
I think you should reread the docs. `withColumn` needs a column expression, so it should be:

df1.withColumn(
    'ANNUAL_HOUSEHOLD_INCOME',
    F.when(F.col('LAST_NAME') == 'Maltster', F.lit(100000))
     .otherwise(F.col('ANNUAL_HOUSEHOLD_INCOME'))
)
CodePudding user response:
Adding to Jonathan's answer, the code below gives the correct output without hardcoding the income. It first collects Attiwill's value to the driver as a plain Python value, then uses it as the literal in the conditional:

df1.withColumn(
    'ANNUAL_HOUSEHOLD_INCOME',
    F.when(
        F.col('LAST_NAME') == 'Maltster',
        df1.filter(F.col('LAST_NAME') == 'Attiwill')
           .select(F.col('ANNUAL_HOUSEHOLD_INCOME'))
           .collect()[0][0]
    ).otherwise(F.col('ANNUAL_HOUSEHOLD_INCOME'))
)