I have a PySpark DataFrame in which I want to change the values of two columns simultaneously, based on a filter condition that involves both columns. I'll give a hypothetical example, as I cannot share the data.
+----+----+
| Id |Rank|
+----+----+
| a  | 5  |
| b  | 7  |
| c  | 8  |
| d  | 1  |
|    | 9  |
+----+----+
Condition: when Id == " " and Rank == 9, set Id = "A1" and Rank = 0; otherwise leave the row unchanged. Thanks!
CodePudding user response:
You can apply the same condition to each of the two columns separately, within a single select.
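from pyspark.sql import SparkSession

# Reuse the active session if one exists (e.g. in a notebook or the pyspark shell)
spark = SparkSession.builder.getOrCreate()

data = [
    ('a', 5),
    ('b', 7),
    ('c', 8),
    ('d', 1),
    (' ', 8),
    (' ', 9),
    ('e', 9)
]
df = spark.createDataFrame(data, ['id', 'rank'])

# Both expressions are evaluated against the original columns,
# so replacing id does not affect the condition used for rank.
df = df.selectExpr(
    'if((id = " " and rank = 9), "A1", id) as id',
    'if((id = " " and rank = 9), 0, rank) as rank'
)
df.show(truncate=False)
# +---+----+
# |id |rank|
# +---+----+
# |a  |5   |
# |b  |7   |
# |c  |8   |
# |d  |1   |
# |   |8   |
# |A1 |0   |
# |e  |9   |
# +---+----+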