Consider this DataFrame:
+----+------+
|cond|chaine|
+----+------+
|   0|   TF1|
|   1|   TF1|
|   1|   TNT|
+----+------+
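To reproduce the example, the DataFrame above can be built like this. This is a minimal sketch that assumes a local SparkSession (for example in spark-shell, or a Scala script with spark-sql on the classpath); the app name is arbitrary.

```scala
import org.apache.spark.sql.SparkSession

// Local session for experimenting; in spark-shell, getOrCreate returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("cond-example").getOrCreate()
import spark.implicits._

// Same rows as the table above: (cond, chaine)
val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")
df.show()
```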
I would like to apply these withColumn calls, but only to rows where cond == 1:
df.withColumn("New", when($"chaine" === "TF1", "YES!"))
  .withColumn("New2", when($"chaine" === "TF1", "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1", "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1", "YES4!"))
I can't use .filter, because I still want the rows with cond =!= 1 in the output.
I can do it by adding my condition to every when clause:
df.withColumn("New", when($"chaine" === "TF1" && $"cond" === 1, "YES!"))
  .withColumn("New2", when($"chaine" === "TF1" && $"cond" === 1, "YES2!"))
  .withColumn("New3", when($"chaine" === "TF1" && $"cond" === 1, "YES3!"))
  .withColumn("New4", when($"chaine" === "TF1" && $"cond" === 1, "YES4!"))
The problem is that I have a lot of new columns, so I'd like a better solution (something like a global condition?).
Thank you.
Answer:
Some simple syntactic ideas:
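import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, when}
// Like `when`, but also requires cond === n
def whenCondIs(n: Int)(condition: Column, value: Any): Column =
  when(condition && col("cond") === n, value)
// Shorthand for the cond === 1 case
def whenOne(condition: Column, value: Any): Column =
  whenCondIs(1)(condition, value)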
And then:
df.withColumn("New", whenOne($"chaine" === "TF1", "YES!"))
  .withColumn("New2", whenOne($"chaine" === "TF1", "YES2!"))
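A self-contained sketch of these helpers in action, assuming a local SparkSession; the sample data reproduces the question's DataFrame and the app name is arbitrary. The helpers use col("cond") rather than $"cond", so their definitions do not depend on the implicits import.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, when}

val spark = SparkSession.builder().master("local[*]").appName("whenOne-demo").getOrCreate()
import spark.implicits._

// Same sample data as in the question: (cond, chaine)
val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")

// `when` with the extra requirement cond === n AND-ed in
def whenCondIs(n: Int)(condition: Column, value: Any): Column =
  when(condition && col("cond") === n, value)

// Shorthand for the cond === 1 case
def whenOne(condition: Column, value: Any): Column =
  whenCondIs(1)(condition, value)

val result = df
  .withColumn("New", whenOne($"chaine" === "TF1", "YES!"))
  .withColumn("New2", whenOne($"chaine" === "TF1", "YES2!"))

result.show()
```

Only the row with cond = 1 and chaine = TF1 gets values; the other rows are kept, with null in New and New2, because there is no otherwise clause.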
Answer:
You can keep the mapping between each new column, its condition, and its value in a list, and use foldLeft to add the columns to your DataFrame. Something like this:
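import org.apache.spark.sql.functions.{col, expr, lit, when}

// (new column name, SQL condition, value to set when the condition holds)
val newCols = Seq(
  ("New", "chaine='TF1'", "YES!"),
  ("New2", "chaine='TF1'", "YES2!"),
  ("New3", "chaine='TF1'", "YES3!"),
  ("New4", "chaine='TF1'", "YES4!")
)
// Add one column per entry; the cond === 1 check is written only once
val df1 = newCols.foldLeft(df) { case (acc, (name, condition, value)) =>
  acc.withColumn(name, when(expr(condition) && col("cond") === 1, lit(value)))
}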
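A possible variant of the same idea, not part of the original answer: instead of calling withColumn once per entry, build every new column as a Column expression and add them all in a single select. The Spark API docs for withColumn note that calling it repeatedly (for example in a loop) adds one projection per call and can produce large query plans, which a single select avoids. This is a sketch assuming a local SparkSession; the names globalCond, extraCols and allCols are illustrative only.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.{col, expr, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("select-demo").getOrCreate()
import spark.implicits._

// Same sample data as in the question: (cond, chaine)
val df = Seq((0, "TF1"), (1, "TF1"), (1, "TNT")).toDF("cond", "chaine")

// (new column name, SQL condition, value), as in the answer above
val newCols = Seq(
  ("New", "chaine='TF1'", "YES!"),
  ("New2", "chaine='TF1'", "YES2!"),
  ("New3", "chaine='TF1'", "YES3!"),
  ("New4", "chaine='TF1'", "YES4!")
)

// The shared ("global") condition, defined once
val globalCond: Column = col("cond") === 1

// One expression per new column; rows failing either condition get null
val extraCols: Seq[Column] = newCols.map { case (name, condition, value) =>
  when(expr(condition) && globalCond, lit(value)).as(name)
}

// Keep all existing columns and append the new ones in one projection
val allCols: Seq[Column] = col("*") +: extraCols
val df1 = df.select(allCols: _*)
df1.show()
```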