Suppose I have a PySpark DataFrame df:
+---+---+
|  a|  b|
+---+---+
|  1|200|
|  2|300|
|  4| 50|
+---+---+
I'd like to add a new column c, where:
column c = max(0, column b - 100)
+---+---+---+
|  a|  b|  c|
+---+---+---+
|  1|200|100|
|  2|300|200|
|  4| 50|  0|
+---+---+---+
How should I generate the new column c in a PySpark DataFrame? Thanks in advance!
CodePudding user response:
Hope you are looking for something like this:
from pyspark.sql.functions import col, lit, greatest

df = spark.createDataFrame(
    [
        (1, 200),
        (2, 300),
        (4, 50),
    ],
    ["a", "b"],
)

# greatest() picks the largest of its arguments per row,
# so greatest(0, b - 100) is equivalent to max(0, b - 100)
df_new = df.withColumn("c", greatest(lit(0), col("b") - lit(100)))
df_new.show()
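
If you prefer a conditional expression, an equivalent sketch (assuming the same df as above) uses when/otherwise from pyspark.sql.functions instead of greatest:

from pyspark.sql.functions import col, when

# max(0, b - 100): keep b - 100 when it is positive, otherwise 0
df_when = df.withColumn(
    "c", when(col("b") - 100 > 0, col("b") - 100).otherwise(0)
)
df_when.show()
# +---+---+---+
# |  a|  b|  c|
# +---+---+---+
# |  1|200|100|
# |  2|300|200|
# |  4| 50|  0|
# +---+---+---+

Both versions produce the expected c column; greatest is the more direct translation of max(0, ...).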