I have the following Spark data frame.
Date_1 Value Date_2
20-10-2021 1 Date
20-10-2021 2 Date
21-10-2021 3 Date
23-10-2021 4 Date
I would like to fill the Date_2 values by adding (Value - 1) days to Date_1.
The output that I would like to see is the following.
Date_1 Value Date_2
20-10-2021 1 20-10-2021
20-10-2021 2 21-10-2021
21-10-2021 3 23-10-2021
23-10-2021 4 26-10-2021
I have tried this using PySpark.
import pyspark.sql.functions as F
df = df.withColumn('Date_2', F.date_add(df['Date_1'], df['Value'] - 1)).show()
But I am getting TypeError: Column is not iterable.
Can anyone help with this?
CodePudding user response:
The signature of the function date_add is (col, int), so you cannot pass df['Value'] directly as the second argument.
Try this:
df = df.withColumn('Date_2', F.expr("date_add(Date_1, Value - 1)"))
df.show()
CodePudding user response:
You can reach the SQL function DATE_ADD through F.expr like this:
(
df
.withColumn("Value", F.col("Value").cast("int"))
.withColumn("Date_2",
F.expr('DATE_ADD(Date_1, Value - 1)')
)
)
DATE_ADD(Date_1, Value - 1) adds, for each row, Value - 1 days to the date in Date_1.
Additionally (if you haven't done it already), the Value column should be INT. If it were, for example, DOUBLE, an AnalysisException would occur.