I have a DataFrame in pyspark following below:
id | Datetime | price | amountOld | amountNew |
---|---|---|---|---|
1 | 20210203 | 200 | 20 | 30 |
1 | 20210204 | 230 | None | None |
1 | 20210205 | 225 | None | None |
1 | 20210206 | 233 | 30 | 50 |
1 | 20210207 | 220 | None | None |
2 | 20210407 | 400 | None | None |
2 | 20210408 | 410 | 90 | 100 |
2 | 20210409 | 415 | None | None |
2 | 20210410 | 423 | None | None |
I want to add a column to that like below:
id | Datetime | price | amountOld | amountNew | currentAmount |
---|---|---|---|---|---|
1 | 20210203 | 200 | 20 | 30 | 30 |
1 | 20210204 | 230 | None | None | 30 |
1 | 20210205 | 225 | None | None | 30 |
1 | 20210206 | 233 | 30 | 50 | 50 |
1 | 20210207 | 220 | None | None | 50 |
2 | 20210407 | 400 | None | None | 90 |
2 | 20210408 | 410 | 90 | 100 | 100 |
2 | 20210409 | 415 | None | None | 100 |
2 | 20210410 | 423 | None | None | 100 |
I have tried many times like the code below but no luck:
w = Window.partitionBy("id").orderBy(df_final.Datetime.desc())
df_final = df_final. \
withColumn("currentAmount",
when(col("amountNew").isNotNull(),
lag(col("amountNew")).over(w)) \
.otherwise(lag(col("amountOld")).over(w)))
How can I do that? Thanks.
CodePudding user response:
we can use first
,last
from functions
over window
w1 = Window.partitionBy('id').orderBy('Datetime').rowsBetween(Window.unboundedPreceding,Window.currentRow)
w2 = Window.partitionBy('id').orderBy('Datetime').rowsBetween(Window.currentRow,Window.unboundedFollowing)
df_final = df_final.withColumn('currentAmount',f.coalesce(f.last('amountNew',ignorenulls=True).over(w1),f.first('amountOld',ignorenulls=True).over(w2)))
dff.show()
--- -------- ----- --------- --------- -------------
| id|Datetime|price|amountOld|amountNew|currentAmount|
--- -------- ----- --------- --------- -------------
| 1|20210203| 200| 20| 30| 30|
| 1|20210204| 230| null| null| 30|
| 1|20210205| 225| null| null| 30|
| 1|20210206| 233| 30| 50| 50|
| 1|20210207| 220| null| null| 50|
| 2|20210407| 400| null| null| 90|
| 2|20210408| 410| 90| 100| 100|
| 2|20210409| 415| null| null| 100|
| 2|20210410| 423| null| null| 100|
--- -------- ----- --------- --------- -------------