I have a Spark dataframe where I need to create a window partition column ("desired_output"). I simply want this conditional column to equal the "flag" column (0) until the first true or 1 and then forward fill true or 1 forward throughout the partition ("user_id"). I've tried many different window partition variations (rowsBetween) but to no avail. Any help or guidance is greatly appreciated.
Here are some use cases and their "desired output":
columns = ['user_id', 'date', 'flag', 'desired_outcome']
data = [\
('1','2022-01-01', 0, 0),\
('1','2022-01-02', 0, 0),\
('1','2022-01-03', 0, 0),\
('1','2022-01-04', 0, 0),\
('1','2022-01-05', 1, 1),\
('1','2022-01-06', 0, 1),\
('1','2022-01-07', 0, 1),\
('2','2022-01-01', 0, 0),\
('2','2022-01-02', 0, 0),\
('2','2022-01-03', 0, 0),\
('2','2022-01-04', 0, 0),\
('2','2022-01-05', 0, 0),\
('2','2022-01-06', 1, 1),\
('2','2022-01-07', 0, 1),\
('2','2022-01-08', 0, 1),\
('2','2022-01-09', 0, 1),\
('2','2022-01-10', 0, 1),\
('2','2022-01-11', 0, 1),\
('2','2022-01-12', 0, 1)]
sample_df = spark.createDataFrame(data, columns)
CodePudding user response:
We can solve this particular case (as there are just two flag
values 0
and 1
) by using a sliding max()
window.
data_sdf. \
withColumn('so_answer',
func.max('flag').over(wd.partitionBy('user_id').orderBy('date').rowsBetween(-sys.maxsize, 0))
). \
show()
# ------- ---------- ---- --------------- ---------
# |user_id| date|flag|desired_outcome|so_answer|
# ------- ---------- ---- --------------- ---------
# | 1|2022-01-01| 0| 0| 0|
# | 1|2022-01-02| 0| 0| 0|
# | 1|2022-01-03| 0| 0| 0|
# | 1|2022-01-04| 0| 0| 0|
# | 1|2022-01-05| 1| 1| 1|
# | 1|2022-01-06| 0| 1| 1|
# | 1|2022-01-07| 0| 1| 1|
# | 2|2022-01-01| 0| 0| 0|
# | 2|2022-01-02| 0| 0| 0|
# | 2|2022-01-03| 0| 0| 0|
# | 2|2022-01-04| 0| 0| 0|
# | 2|2022-01-05| 0| 0| 0|
# | 2|2022-01-06| 1| 1| 1|
# | 2|2022-01-07| 0| 1| 1|
# | 2|2022-01-08| 0| 1| 1|
# | 2|2022-01-09| 0| 1| 1|
# | 2|2022-01-10| 0| 1| 1|
# | 2|2022-01-11| 0| 1| 1|
# | 2|2022-01-12| 0| 1| 1|
# ------- ---------- ---- --------------- ---------