I am struggling trying convert a dataframe containing multiple datetime columns into a version with a daterange.
A smaller example would look somewhat like this:
So from:
key date_upgrade1 date_upgrade2
0 key1 2020-07-31 2020-08-31
1 key2 2020-05-31 NaT
to:
month key upgrade1 upgrade2
0 2020-06-01 key1 0 0
1 2020-07-01 key1 0 0
2 2020-08-01 key1 1 0
3 2020-09-01 key1 1 1
4 2020-06-01 key2 1 0
5 2020-07-01 key2 1 0
6 2020-08-01 key2 1 0
7 2020-09-01 key2 1 0
where upgrade1 and upgrade 2 are "0" if the upgrade has not yet been done and "1" if the upgrade was already performed.
Actually, I am quite lost at the moment. I managed to create the empty date_range dataframe. But I am not sure how to duplicate it for multiple keys and change all values after the upgrade date to 1.
Any help is appreciated.
CodePudding user response:
Answered the question myself in the meantime:
df = [
df1.assign(month=t)
for t in pd.date_range(first_date, now, freq='MS')
]
df = pd.concat(df)
df.reset_index(drop=True, inplace=True)
df["upgrade1"] = df.month.gt(df.date_upgrade1).astype('int')
df["upgrade2"] = df.month.gt(df.date_upgrade2).astype('int')
with df1 defined as the upper DataFrame.