I have a dataframe similar to below:
Case ID | D1 | D2 | D3 | D4 |
---|---|---|---|---|
A | Dec 2022 | Feb 2023 | May 2023 | Jun 2024 |
B | Jul 2020 | May 2023 | Aug 2024 | |
C | May 2019 | Jul 2020 | Dec 2021 | |
D | Jul 2020 | Mar 2021 | Apr 2021 | Aug 2024 |
E | May 2019 | May 2023 | Aug 2024 | |
F | Dec 2022 | Feb 2023 | May 2023 | Aug 2024 |
G | Dec 2022 | Feb 2023 | May 2023 | Aug 2024 |
And I would like to have a dataframe with all dates (from min(date) of above dataframe to max(date)) as index and then number of cases who have a deadline(i.e. Dx) for a given date. So, visually I am expecting something similar to below:
Thanks
CodePudding user response:
I finally came to a solution, not sure if it is the most straight-forward way, but I think it works.
# reshape data from wide to long and add a column "has_deadline" as marker.
df1 = df.set_index('Case ID').stack().reset_index(name='dates').rename(columns={'level_1': 'deadlines'})
df1['has_deadline'] = 1
# we need to create a new index which includes every single combination for each Case, deadline and date
# Then reindex the dataframe (missing values will be filled with NaN)
df1 = df1.set_index(['deadlines', 'dates', 'Case ID'])
new_idx = pd.MultiIndex.from_product(df1.index.levels)
df1 = df1.reindex(new_idx)
# Now we need to sum each row to get the sum of all cases for each deadline and date
# also sort the index chronological
df1 = df1.unstack(level=2).sum(axis=1).unstack('deadlines').sort_index(key=lambda x: pd.to_datetime(x, format="%b %Y"))
print(df1)
Output df1
:
deadlines D1 D2 D3 D4
dates
May 2019 2.0 0.0 0.0 0.0
Jul 2020 2.0 1.0 0.0 0.0
Mar 2021 0.0 1.0 0.0 0.0
Apr 2021 0.0 0.0 1.0 0.0
Dec 2021 0.0 0.0 1.0 0.0
Dec 2022 3.0 0.0 0.0 0.0
Feb 2023 0.0 3.0 0.0 0.0
May 2023 0.0 2.0 3.0 0.0
Jun 2024 0.0 0.0 0.0 1.0
Aug 2024 0.0 0.0 2.0 3.0
Plot:
# plot each column
for col in df1.columns:
tmp = df1.loc[df1[col].ne(0), col]
plt.scatter(x=tmp.index, y=tmp, label=col)
plt.xticks(rotation=90)
plt.xlabel('Dates')
plt.ylabel('No. of Cases')
plt.grid(True)
plt.legend()