Home > Net >  Count number of cases reaching a given deadline
Count number of cases reaching a given deadline

Time:11-15

I have a dataframe similar to below:

Case ID D1 D2 D3 D4
A Dec 2022 Feb 2023 May 2023 Jun 2024
B Jul 2020 May 2023 Aug 2024
C May 2019 Jul 2020 Dec 2021
D Jul 2020 Mar 2021 Apr 2021 Aug 2024
E May 2019 May 2023 Aug 2024
F Dec 2022 Feb 2023 May 2023 Aug 2024
G Dec 2022 Feb 2023 May 2023 Aug 2024

And I would like to have a dataframe with all dates (from min(date) of above dataframe to max(date)) as index and then number of cases who have a deadline(i.e. Dx) for a given date. So, visually I am expecting something similar to below:

enter image description here

Thanks

CodePudding user response:

I finally came to a solution, not sure if it is the most straight-forward way, but I think it works.

# reshape data from wide to long and add a column "has_deadline" as marker. 
df1 = df.set_index('Case ID').stack().reset_index(name='dates').rename(columns={'level_1': 'deadlines'})
df1['has_deadline'] = 1

# we need to create a new index which includes every single combination for each Case, deadline and date
# Then reindex the dataframe (missing values will be filled with NaN)
df1 = df1.set_index(['deadlines', 'dates', 'Case ID'])
new_idx = pd.MultiIndex.from_product(df1.index.levels)
df1 = df1.reindex(new_idx)

# Now we need to sum each row to get the sum of all cases for each deadline and date
# also sort the index chronological
df1 = df1.unstack(level=2).sum(axis=1).unstack('deadlines').sort_index(key=lambda x: pd.to_datetime(x, format="%b %Y"))
print(df1)

Output df1:

deadlines   D1   D2   D3   D4
dates                        
May 2019   2.0  0.0  0.0  0.0
Jul 2020   2.0  1.0  0.0  0.0
Mar 2021   0.0  1.0  0.0  0.0
Apr 2021   0.0  0.0  1.0  0.0
Dec 2021   0.0  0.0  1.0  0.0
Dec 2022   3.0  0.0  0.0  0.0
Feb 2023   0.0  3.0  0.0  0.0
May 2023   0.0  2.0  3.0  0.0
Jun 2024   0.0  0.0  0.0  1.0
Aug 2024   0.0  0.0  2.0  3.0

Plot:

# plot each column
for col in df1.columns:
    tmp = df1.loc[df1[col].ne(0), col]
    plt.scatter(x=tmp.index, y=tmp, label=col)
plt.xticks(rotation=90)
plt.xlabel('Dates')
plt.ylabel('No. of Cases')
plt.grid(True)
plt.legend()

enter image description here

  • Related