I got some help in the past and was able to advance well. Now I have an additional need to create a summary dataset for the study. Please help me if you can.
This is my current Dataset: import pandas as pd
df2 = pd.DataFrame({'patient': ['one', 'one', 'one', 'three','three', 'two','two','two','two'],
'pattern': ['A', 'B', '000', 'C', 'A', '000','D','A','C'],
'date': ['11/20/2022', '11/22/2022', '11/23/2022', '11/8/2022', '11/9/2022', '11/14/2022','11/20/2022', '11/22/2022', '11/23/2022']})
m = df2['pattern'] == '000'
df2['result'] = (df2[~m].groupby(['patient', m.cumsum()])
.cumcount().add(1)
.reindex(df2.index, fill_value=0))
df2
From the above current Dataset, I like to create a summary dataset like shown below. Can you help me how extract the below summary dataset from the above dataset, please?
required_dataset = pd.DataFrame({'pattern': ['A,B', 'C,A','D,A,C'], ### Pattern happend by Date
'patients': [1,1,1]}) ### Total Number of unique patients
required_dataset
CodePudding user response:
IIUC, you can use GroupBy.agg
:
out = (
df2[~m].groupby("patient")
.agg(pattern= ("pattern", ",".join),
patients=("patient", "nunique"))
.reset_index(drop=True)
)
Output :
print(out)
pattern patients
0 A,B 1
1 C,A 1
2 D,A,C 1
CodePudding user response:
You can try doing this:
m = df2["pattern"] == "000"
summary_df = df2[~m].groupby("patient").agg(pattern = ("pattern", lambda x: ','.join(x)),
patients = ("patient", "count")).reset_index()
summary_df
Prints:
patient pattern patients
0 one A,B 2
1 three C,A 2
2 two D,A,C 3
The first pattern
column concatenates all the patterns with ' ,'.join(x)
.
The second patient
column counts the number of visits.
Granted, the required_dataframe
you provided had different values in this one, but I guess you meant visits. If not, let me know and I'll update my answer.