Existing Dataframe :
Sr.No Date Tag score
01 10-02-2022 pass 10
02 10-02-2022 fail 5
03 10-02-2022 pass 10
04 11-02-2022 grace 3
05 11-02-2022 pass 15
06 11-02-2022 pass 15
Expected Dataframe :
Sr.No Date Tag score no_of_records pass_count fail_count grace_count pass_score_total
01 10-02-2022 pass 10 3 2 1 0 20
02 10-02-2022 fail 5
03 10-02-2022 pass 10
04 11-02-2022 grace 3 3 2 0 1 30
05 11-02-2022 pass 15
06 11-02-2022 pass 15
no_of_records implies the no. of records for particular day. how to proceed with groupby on Date to get the desired output..?
CodePudding user response:
First get sums for pass
in s
Series by aggregate sum, then get counts by Series.value_counts
, join with crosstab
:
s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum()
.rename('pass_score_total'))
s1 = df['Date'].value_counts().rename('no_of_records')
df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
print (df)
Sr.No Date Tag score no_of_records fail_count grace_count \
0 1 10-02-2022 pass 10 3 1 0
1 2 10-02-2022 fail 5 3 1 0
2 3 10-02-2022 pass 10 3 1 0
3 4 11-02-2022 grace 3 3 0 1
4 5 11-02-2022 pass 15 3 0 1
5 6 11-02-2022 pass 15 3 0 1
pass_count pass_score_total
0 2 20
1 2 20
2 2 20
3 2 30
4 2 30
5 2 30
If need empty strings:
s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum().rename('pass_score_total')
s1 = df['Date'].value_counts().rename('no_of_records')
df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
df[df1.columns] = df[df1.columns].mask(df['Date'].duplicated(), '', axis=1)
print (df)
Sr.No Date Tag score no_of_records fail_count grace_count \
0 1 10-02-2022 pass 10 3 1 0
1 2 10-02-2022 fail 5
2 3 10-02-2022 pass 10
3 4 11-02-2022 grace 3 3 0 1
4 5 11-02-2022 pass 15
5 6 11-02-2022 pass 15
pass_count pass_score_total
0 2 20
1
2
3 2 30
4
5
CodePudding user response:
You can use pd.crosstab
:
>>> (pd.crosstab(df['Date'], df['Tag']).add_suffix('_count')
.assign(no_of_records=lambda x: x.sum(axis=1)).reset_index()
.rename_axis(columns=None))
Date fail_count grace_count pass_count no_of_records
0 10-02-2022 1 0 2 3
1 11-02-2022 0 1 2 3