Home > Mobile >  adding column to dataframe using groupby on Date column
adding column to dataframe using groupby on Date column

Time:01-14

Existing Dataframe :

Sr.No        Date         Tag    score
01         10-02-2022    pass     10
02         10-02-2022    fail     5
03         10-02-2022    pass     10
04         11-02-2022    grace    3
05         11-02-2022    pass     15
06         11-02-2022    pass     15

Expected Dataframe :

Sr.No   Date       Tag  score no_of_records pass_count fail_count grace_count pass_score_total 
01    10-02-2022  pass   10         3           2          1          0             20
02    10-02-2022  fail   5          
03    10-02-2022  pass   10
04    11-02-2022  grace  3          3           2          0          1             30
05    11-02-2022  pass   15
06    11-02-2022  pass   15

no_of_records implies the no. of records for particular day. how to proceed with groupby on Date to get the desired output..?

CodePudding user response:

First get sums for pass in s Series by aggregate sum, then get counts by Series.value_counts, join with crosstab:

s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum()
                .rename('pass_score_total'))
s1 = df['Date'].value_counts().rename('no_of_records')

df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
print (df)
   Sr.No        Date    Tag  score  no_of_records  fail_count  grace_count  \
0      1  10-02-2022   pass     10              3           1            0   
1      2  10-02-2022   fail      5              3           1            0   
2      3  10-02-2022   pass     10              3           1            0   
3      4  11-02-2022  grace      3              3           0            1   
4      5  11-02-2022   pass     15              3           0            1   
5      6  11-02-2022   pass     15              3           0            1   

   pass_count  pass_score_total  
0           2                20  
1           2                20  
2           2                20  
3           2                30  
4           2                30  
5           2                30  

If need empty strings:

s = df['score'].where(df['Tag'].eq('pass'), 0).groupby(df['Date']).sum().rename('pass_score_total')
s1 = df['Date'].value_counts().rename('no_of_records')

df1 = pd.concat([s1, pd.crosstab(df['Date'], df['Tag']).add_suffix('_count'), s], axis=1)
df = df.join(df1, on='Date')
df[df1.columns] = df[df1.columns].mask(df['Date'].duplicated(), '', axis=1)
print (df)
   Sr.No        Date    Tag  score no_of_records fail_count grace_count  \
0      1  10-02-2022   pass     10             3          1           0   
1      2  10-02-2022   fail      5                                        
2      3  10-02-2022   pass     10                                        
3      4  11-02-2022  grace      3             3          0           1   
4      5  11-02-2022   pass     15                                        
5      6  11-02-2022   pass     15                                        

  pass_count pass_score_total  
0          2               20  
1                              
2                              
3          2               30  
4                              
5                              

CodePudding user response:

You can use pd.crosstab:

>>> (pd.crosstab(df['Date'], df['Tag']).add_suffix('_count')
       .assign(no_of_records=lambda x: x.sum(axis=1)).reset_index()
       .rename_axis(columns=None))

         Date  fail_count  grace_count  pass_count  no_of_records
0  10-02-2022           1            0           2              3
1  11-02-2022           0            1           2              3
  • Related