Home > Mobile >  Conditional count of rows in a dataframe based on two columns in another dataframe
Conditional count of rows in a dataframe based on two columns in another dataframe

Time:12-12

I would like to add a column to df2 that includes a count of rows in df1 that have matching Herd and Ddat values.

import pandas as pd 
    
df1 = [[52, '1', '1/1/2020'], [54, '1', '1/1/2020'],
       [55, '2', '1/1/2020'], [56, '3', '1/1/1999']]
    
df = pd.DataFrame(df1, columns =['Cow','Herd', 'Ddat'])

df2 = [['1', '1/1/2020'], ['1', '1/5/2020'],
       ['2', '1/1/2020'], ['3', '1/1/1999']]
    
df2 = pd.DataFrame(df2, columns =['Herd', 'Ddat'])

The output I am looking for is

Herd    Ddat       Count
1     1/1/2020        2
1     1/5/2020        0
2     1/1/2020        1
3     1/1/1999        1

CodePudding user response:

You can take advantage of the nice features of indexes:

cols = ['Herd', 'Ddat']
new_df = df2.set_index(cols).assign(Count=df.groupby(cols).count()).fillna(0).astype({'Count': int}).reset_index()

Output:

>>> new_df
  Herd      Ddat  Count
0    1  1/1/2020      2
1    1  1/5/2020      0
2    2  1/1/2020      1
3    3  1/1/1999      1
  • Related