How to count rows in a dataframe where column values match column values in another dataframe-CodePudding

I have the following sample data

df1 = [[52, '1', '10'], [54, '1', '4'],
       [55, '2', '3'], [52, '1', '10'],
       [55, '2', '10'], [52, '1', '4']]
    
df = pd.DataFrame(df1, columns =['Cow', 'Lact', 'Procedure'])

df2 = [['52', '1'], ['53', '9'],
       ['54', '2'], ['55', '2']]
    
df2 = pd.DataFrame(df2, columns =['Cow', 'Lact'])

The tables look like:

df:

    Cow Lact    Procedure
0   52  1        10
1   54  1        4
2   55  2        3
3   52  1        10
4   55  2        10
5   52  1        4

df2:

I would like to count the number of procedure = 10 for each Cow-Lact combination in df2 and then add a column to df2 called Tproc that includes the count.

The output I am looking for is

    Cow Lact Tproc
0   52  1     2
1   53  9     0
2   54  2     0
3   55  2     1

The following filter does not work:

filt = [(df['Cow']==df2['Cow'])&(df['Lact']==df2['Lact'])&(df['Procedure']==10)]

My plan was then to use .len to get a count

df2['Tproc'] = df2.loc[filt].len

How can you filter a DataFrame based on values in another DataFrame to count the number of rows that satisfy the condition?

CodePudding user response：

You can use value_counts and reindex your first dataframe before append the values to Tproc column:

df2['Tproc'] = (
    df1[df1['Procedure'] == '10'].value_counts(['Cow', 'Lact'])
       .reindex(pd.MultiIndex.from_frame(df2[['Cow', 'Lact']]), fill_value=0).values
)
print(df2)

# Output
  Cow Lact  Tproc
0  52    1      2
1  53    9      0
2  54    2      0
3  55    2      1

Setup:

df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
                    'Lact': [1, 1, 2, 1, 2, 1],
                    'Procedure': ['10', '4', '3', '10', '10', '4']})

df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})

CodePudding user response：

You could merge groupby sum:

tmp = df2.merge(df.astype(str), on=['Cow','Lact'], how='left')
out = tmp['Procedure'].eq('10').groupby([tmp['Cow'], tmp['Lact']]).sum().reset_index(name='Tproc')

Output:

  Cow Lact  Tproc
0  52    1      2
1  53    9      0
2  54    2      0
3  55    2      1

CodePudding user response：

use groupby() size() and then merge()

out = df2.merge(
    df[df['Procedure'] == '10'].groupby(['Cow', 'Lact']).size().reset_index(name='Tproc').astype(str), 
    how='left', 
    on=['Cow','Lact']
).fillna(0)

output:

  Cow Lact Tproc
0 52  1   2
1 53  9   0
2 54  2   0
3 55  2   1