I would like to add a column to df2
that includes a count of rows in df1
that have matching Herd
and Ddat
values.
import pandas as pd
df1 = [[52, '1', '1/1/2020'], [54, '1', '1/1/2020'],
[55, '2', '1/1/2020'], [56, '3', '1/1/1999']]
df = pd.DataFrame(df1, columns =['Cow','Herd', 'Ddat'])
df2 = [['1', '1/1/2020'], ['1', '1/5/2020'],
['2', '1/1/2020'], ['3', '1/1/1999']]
df2 = pd.DataFrame(df2, columns =['Herd', 'Ddat'])
The output I am looking for is
Herd Ddat Count
1 1/1/2020 2
1 1/5/2020 0
2 1/1/2020 1
3 1/1/1999 1
CodePudding user response:
You can take advantage of the nice features of indexes:
cols = ['Herd', 'Ddat']
new_df = df2.set_index(cols).assign(Count=df.groupby(cols).count()).fillna(0).astype({'Count': int}).reset_index()
Output:
>>> new_df
Herd Ddat Count
0 1 1/1/2020 2
1 1 1/5/2020 0
2 2 1/1/2020 1
3 3 1/1/1999 1