I have the following sample data
df1 = [[52, '1', '10'], [54, '1', '4'],
[55, '2', '3'], [52, '1', '10'],
[55, '2', '10'], [52, '1', '4']]
df = pd.DataFrame(df1, columns =['Cow', 'Lact', 'Procedure'])
df2 = [['52', '1'], ['53', '9'],
['54', '2'], ['55', '2']]
df2 = pd.DataFrame(df2, columns =['Cow', 'Lact'])
The tables look like:
df
:
Cow Lact Procedure
0 52 1 10
1 54 1 4
2 55 2 3
3 52 1 10
4 55 2 10
5 52 1 4
df2
:
Cow Lact
0 52 1
1 53 9
2 54 2
3 55 2
I would like to count the number of procedure = 10
for each Cow-Lact
combination in df2
and then add a column to df2
called Tproc
that includes the count.
The output I am looking for is
Cow Lact Tproc
0 52 1 2
1 53 9 0
2 54 2 0
3 55 2 1
The following filter does not work:
filt = [(df['Cow']==df2['Cow'])&(df['Lact']==df2['Lact'])&(df['Procedure']==10)]
My plan was then to use .len
to get a count
df2['Tproc'] = df2.loc[filt].len
How can you filter a DataFrame based on values in another DataFrame to count the number of rows that satisfy the condition?
CodePudding user response:
You can use value_counts
and reindex
your first dataframe before append the values to Tproc
column:
df2['Tproc'] = (
df1[df1['Procedure'] == '10'].value_counts(['Cow', 'Lact'])
.reindex(pd.MultiIndex.from_frame(df2[['Cow', 'Lact']]), fill_value=0).values
)
print(df2)
# Output
Cow Lact Tproc
0 52 1 2
1 53 9 0
2 54 2 0
3 55 2 1
Setup:
df1 = pd.DataFrame({'Cow': [52, 54, 55, 52, 55, 52],
'Lact': [1, 1, 2, 1, 2, 1],
'Procedure': ['10', '4', '3', '10', '10', '4']})
df2 = pd.DataFrame({'Cow': [52, 53, 54, 55], 'Lact': [1, 9, 2, 2]})
CodePudding user response:
You could merge
groupby
sum
:
tmp = df2.merge(df.astype(str), on=['Cow','Lact'], how='left')
out = tmp['Procedure'].eq('10').groupby([tmp['Cow'], tmp['Lact']]).sum().reset_index(name='Tproc')
Output:
Cow Lact Tproc
0 52 1 2
1 53 9 0
2 54 2 0
3 55 2 1
CodePudding user response:
use groupby()
size()
and then merge()
out = df2.merge(
df[df['Procedure'] == '10'].groupby(['Cow', 'Lact']).size().reset_index(name='Tproc').astype(str),
how='left',
on=['Cow','Lact']
).fillna(0)
output:
Cow Lact Tproc 0 52 1 2 1 53 9 0 2 54 2 0 3 55 2 1