I have two dataframes, one consists of people and scores, the other consists of each time one of the people did a thing.
df_people = pd.DataFrame(
{
'Name' : ['Angie', 'John', 'Joanne', 'Shivangi'],
'ID' : ['0021', '0022', '0023', '0024'],
'Code' : ['BHU', 'JNU', 'DU', 'BHU'],
}
)
df_actions = pd.DataFrame(
{
'ID' : ['0023', '0021', '0022', '0021'],
'Act' : ['Enter', 'Enter', 'Enter', 'Leave'],
}
)
I would like to create a column in df_people
that represents the count of each time they appear in df_actions
based on the shared 'ID'
column.
it would look like
Name | ID | Code | Count | |
---|---|---|---|---|
0 | Angie | 0023 | BHU | 1 |
1 | John | 0021 | JNU | 2 |
2 | Joanne | 0022 | DU | 1 |
3 | Shivan | 0024 | BHU | 0 |
I have tried just taking a value count and insterting that as a new column into df_people
but it seems very clunky.
Any advice would be much appreciated.
CodePudding user response:
Another option is Series.map
with Series.value_counts
new_df = df_people.assign(Count=df_people['ID'].map(df_actions['ID'].value_counts())
.fillna(0, downcast='infer'))
print(new_df)
Name ID Code Count
0 Angie 0021 BHU 2
1 John 0022 JNU 1
2 Joanne 0023 DU 1
3 Shivangi 0024 BHU 0
CodePudding user response:
Use first GroupBy.agg
to compute the counts, then merge
:
(df_people
.merge(df_actions.groupby('ID')['Act'].agg(count='count'),
left_on='ID', right_index=True, how='left')
.fillna({'count': 0}, downcast='infer')
)
output:
Name ID Code count
0 Angie 0021 BHU 2
1 John 0022 JNU 1
2 Joanne 0023 DU 1
3 Shivangi 0024 BHU 0