In my project, I have a dataframe that goes a little something like this:
df = pd.DataFrame([
{'posteam': 'NYJ', 'defteam': 'BAL', 'penalty': 1, 'penalty_team': 'NYJ', 'penalty_yards': 10},
{'posteam': 'NYJ', 'defteam': 'BAL', 'penalty': 1, 'penalty_team': 'BAL', 'penalty_yards': 5},
{'posteam': 'BAL', 'defteam': 'NYJ', 'penalty': 0, 'penalty_team': None, 'penalty_yards': 0},
{'posteam': 'BAL', 'defteam': 'NYJ', 'penalty': 1, 'penalty_team': 'NYJ', 'penalty_yards': 15}])
I would like to aggregate by posteam
value, but also including penalty
and penalty_yards
values where the penalty_team
value equals the posteam value. The output would be something like
posteam | penalty | penalty_yards |
---|---|---|
NYJ | 2 | 25 |
BAL | 1 | 5 |
While the penalty would not have been a penalty by the possessing team, I would like to track total penalties and penalty yards by the team being aggregated.
Currently, my code is
df.groupby(['posteam']).agg({
'penalty': 'sum',
'penalty_yards': 'sum'
})
Which returns the following (where penalties are tracked solely by the possessing team)
posteam | penalty | penalty_yards |
---|---|---|
NYJ | 2 | 15 |
BAL | 1 | 15 |
(Keep in mind the difference is NYJ
is getting one of the BAL
penalties and BAL
is getting one of NYJ
's)
Is there an easy way to rope in a value given this type of situation? I tried making a helper column to track posteam's penalties, but that excluded values where the team got a penalty while playing as defteam
.
CodePudding user response:
that should fix it
df1 = df.groupby(['posteam']).agg({
'penalty': 'sum'
})
df2 = df.groupby(['penalty_team']).agg({
'penalty_yards': 'sum'
})
pd.merge(df1, df2, left_index=True, right_index=True)