In this dataframe:
I want to find number of home wins and number of home matches for each team in each season. This is my code:
viewmatches.groupby(["season","home_team"]).apply(lambda x:x[x["home_team_goal"]>x["away_team_goal"]]["home_team"].count()).reset_index(name="home_wins")
viewmatches.groupby(["season","home_team"]).apply(lambda x:x["home_team"].count()).reset_index(name="home_matches")
result for the first line is this:
and the result for the second line is this:
So my question is how can I use groupby agg function to get this result:
CodePudding user response:
I'm not sure how this could be done using an aggregation function, but it seems like what you want to achieve is to just join the tables. This can be done like so:
A = pd.DataFrame({'K':list(range(4)), 'M':list(range(4)), 'A':['a','b','c', 'd']})
B = pd.DataFrame({'K':list(range(4)), 'M':list(range(4)), 'B':['r','x','y', 'z']})
print(A.join(B.set_index(['K', 'M']), on=['K', 'M']))
K M A B
0 0 0 a r
1 1 1 b x
2 2 2 c y
3 3 3 d z
CodePudding user response:
Here's a way to use agg
to get both desired columns at once:
x = viewmatches.assign(
home_wins=viewmatches["home_team_goal"]>viewmatches["away_team_goal"],
home_matches=1)[["season","home_team","home_wins","home_matches"]].groupby(
["season","home_team"]).agg(sum).reset_index()
Note that this is equivalent to using sum()
in place of agg(sum)
:
x = viewmatches.assign(
home_wins=viewmatches["home_team_goal"]>viewmatches["away_team_goal"],
home_matches=1)[["season","home_team","home_wins","home_matches"]].groupby(
["season","home_team"]).sum().reset_index()
Explanation:
- create the
home_wins
andhome_matches
columns in the original dataframe - call
groupby
andsum
- use
reset_index
to reassignseason
andhome_team
from index levels to columns
Full test code:
import pandas as pd
viewmatches = pd.DataFrame({
'season':[1,1,1,1,1]*3 [2,2,2,2,2]*3,
'home_team':['1. FC Koln', 'ADO Den Haag', 'AJ Auxerre', 'AS Monaco', 'AS Nancy-Lorraine']*3 ['1. FC Koln', 'ADO Den Haag', 'AJ Auxerre', 'AS Monaco', 'AS Nancy-Lorraine']*3,
'home_team_goal':[0,0,0,0,0] [1,1,1,1,1] [2,2,2,2,2] [3,3,3,3,3] [4,4,4,4,4] [5,5,5,5,5],
'away_team_goal':[0,1,2,3,4]*6,
'other_column_not_needed_in_result_1':[None]*30,
'other_column_not_needed_in_result_2':[None]*30,
'other_column_not_needed_in_result_3':[None]*30
})
print(viewmatches)
x = viewmatches.assign(
home_wins=viewmatches["home_team_goal"]>viewmatches["away_team_goal"],
home_matches=1)[["season","home_team","home_wins","home_matches"]].groupby(
["season","home_team"]).agg(sum).reset_index()
print(x)
Input:
season home_team home_team_goal away_team_goal other_col_1 other_col_2 other_col_3
0 1 1. FC Koln 0 0 None None None
1 1 ADO Den Haag 0 1 None None None
2 1 AJ Auxerre 0 2 None None None
3 1 AS Monaco 0 3 None None None
4 1 AS Nancy-Lorraine 0 4 None None None
5 1 1. FC Koln 1 0 None None None
6 1 ADO Den Haag 1 1 None None None
7 1 AJ Auxerre 1 2 None None None
8 1 AS Monaco 1 3 None None None
9 1 AS Nancy-Lorraine 1 4 None None None
10 1 1. FC Koln 2 0 None None None
11 1 ADO Den Haag 2 1 None None None
12 1 AJ Auxerre 2 2 None None None
13 1 AS Monaco 2 3 None None None
14 1 AS Nancy-Lorraine 2 4 None None None
15 2 1. FC Koln 3 0 None None None
16 2 ADO Den Haag 3 1 None None None
17 2 AJ Auxerre 3 2 None None None
18 2 AS Monaco 3 3 None None None
19 2 AS Nancy-Lorraine 3 4 None None None
20 2 1. FC Koln 4 0 None None None
21 2 ADO Den Haag 4 1 None None None
22 2 AJ Auxerre 4 2 None None None
23 2 AS Monaco 4 3 None None None
24 2 AS Nancy-Lorraine 4 4 None None None
25 2 1. FC Koln 5 0 None None None
26 2 ADO Den Haag 5 1 None None None
27 2 AJ Auxerre 5 2 None None None
28 2 AS Monaco 5 3 None None None
29 2 AS Nancy-Lorraine 5 4 None None None
Output:
season home_team home_wins home_matches
0 1 1. FC Koln 2 3
1 1 ADO Den Haag 1 3
2 1 AJ Auxerre 0 3
3 1 AS Monaco 0 3
4 1 AS Nancy-Lorraine 0 3
5 2 1. FC Koln 3 3
6 2 ADO Den Haag 3 3
7 2 AJ Auxerre 3 3
8 2 AS Monaco 2 3
9 2 AS Nancy-Lorraine 1 3