I'm using pandas and my DataFrame looks like this:
Unnamed: 0 id player_name games time goals xG \
0 0 647 Harry Kane 35 3097 23 22.174859
1 1 1250 Mohamed Salah 37 3085 22 20.250847
2 2 1228 Bruno Fernandes 37 3117 18 16.019454
3 3 453 Son Heung-Min 37 3139 17 11.023287
4 4 822 Patrick Bamford 38 3085 17 18.401863
.. ... ... ... ... ... ... ...
assists xA shots key_passes yellow_cards red_cards position \
0 14 7.577094 138 49 1 0 F
1 5 6.528526 126 55 0 0 F M S
2 12 11.474996 121 95 6 0 M S
3 10 9.512992 68 75 0 0 F M S
4 7 3.782247 107 30 3 0 F S
.. ... ... ... ... ... ... ...
team_title npg npxG xGChain xGBuildup
0 Tottenham 19 19.130183 24.995648 4.451257
1 Liverpool 16 15.683834 28.968234 9.800236
2 Manchester United 9 8.407840 26.911412 11.932285
3 Tottenham 16 10.262118 20.671916 6.608751
4 Leeds 15 16.879525 23.394953 4.131796
.. ... ... ... ... ...
I'm trying to group it by team_title and sort it by goals and assists, so that it becomes something like this:
team_title player_name goals assists
Tottenham Harry Kane 23 14
Tottenham Gareth Bale ... ...
Tottenham Son Heung-Min ... ...
I've tried using
if any('Tottenham' in db['team_title']):
    db.groupby('team_title')['player_name','goals','assists'].value_counts()
and the error message I'm getting is
TypeError: 'bool' object is not iterable
Is it correct to use if-else here, or is there another way to sort by a specific string value?
CodePudding user response:
If I have understood correctly, you want to use sort_values() to sort the DataFrame by team name (which groups each team's rows together), then by goals and assists (in that order), with the highest values first.
You could do this with this line of code:
db.sort_values(by=["team_title", "goals", "assists"], ascending=[True, False, False], inplace=True)
An example of its use:
import pandas as pd
db = pd.DataFrame(data={"id": [0, 1, 2, 3, 4, 5], "player_name": ["Kane", "Salag", "Fernandes", "Heung-Min", "Bamford", "PlayerX"],
"games": [35, 37, 37, 37, 38, 35], "time": [3097, 2085, 3117, 3139, 3085, 4000],
"goals": [23, 22, 18, 17, 17, 12], "assists": [14, 5, 12 ,10, 7, 9],
"team_title": ["Tottenham", "Liverpool", "Manchester United", "Tottenham", "Leeds", "Tottenham"]})
db.sort_values(by=["team_title", "goals", "assists"], ascending=[True, False, False], inplace=True)
By passing a list to the ascending parameter, you can set the direction for each column you are sorting by, so in this example:
- Sort by team_title in ascending order (which will group all teams together).
- Sort by number of goals scored in descending order (highest first).
- Lastly, sort by number of assists in descending order.
Output with example data:
#Out:
# id player_name games time goals assists team_title
#4 4 Bamford 38 3085 17 7 Leeds
#1 1 Salag 37 2085 22 5 Liverpool
#2 2 Fernandes 37 3117 18 12 Manchester United
#0 0 Kane 35 3097 23 14 Tottenham
#3 3 Heung-Min 37 3139 17 10 Tottenham
#5 5 PlayerX 35 4000 12 9 Tottenham
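If you would rather not modify db in place, a minimal sketch along the same lines is shown here. It assumes the same illustrative data as above (cut down to the relevant columns); the name ranked is just a placeholder. It also reorders the columns to match the layout asked for in the question.

```python
import pandas as pd

# Same illustrative data as the example above, reduced to the relevant columns.
db = pd.DataFrame({
    "player_name": ["Kane", "Salag", "Fernandes", "Heung-Min", "Bamford", "PlayerX"],
    "goals": [23, 22, 18, 17, 17, 12],
    "assists": [14, 5, 12, 10, 7, 9],
    "team_title": ["Tottenham", "Liverpool", "Manchester United",
                   "Tottenham", "Leeds", "Tottenham"],
})

# Without inplace=True, sort_values returns a new DataFrame and leaves db untouched.
ranked = (
    db.sort_values(by=["team_title", "goals", "assists"],
                   ascending=[True, False, False])
      [["team_title", "player_name", "goals", "assists"]]  # column order from the question
      .reset_index(drop=True)                               # renumber rows 0..n-1
)
print(ranked)
```

Returning a new object makes it easier to chain further steps and keeps the original row order available in db.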
If you then wanted to retrieve specifically "Tottenham" data:
print(db[db["team_title"] == "Tottenham"])
#Out:
# id player_name games time goals assists team_title
#0 0 Kane 35 3097 23 14 Tottenham
#3 3 Heung-Min 37 3139 17 10 Tottenham
#5 5 PlayerX 35 4000 12 9 Tottenham
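Since the question mentions groupby, here is a hedged sketch of how it could be combined with the sort. It assumes the same illustrative data; top_two and spurs are placeholder names. After sorting, groupby("team_title").head(n) keeps the first n rows of each team, and get_group pulls out a single team.

```python
import pandas as pd

# Same illustrative data as above, reduced to the relevant columns.
db = pd.DataFrame({
    "player_name": ["Kane", "Salag", "Fernandes", "Heung-Min", "Bamford", "PlayerX"],
    "goals": [23, 22, 18, 17, 17, 12],
    "assists": [14, 5, 12, 10, 7, 9],
    "team_title": ["Tottenham", "Liverpool", "Manchester United",
                   "Tottenham", "Leeds", "Tottenham"],
})

ranked = db.sort_values(by=["team_title", "goals", "assists"],
                        ascending=[True, False, False])

# head(2) keeps the first two rows of every group, in the sorted order,
# i.e. the two highest scorers of each team.
top_two = ranked.groupby("team_title").head(2)

# get_group returns the rows belonging to one team only.
spurs = ranked.groupby("team_title").get_group("Tottenham")

print(top_two)
print(spurs)
```

For a single team the boolean mask shown above is simpler. groupby is more useful when the same operation has to be applied to every team.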
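Your if statement doesn't work because 'Tottenham' in db['team_title'] evaluates to a single bool (and in on a Series checks the index labels, not the values, so here it is False). You are therefore effectively writing if any(False):, but any() expects an iterable and doesn't accept a bool as input. See this GeeksforGeeks page.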
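If you do want to test whether a team occurs in the column before doing anything else, one way is sketched below. It uses a small made-up frame, and the variable names are placeholders.

```python
import pandas as pd

# Tiny illustrative frame; only the column being tested is needed.
db = pd.DataFrame({"team_title": ["Tottenham", "Liverpool", "Tottenham"]})

# `in` on a Series tests the index labels (0, 1, 2 here), not the values.
label_check = "Tottenham" in db["team_title"]

# Compare element-wise, then reduce the resulting boolean Series with .any().
value_check = (db["team_title"] == "Tottenham").any()

# Alternative: test membership in the underlying values.
values_check = "Tottenham" in db["team_title"].values

if value_check:
    print("Tottenham rows are present")
```

Here label_check is False while the other two checks are True, which is why the original condition never behaved as intended.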