I want to drop all rows in the ratings df where the team has no game, i.e. the team does not occur in either the HomeTeam or the AwayTeam column of the fixtures df. This is what I tried:
import pandas as pd

fixtures = pd.DataFrame({'HomeTeam': ["Team1", "Team3", "Team5", "Team6"],
                         'AwayTeam': ["Team2", "Team4", "Team6", "Team8"]})
ratings = pd.DataFrame({'team': ["Team1", "Team2", "Team3", "Team4", "Team5", "Team6",
                                 "Team7", "Team8", "Team9", "Team10", "Team11", "Team12"],
                        'rating': ["1,5", "0,2", "0,5", "2", "3", "4,8",
                                   "0,9", "-0,4", "-0,6", "1,5", "0,2", "0,5"]})
ratings = ratings[(ratings.team != fixtures.HomeTeam) &
(ratings.team != fixtures.AwayTeam)]
But I get this error message:
ValueError: Can only compare identically-labeled Series objects
What can I do to stop the error from occurring?
CodePudding user response:
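The error occurs because != compares two Series element by element and requires both to have the same index labels, but ratings.team has 12 rows and fixtures.HomeTeam only 4. You can use isin() instead, which tests membership: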
ratings = ratings[~ratings.team.isin(fixtures.stack())]
#output
'''
      team rating
6    Team7    0,9
8    Team9   -0,6
9   Team10    1,5
10  Team11    0,2
11  Team12    0,5
'''
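To see why the original comparison fails while isin() works, here is a minimal sketch using the team names from the question. The names `error_message` and `mask` are only for illustration.

```python
import pandas as pd

fixtures = pd.DataFrame({
    "HomeTeam": ["Team1", "Team3", "Team5", "Team6"],
    "AwayTeam": ["Team2", "Team4", "Team6", "Team8"],
})
ratings = pd.DataFrame({"team": [f"Team{i}" for i in range(1, 13)]})

# 12 rows vs. 4 rows: the index labels differ, so the
# element-wise comparison raises a ValueError.
try:
    ratings.team != fixtures.HomeTeam
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# isin() checks each team for membership in the given values,
# so the two objects do not need the same length or index.
mask = ratings.team.isin(fixtures.HomeTeam)
```

The resulting `mask` has one boolean per row of ratings, which is what boolean indexing needs.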
Details:
print(fixtures.stack())
'''
0  HomeTeam    Team1
   AwayTeam    Team2
1  HomeTeam    Team3
   AwayTeam    Team4
2  HomeTeam    Team5
   AwayTeam    Team6
3  HomeTeam    Team6
   AwayTeam    Team8
dtype: object
'''
As you can see, this returns all values in fixtures as a single Series. isin() marks the rows of ratings whose team appears among these values, and the ~ operator negates that mask, so only the teams that do not appear in fixtures are kept.
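Note that stack() collects the values of every column, which is fine here because fixtures only holds team names. If the frame had other columns, one option is to name the two team columns explicitly. A sketch under that assumption (the `Date` column and the names `playing`, `without_game` and `with_game` are hypothetical, not from the question):

```python
import pandas as pd

fixtures = pd.DataFrame({
    "HomeTeam": ["Team1", "Team3", "Team5", "Team6"],
    "AwayTeam": ["Team2", "Team4", "Team6", "Team8"],
    "Date": ["2022-01-01"] * 4,  # hypothetical extra column
})
ratings = pd.DataFrame({
    "team": [f"Team{i}" for i in range(1, 13)],
    "rating": ["1,5", "0,2", "0,5", "2", "3", "4,8",
               "0,9", "-0,4", "-0,6", "1,5", "0,2", "0,5"],
})

# Collect the team names from just the two team columns.
playing = pd.concat([fixtures["HomeTeam"], fixtures["AwayTeam"]]).unique()

# With ~: keep the teams that appear in no fixture (same result as above).
without_game = ratings[~ratings["team"].isin(playing)]

# Without ~: keep only the teams that do have a fixture.
with_game = ratings[ratings["team"].isin(playing)]
```

Which of the two masks you want depends on which rows should remain: `without_game` matches the output shown above, while `with_game` drops the teams that have no game.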