I have the following dataframe in Python that lists team lineage for a Formula 1 team in that it identifies when teams change name and what they change to:
import pandas as pd
teams = pd.DataFrame({'type': {0: 'new', 1: 'change', 2: 'change', 3: 'change'},
'orig_event_id': {0: nan, 1: '2009-01-Q', 2: '2012-01-Q', 3: '2017-01-Q'},
'orig_team_id': {0: nan,
1: 'Tyrrell_Racing_Organisation',
2: 'Mercedes_AMG_Petronas_F1_Team',
3: 'Mercedes_AMG_Petronas_Motorsport'},
'event_id': {0: '1970-01-Q', 1: '2010-01-Q', 2: '2017-01-Q', 3: '2020-01-Q'},
'new_team_id': {0: 'Tyrrell_Racing_Organisation',
1: 'Mercedes_AMG_Petronas_F1_Team',
2: 'Mercedes_AMG_Petronas_Motorsport',
3: 'Mercedes_AMG_Petronas_F1_Team'}})
The 'orig_event_id' is the event in which the team name changes and the 'event_id' is how long the name lasts to, if that helps.
I also have this dataframe of drivers and races that I would like to change the data of:
driver = pd.DataFrame({'index': {0: 22258, 1: 22279, 2: 22300, 3: 22317, 4: 22337},
'driver_id': {0: 'Lewis_Hamilton',
1: 'Lewis_Hamilton',
2: 'Lewis_Hamilton',
3: 'Lewis_Hamilton',
4: 'Lewis_Hamilton'},
'team_id': {0: 'Mercedes_AMG_Petronas_Motorsport',
1: 'Mercedes_AMG_Petronas_Motorsport',
2: 'Mercedes_AMG_Petronas_Motorsport',
3: 'Mercedes_AMG_Petronas_Motorsport',
4: 'Mercedes_AMG_Petronas_Motorsport'},
'event_id': {0: '2018-01-R',
1: '2018-02-R',
2: '2018-03-R',
3: '2018-04-R',
4: '2018-05-R'},
'season': {0: 2018, 1: 2018, 2: 2018, 3: 2018, 4: 2018},
'stage': {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}})
So essentially what I want is the team_id in the driver df to change from 'Mercedes_AMG_Petronas_Motorsport' to 'Mercedes_AMG_Petronas_F1_Team' and all of the team_id with 'Tyrrell_Racing_Organisation' to also change to 'Mercedes_AMG_Petronas_F1_Team' so that they're consistent from year-to-year.
Is there an easy way to do this? Let me know and appreciate any help!
CodePudding user response:
You can try
driver['team_id'] = driver['team_id'].map(dict(zip(teams['orig_team_id'], teams['new_team_id'])))
# Or
driver['team_id'] = driver['team_id'].map(teams.set_index('orig_team_id')['new_team_id'])
print(df)
index driver_id team_id event_id season stage
0 22258 Lewis_Hamilton Mercedes_AMG_Petronas_F1_Team 2018-01-R 2018 1
1 22279 Lewis_Hamilton Mercedes_AMG_Petronas_F1_Team 2018-02-R 2018 2
2 22300 Lewis_Hamilton Mercedes_AMG_Petronas_F1_Team 2018-03-R 2018 3
3 22317 Lewis_Hamilton Mercedes_AMG_Petronas_F1_Team 2018-04-R 2018 4
4 22337 Lewis_Hamilton Mercedes_AMG_Petronas_F1_Team 2018-05-R 2018 5
CodePudding user response:
I can't see a particularly easy or vectorized way to be sure of replacing every team ID with the latest corresponding team ID. You can write a function that iterates through the rows, creating a series that maps the original team IDs to latest team IDs.
def get_latest_team_ids(df):
# Make a dataframe of team ID changes in date order
df = df.sort_values("event_id") \
.loc[df["type"] == "change", ["orig_team_id", "new_team_id"]] \
.reset_index(drop=True)
# Replace earlier occurrences of a changed ID (e.g. A => B, B => C becomes A => C, B => C)
for i, (orig, new) in df[1:].iterrows():
df.loc[:i-1, "new_team_id"].replace(orig, new, inplace=True)
# Return it as a series mapping original ID to latest ID
return df.set_index("orig_team_id")["new_team_id"]
Then, use this series to map the team IDs in the driver dataframe, as in @Ynjxsjmh's answer.
team_mapping = get_latest_team_ids(teams)
driver["latest_team_id"] = driver["team_id"].map(team_mapping)