Home > Net >  Merging strings of people's names in pandas
Merging strings of people's names in pandas

Time:06-30

I have two datasets that I want to merge based off the persons name. One data set player_nationalities has their full name:

Player, Nationality
Kylian Mbappé, France
Wissam Ben Yedder, France
Gianluigi Donnarumma, Italy

The other dataset player_ratings shortens their first names with a full stop and keeps the other name(s).

Player, Rating
K. Mbappé, 93
W. Ben Yedder, 89
G. Donnarumma, 91

How do I merge these tables based on the column Player and avoid merging people with the same last name? This is my attempt:

df = pd.merge(player_nationality, player_ratings, on='Player', how='left')

Player, Nationality, Rating
K. Mbappé, France, NaN
W. Ben Yedder, France, NaN
G. Donnarumma, Italy, NaN

CodePudding user response:

You would need to normalize the keys in both DataFrames in order to merge them. One idea would be to create a function to process the full name in player_nationalities and merge on the processed value for player name. eg:

def convert_player_name(name):
    try:
        first_name, last_name = name.split(' ', maxsplit=1)
        return f'{first_name[0]}. {last_name}'
    except ValueError:
        return name

    

player_nationalities['processed_name'] = [convert_player_name(name) for name in player_nationalities['Player']]
df_merged = player_nationalities.merge(player_ratings, left_on='processed_name', right_on='Player')

[out]

               Player_x Nationality processed_name       Player_y  Rating
0         Kylian Mbappé      France      K. Mbappé      K. Mbappé      93
1     Wissam Ben Yedder      France  W. Ben Yedder  W. Ben Yedder      89
2  Gianluigi Donnarumma       Italy  G. Donnarumma  G. Donnarumma      91
  • Related