I have two datasets that I want to merge based off the persons name. One data set player_nationalities
has their full name:
Player, Nationality
Kylian Mbappé, France
Wissam Ben Yedder, France
Gianluigi Donnarumma, Italy
The other dataset player_ratings
shortens their first names with a full stop and keeps the other name(s).
Player, Rating
K. Mbappé, 93
W. Ben Yedder, 89
G. Donnarumma, 91
How do I merge these tables based on the column Player and avoid merging people with the same last name? This is my attempt:
df = pd.merge(player_nationality, player_ratings, on='Player', how='left')
Player, Nationality, Rating
K. Mbappé, France, NaN
W. Ben Yedder, France, NaN
G. Donnarumma, Italy, NaN
CodePudding user response:
You would need to normalize the keys in both DataFrames in order to merge them.
One idea would be to create a function to process the full name in player_nationalities
and merge on the processed value for player name. eg:
def convert_player_name(name):
try:
first_name, last_name = name.split(' ', maxsplit=1)
return f'{first_name[0]}. {last_name}'
except ValueError:
return name
player_nationalities['processed_name'] = [convert_player_name(name) for name in player_nationalities['Player']]
df_merged = player_nationalities.merge(player_ratings, left_on='processed_name', right_on='Player')
[out]
Player_x Nationality processed_name Player_y Rating
0 Kylian Mbappé France K. Mbappé K. Mbappé 93
1 Wissam Ben Yedder France W. Ben Yedder W. Ben Yedder 89
2 Gianluigi Donnarumma Italy G. Donnarumma G. Donnarumma 91