I am working with fifa dataset and there I have 2 columns player name and playing positions. The playing position column contains multiple values as players can play in multiple positions. It is a fifa dataset from kaggle
df = pd.read_csv('dataset.csv')
df = df['name', 'pos']
df.head()
index | name | pos |
---|---|---|
0 | messi | ST, RW, LW |
1 | lewa | ST |
2 | ronaldo | LW, ST |
I want to convert that and map those to 4 separate positions for instance to Att, mid, def, gk . I tried using replace and give it a dictionary with mapping but seems to only work with players that have only 1 position. For instance messi has RW, LW, ST so my mapping doesnot work but someone like Lewandowski who has only ST it works.
Then I tried splitting the column and exploding it into different rows, and then replace , which made three 0 indexed rows with messi but that replaces only on first instance of index as well. How do i solve this ?
df1 = df[['name', 'pos']]
df1['pos'] = df['pos'].str.split(",")
df1 = df1.explode('pos')
new_pos = {'RW': 'ATT', 'LW': 'ATT', 'ST': 'ATT', 'CF': 'ATT', 'CAM': 'MID', 'CDM': 'MID', 'GK': 'GK', 'CM': 'MID' }
df1.replace({'pos': new_pos}, inplace=True)
df1.head()
below is how the df looks, there is no index column i have just added here to show to more clearly.
index | name | pos |
---|---|---|
0 | messi | ATT |
0 | messi | ST |
0 | messi | LW |
Expected output:
index | name | pos |
---|---|---|
0 | messi | ATT |
1 | lewa | ATT |
2 | ronaldo | ATT |
CodePudding user response:
If you only want the first position, the best might be to use a list comprehension:
df['new_pos'] = [new_pos[next(iter(x.split(',')))] for x in df['pos']]
output:
index name pos new_pos
0 0 messi ST, RW, LW ATT
1 1 lewa ST ATT
2 2 ronaldo LW, ST ATT
The (likely much less efficient) pandas equivalent would be:
df['new_pos'] = (df['pos']
.str.split(',')
.explode()
.map(new_pos)
.groupby(level=0).first()
)