Home > Mobile >  convert pandas dataframe multi values column into separate rows
convert pandas dataframe multi values column into separate rows

Time:10-04

I am working with fifa dataset and there I have 2 columns player name and playing positions. The playing position column contains multiple values as players can play in multiple positions. It is a fifa dataset from kaggle

df = pd.read_csv('dataset.csv')
df = df['name', 'pos']
df.head()
index name pos
0 messi ST, RW, LW
1 lewa ST
2 ronaldo LW, ST

I want to convert that and map those to 4 separate positions for instance to Att, mid, def, gk . I tried using replace and give it a dictionary with mapping but seems to only work with players that have only 1 position. For instance messi has RW, LW, ST so my mapping doesnot work but someone like Lewandowski who has only ST it works.

Then I tried splitting the column and exploding it into different rows, and then replace , which made three 0 indexed rows with messi but that replaces only on first instance of index as well. How do i solve this ?

df1 = df[['name', 'pos']]
df1['pos'] = df['pos'].str.split(",")
df1 = df1.explode('pos')
new_pos = {'RW': 'ATT', 'LW': 'ATT', 'ST': 'ATT', 'CF': 'ATT', 'CAM': 'MID', 'CDM': 'MID', 'GK': 'GK', 'CM': 'MID' }
df1.replace({'pos': new_pos}, inplace=True)
df1.head()

below is how the df looks, there is no index column i have just added here to show to more clearly.

index name pos
0 messi ATT
0 messi ST
0 messi LW

Expected output:

index name pos
0 messi ATT
1 lewa ATT
2 ronaldo ATT

CodePudding user response:

If you only want the first position, the best might be to use a list comprehension:

df['new_pos'] = [new_pos[next(iter(x.split(',')))] for x in df['pos']]

output:

   index     name         pos new_pos
0      0    messi  ST, RW, LW     ATT
1      1     lewa          ST     ATT
2      2  ronaldo      LW, ST     ATT

The (likely much less efficient) pandas equivalent would be:

df['new_pos'] = (df['pos']
                 .str.split(',')
                 .explode()
                 .map(new_pos)
                 .groupby(level=0).first()
                )
  • Related