Home > database >  (Python) Transform dataframe: parse rows if more than one value is given and add to corresponding gi
(Python) Transform dataframe: parse rows if more than one value is given and add to corresponding gi

Time:05-25

Sample dataset I have looks like this:

Language Count
Russian 1000
English 1500
Spanish 500
Arabic,Russian, English, Spanish 2
Arabic, English 15

I want it to transform so that the result looks like this:

Language Count
Russian 1002
English 1517
Spanish 502
Arabic 17

So what happened is that, I parsed rows that contained more than one language. Added up them to languages that were already given. If it was not given (in this case: Arabic) created the new one.

How can i achieve this? Thank you!

CodePudding user response:

Use DataFrame.assign with Series.str.split, DataFrame.explode and last aggregate sum:

df = (df.assign(Language=df.Language.str.split(','))
        .explode('Language')
        .groupby('Language', as_index=False, sort=False)
        .sum())
print (df)
  Language  Count
0  Russian   1002
1  English   1517
2  Spanish    502
3   Arabic     17

CodePudding user response:

You can split, explode, and groupby.sum:

out = (df
   .assign(Language=df['Language'].str.split(',\s*'))
   .explode('Language')
   .groupby('Language', as_index=False, sort=False).sum()
)

output:

  Language  Count
0  Russian   1002
1  English   1517
2  Spanish    502
3   Arabic     17
  • Related