Home > Net >  How to combine categorical data into different group in pandas
How to combine categorical data into different group in pandas

Time:10-26

I have a dataframe like this:

        TermReason       Termcount
1     Another position      20
2        unhappy            14
3       more money          11
4     career change         9

I want to combine Another position & career change into Better oppotinutity and also sum up their numerical data, so does the other tow columns.

I would like to know how to make dataframe like this:

        Termgroup              Termcount
1   Better oppotinutity           29
2   Unsatisfied with job          25

Thanks.

CodePudding user response:

First is necessary defined groups, here I use dictionary of lists, then mapping column TermReason by Series.map with flatten dictionary d1 and last aggregate sum:

d = {'Better oppotinutity':['Another position','career change'],
     'Unsatisfied with job':['unhappy','more money']}

d1 = {x: k for k, v in d.items() for x in v}
df['Termgroup'] = df['TermReason'].map(d1)
print (df)
         TermReason  Termcount             Termgroup
1  Another position         20   Better oppotinutity
2           unhappy         14  Unsatisfied with job
3        more money         11  Unsatisfied with job
4     career change          9   Better oppotinutity

df = df.groupby('Termgroup', as_index=False)['Termcount'].sum()
print (df)
              Termgroup  Termcount
0   Better oppotinutity         29
1  Unsatisfied with job         25

CodePudding user response:

Considering that the original dataframe is df, the following will do the work

import pandas as pd
import numpy as np

df_new = df.groupby('TermReason').sum().reset_index()
df_new['Termgroup'] = np.where(df_new['TermReason'].isin(['Another position', 'career change']), 'Better oppotinutity', 'Unsatisfied with job')
df_new = df_new.groupby('Termgroup').sum().reset_index()

[Out]:

              Termgroup  Termcount
0   Better oppotinutity         29
1  Unsatisfied with job         25

Let's break down each step.

Start by grouping by TermReason with pandas.DataFrame.groupby, then use .sum() and pandas.DataFrame.reset_index as follows

df_new = df.groupby('TermReason').sum().reset_index()

[Out]:

         TermReason  Termcount
0  Another position         20
1     career change          9
2        more money         11
3           unhappy         14

Let us create a new column to store Better oppotinutity when one has Another position or career change, else use Unsatisfied with job. For that, one will use numpy.where and pandas.Series.isin as follows

df_new['Termgroup'] = np.where(df_new['TermReason'].isin(['Another position', 'career change']), 'Better oppotinutity', 'Unsatisfied with job')

[Out]:

         TermReason  Termcount             Termgroup
0  Another position         20   Better oppotinutity
1     career change          9   Better oppotinutity
2        more money         11  Unsatisfied with job
3           unhappy         14  Unsatisfied with job

Finally, with pandas.DataFrame.groupby, .sum() and pandas.DataFrame.reset_index one will be able to get the desired output

df_new = df_new.groupby('Termgroup').sum().reset_index()

[Out]:

              Termgroup  Termcount
0   Better oppotinutity         29
1  Unsatisfied with job         25
  • Related