I have a dataframe like this:
TermReason Termcount
1 Another position 20
2 unhappy 14
3 more money 11
4 career change 9
I want to combine Another position
& career change
into Better oppotinutity
and also sum up their numerical data, so does the other tow columns.
I would like to know how to make dataframe like this:
Termgroup Termcount
1 Better oppotinutity 29
2 Unsatisfied with job 25
Thanks.
CodePudding user response:
First is necessary defined groups, here I use dictionary of lists, then mapping column TermReason
by Series.map
with flatten dictionary d1
and last aggregate sum
:
d = {'Better oppotinutity':['Another position','career change'],
'Unsatisfied with job':['unhappy','more money']}
d1 = {x: k for k, v in d.items() for x in v}
df['Termgroup'] = df['TermReason'].map(d1)
print (df)
TermReason Termcount Termgroup
1 Another position 20 Better oppotinutity
2 unhappy 14 Unsatisfied with job
3 more money 11 Unsatisfied with job
4 career change 9 Better oppotinutity
df = df.groupby('Termgroup', as_index=False)['Termcount'].sum()
print (df)
Termgroup Termcount
0 Better oppotinutity 29
1 Unsatisfied with job 25
CodePudding user response:
Considering that the original dataframe is df
, the following will do the work
import pandas as pd
import numpy as np
df_new = df.groupby('TermReason').sum().reset_index()
df_new['Termgroup'] = np.where(df_new['TermReason'].isin(['Another position', 'career change']), 'Better oppotinutity', 'Unsatisfied with job')
df_new = df_new.groupby('Termgroup').sum().reset_index()
[Out]:
Termgroup Termcount
0 Better oppotinutity 29
1 Unsatisfied with job 25
Let's break down each step.
Start by grouping by TermReason
with pandas.DataFrame.groupby
, then use .sum()
and pandas.DataFrame.reset_index
as follows
df_new = df.groupby('TermReason').sum().reset_index()
[Out]:
TermReason Termcount
0 Another position 20
1 career change 9
2 more money 11
3 unhappy 14
Let us create a new column to store Better oppotinutity
when one has Another position
or career change
, else use Unsatisfied with job
. For that, one will use numpy.where
and pandas.Series.isin
as follows
df_new['Termgroup'] = np.where(df_new['TermReason'].isin(['Another position', 'career change']), 'Better oppotinutity', 'Unsatisfied with job')
[Out]:
TermReason Termcount Termgroup
0 Another position 20 Better oppotinutity
1 career change 9 Better oppotinutity
2 more money 11 Unsatisfied with job
3 unhappy 14 Unsatisfied with job
Finally, with pandas.DataFrame.groupby
, .sum()
and pandas.DataFrame.reset_index
one will be able to get the desired output
df_new = df_new.groupby('Termgroup').sum().reset_index()
[Out]:
Termgroup Termcount
0 Better oppotinutity 29
1 Unsatisfied with job 25