I have a DataFrame with column "ages" and column "professional qualification", like this:
ages | professional qualification |
---|---|
45 | labourer |
49 | labourer |
29 | labourer |
61 | labourer |
45 | labourer |
37 | labourer |
17 | office worker |
56 | labourer |
47 | office worker |
I want to group the ages like this ( ,17), (17,29), (30,40), (40,50), (50, ) and, with these ages grouped I would to create a frequency table indicating on each age group what professional qualification appears more often.
Example:
ages | professional qualification |
---|---|
(,17) | office worker |
(17,29) | labourer |
(30,40) | labourer |
(40,50) | labourer |
etc, etc, etc. The people who have an age between 40 and 50 (excluding 40) are mostly labourers
All solutions will be appreciated.
CodePudding user response:
Use cut
with aggregate by GroupBy.agg
custom function by Series.mode
with select first element:
bins = [0,17,29,40,50,70,100]
f = lambda x: x.mode().iat[0]
df1 = (df.groupby(pd.cut(df['ages'], bins=bins))['professional qualification']
.agg(f)
.reset_index())
print (df1)
ages professional qualification
0 (0, 17] office worker
1 (17, 29] labourer
2 (29, 40] labourer
3 (40, 50] labourer
4 (50, 70] labourer
5 (70, 100] None