I have a df
import seaborn as sns
import numpy as np
import pandas as pd
df = sns.load_dataset('titanic')
I am trying to add 8 columns from 1-8
which is the age of the passenger and to count how many passengers fit in to each of those categories 1-8
I tried:
df_temp = df.groupby('sex').agg({'fare':np.mean})
Which returns:
fare
sex
female 44.48
male 25.52
Then I have age_buckets = range(1,9)
which are the age categories I am interested in.
The goal is to have this kind of structure:
sex fare age_1 age_2 age_3 ... age_8
male 25 0 10 3 x
female 44 1 2 2 y
Which would mean that there were 0
age_1 males and 1 girl that was 1
year old and so on.
So I tried the following:
df_age = df[df.age.isin(age_buckets)].groupby(['sex','age']) \
.count() \
.survived \
.reset_index()
Which returns the data I want but not in the format I can add to my primary df_temp
as a column per age.
So at this point I am not sure if this is the right way to achieve the desired result I am trying to create and don't know how to work forward.
CodePudding user response:
Use cut
with crosstab
and add DataFrame.add_prefix
:
df = (df_temp.join(pd.crosstab(df['sex'],pd.cut(df['age'],
bins=range(0,9),
labels=range(1,9)))
.add_prefix('age_')))
print (df)
fare age_1 age_2 age_3 age_4 age_5 age_6 age_7 age_8
sex
female 44.479818 4 6 2 5 4 2 1 2
male 25.523893 10 4 4 5 0 1 2 2