I have a column of age values that I need to group into age ranges, with one column per range.
For example, in this dataframe I have:
and I would like to get to:
I tried filtering the data like this, but it returns nothing.
data_df = df[df['Age'] <= 30]
data_df
It does not work correctly, and I get an error:
ValueError: cannot reindex from a duplicate axis
CodePudding user response:
First convert the values of the column to numeric by stripping the surrounding spaces with str.strip, then bin them with cut, and finally create indicator columns with get_dummies and append them to the original DataFrame with join:
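import numpy as np
import pandas as pd
# Strip surrounding spaces, then convert the ages to integers
df['Age'] = df['Age'].astype(str).str.strip(' ').astype(int)
# Bin the ages (each bin includes its upper edge) and append one indicator column per bin;
# dtype=int gives 0/1 indicators instead of the True/False default in pandas 2.x
df = df.join(pd.get_dummies(pd.cut(df['Age'],
                                   bins=(0,18,25,29,50,np.inf),
                                   labels=['Under 18','19_to_25','26_to_29','30_to_50','Over 50']), dtype=int))
print(df)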
    Age  Under 18  19_to_25  26_to_29  30_to_50  Over 50
0    12         1         0         0         0        0
1    13         1         0         0         0        0
2    14         1         0         0         0        0
3    18         1         0         0         0        0
4    20         0         1         0         0        0
5    25         0         1         0         0        0
6    30         0         0         0         1        0
7    40         0         0         0         1        0
8    50         0         0         0         1        0
9    60         0         0         0         0        1
10   70         0         0         0         0        1
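If it helps to see where each age lands, here is a minimal sketch of the binning step on its own. The sample ages are hypothetical (copied from the output above), and the names ages, edges, binned and counts are just for illustration. It relies on the default right=True of cut, which is why 18 falls under 'Under 18' and 50 falls under '30_to_50'.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data mirroring the ages in the output above
ages = pd.Series([12, 13, 14, 18, 20, 25, 30, 40, 50, 60, 70], name="Age")

edges = [0, 18, 25, 29, 50, np.inf]
labels = ["Under 18", "19_to_25", "26_to_29", "30_to_50", "Over 50"]

# right=True (the default) makes every bin include its upper edge:
# (0, 18], (18, 25], (25, 29], (29, 50], (50, inf]
binned = pd.cut(ages, bins=edges, labels=labels, right=True)

# Count how many rows fall into each bin, without sorting by count
counts = binned.value_counts(sort=False)
print(counts)
```

If you would rather have 18 go into the next group, passing right=False to cut should make each bin include its lower edge instead, but then the labels would need to be renamed to match.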