Group by Certain Age Group in Pandas

Time:12-08

I have a column of age values that I need to group into age ranges, with one column per range.

For example, in this dataframe I have:

[image: age]

and would like to get to:

[image not available]

I tried filtering the rows like this to get the data, but it returns nothing.

data_df = df[df['Age'] <= 30]
data_df

It is not working correctly, and I get an error:

ValueError: cannot reindex from a duplicate axis

CodePudding user response:

First convert the column values to numeric by stripping the surrounding spaces, then bin them with cut, and finally create indicator columns with get_dummies and join them to the original DataFrame:

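import numpy as np
import pandas as pd

# Strip stray spaces so the values can be cast to integers
df['Age'] = df['Age'].astype(str).str.strip(' ').astype(int)

# Bin the ages (right-inclusive intervals), then add one 0/1 column per bin
age_groups = pd.cut(df['Age'],
                    bins=(0, 18, 25, 29, 50, np.inf),
                    labels=['Under 18', '19_to_25', '26_to_29', '30_to_50', 'Over 50'])
df = df.join(pd.get_dummies(age_groups, dtype=int))
print(df)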
    Age  Under 18  19_to_25  26_to_29  30_to_50  Over 50
0    12         1         0         0         0        0
1    13         1         0         0         0        0
2    14         1         0         0         0        0
3    18         1         0         0         0        0
4    20         0         1         0         0        0
5    25         0         1         0         0        0
6    30         0         0         0         1        0
7    40         0         0         0         1        0
8    50         0         0         0         1        0
9    60         0         0         0         0        1
10   70         0         0         0         0        1
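
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same cut + get_dummies approach. The sample ages are copied from the printed output above; storing them as strings with stray spaces is only an assumption about what the uncleaned column might look like, and the names age_group, indicators and result are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample ages from the printed output; the string type and stray spaces
# are an assumption made to mimic an uncleaned column.
df = pd.DataFrame({'Age': [' 12', '13 ', '14', '18', '20', '25',
                           '30', '40', '50', '60', '70']})

# Clean the column and cast it to integers.
df['Age'] = df['Age'].astype(str).str.strip().astype(int)

bins = (0, 18, 25, 29, 50, np.inf)
labels = ['Under 18', '19_to_25', '26_to_29', '30_to_50', 'Over 50']

# pd.cut uses right-inclusive intervals by default: (0, 18], (18, 25], ...
age_group = pd.cut(df['Age'], bins=bins, labels=labels)

# dtype=int requests 0/1 columns; recent pandas versions default to booleans.
indicators = pd.get_dummies(age_group, dtype=int)

# Append the indicator columns to the original data.
result = df.join(indicators)
print(result)
```

Because the intervals are closed on the right, an age of exactly 18 lands in 'Under 18' and an age of 50 lands in '30_to_50'. If the boundaries should fall the other way, passing right=False to pd.cut and adjusting the bin edges is one option.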