Survey data cleaning - Grouping age ranges - Python pandas

I have a data set with the following value counts for the column Age:

>>> game['Age'].value_counts()

Between 18 -25     131
Between 26 - 30     21
Under 18            10
31 or more           7
Name: Age, dtype: int64

I'm trying to regroup the values of the column 'Age' into 2 groups:

 - <=25 (grouping 'Between 18 -25' and 'Under 18')
 - >=26 (grouping 'Between 26 - 30' and '31 or more')

I have been trying to use the groupby function, but with no good result yet. Can you please help?
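Since the question mentions groupby: it can count the two groups without rewriting the column, by grouping on a Series derived from 'Age' through a dictionary. This is a minimal sketch, assuming the `game` DataFrame from the question; it is rebuilt here from the value counts above so the block runs on its own, and the `age_group` dictionary name is hypothetical.

```python
import pandas as pd

# Stand-in for the question's data, rebuilt from its value counts (row order is arbitrary).
counts = {'Between 18 -25': 131, 'Between 26 - 30': 21, 'Under 18': 10, '31 or more': 7}
game = pd.DataFrame({'Age': [label for label, n in counts.items() for _ in range(n)]})

# Hypothetical mapping from each original answer to one of the two target groups.
age_group = {
    'Between 18 -25': '<=25',
    'Under 18': '<=25',
    'Between 26 - 30': '>=26',
    '31 or more': '>=26',
}

# Group rows by the mapped label; the original 'Age' column is left unchanged.
sizes = game.groupby(game['Age'].map(age_group)).size()
print(sizes)
```

With 131 + 10 = 141 and 21 + 7 = 28 respondents, the two groups add up to the original 169 rows.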

Answer:

You can use np.select:

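import numpy as np

mapping = {
    'Between 18 -25': '<=25',
    'Under 18': '<=25',
    'Between 26 - 30': '>=26',
    '31 or more': '>=26',
}

# One boolean mask per original label; np.select picks the new label of the first mask that matches.
# A string default ('other') marks unexpected labels and keeps the result a string array.
df['Age'] = np.select([df['Age'] == k for k in mapping], list(mapping.values()), default='other')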
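A self-contained run of the np.select approach on a few sample rows, to show what the `default` argument does. The rows are made up: the four labels from the question plus one hypothetical answer, 'Prefer not to say', that is not in the mapping and therefore falls through to the default.

```python
import numpy as np
import pandas as pd

# Made-up sample rows; 'Prefer not to say' is a hypothetical label missing from the mapping.
df = pd.DataFrame({'Age': ['Under 18', 'Between 18 -25', 'Between 26 - 30',
                           '31 or more', 'Prefer not to say']})

mapping = {
    'Between 18 -25': '<=25',
    'Under 18': '<=25',
    'Between 26 - 30': '>=26',
    '31 or more': '>=26',
}

conditions = [df['Age'] == k for k in mapping]  # one boolean mask per original label
choices = list(mapping.values())                # new label for each mask, same order
df['Age'] = np.select(conditions, choices, default='other')

print(df['Age'].tolist())
```

Rows that match no condition get the default, so a quick `value_counts()` after the conversion shows whether any answer was left unmapped.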

Or just use .loc:

df.loc[df['Age'] == 'Between 18 -25', 'Age'] = '<=25'
df.loc[df['Age'] == 'Under 18', 'Age'] = '<=25'
df.loc[df['Age'] == 'Between 26 - 30', 'Age'] = '>=26'
df.loc[df['Age'] == '31 or more', 'Age'] = '>=26'
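The four .loc lines above can also be driven by a dictionary, which keeps the label pairs in one place. A sketch on made-up sample rows; the `mapping` dictionary is an assumption borrowed from the np.select variant.

```python
import pandas as pd

# Made-up sample rows using the four labels from the question.
df = pd.DataFrame({'Age': ['Under 18', 'Between 18 -25', 'Between 26 - 30', '31 or more']})

# Same old-label -> new-label pairs as the four .loc assignments.
mapping = {
    'Between 18 -25': '<=25',
    'Under 18': '<=25',
    'Between 26 - 30': '>=26',
    '31 or more': '>=26',
}

# One .loc assignment per pair. The order does not matter here because no new label
# ('<=25', '>=26') is also an old label, so a later pass cannot re-match an earlier result.
for old, new in mapping.items():
    df.loc[df['Age'] == old, 'Age'] = new

print(df['Age'].tolist())
```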

Or use isin:

df.loc[df['Age'].isin(['Between 18 -25', 'Under 18']), 'Age'] = '<=25'
df.loc[df['Age'].isin(['Between 26 - 30', '31 or more']), 'Age'] = '>=26'
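Not part of the answer above, but the same dictionary also works with pandas' `Series.map` or `Series.replace` in a single step. They differ on labels missing from the dictionary: `map` turns them into NaN (easy to spot), while `replace` leaves them unchanged. A sketch on made-up rows, including one hypothetical unmapped label.

```python
import pandas as pd

# Made-up sample rows; the last one is a hypothetical label not in the mapping.
df = pd.DataFrame({'Age': ['Under 18', 'Between 18 -25', 'Between 26 - 30',
                           '31 or more', 'Prefer not to say']})

mapping = {
    'Between 18 -25': '<=25',
    'Under 18': '<=25',
    'Between 26 - 30': '>=26',
    '31 or more': '>=26',
}

mapped = df['Age'].map(mapping)        # unmapped labels become NaN
replaced = df['Age'].replace(mapping)  # unmapped labels are kept as they are

print(mapped.tolist())
print(replaced.tolist())
```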

Answer:

Try to explain the problem more clearly, or share your code so others can understand it better.

This can be achieved with the np.where() function:

import numpy as np

# Rows answering "Between 18 -25" or "Under 18" get the first label; every other row gets the second
game["How old are you?"] = np.where((game["How old are you?"] == "Between 18 -25") |
                                    (game["How old are you?"] == "Under 18"),
                                    "25 or under", "26 or more")

game["How old are you?"].value_counts()

You should see output like this:

25 or under    ...
26 or more     ...
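np.where is a two-way split: every row that fails the condition lands in the second label, including typos, unexpected answers, or missing values. A sketch that builds the same condition with `isin` (equivalent to the two `==` comparisons joined by `|`) and first lists any rows that belong to neither group. The sample rows, including the `None` entry standing in for a missing answer, are made up, and "25 or under" is used as the first label because that group includes 25-year-olds.

```python
import numpy as np
import pandas as pd

col = "How old are you?"
# Made-up sample rows; None is a hypothetical missing answer.
game = pd.DataFrame({col: ["Under 18", "Between 18 -25", "Between 26 - 30",
                           "31 or more", None]})

young = game[col].isin(["Between 18 -25", "Under 18"])
older_labels = ["Between 26 - 30", "31 or more"]

# Rows that np.where would silently put into the second group without being one of its labels.
unexpected = game.loc[~young & ~game[col].isin(older_labels), col]
print(unexpected.tolist())

game[col] = np.where(young, "25 or under", "26 or more")
print(game[col].value_counts())
```

Here the missing answer ends up counted under "26 or more", which is why checking `unexpected` before the conversion is worthwhile.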