How to bin age data into categories - issues with setting highs and lows

Time:09-11

I need to put 'F1: AGE' into categorical bins: (unknown, 17 and under, 18-25, 26-35, 36-45, 46-55, 56+)

I have replaced all non-numeric values with missing values. So far my code is:

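# True where the value is a string made up only of digits
age_index = df['F1: AGE'].str.isnumeric()
# Non-string entries come back as NaN; treat them as non-numeric
age_index = age_index.fillna(False)
# Blank out everything that is not numeric
df.loc[~age_index, 'F1: AGE'] = np.nan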

From here, I am not sure where to go. I want to use pd.cut(), but what I have tried has given me:

Error: Bin edges must be unique

After hitting this and about a dozen other errors from overthinking it, I am completely stuck. Any help on how to create these bins and labels so that they work properly would be great.

CodePudding user response:

For pd.cut you do need to provide unique bin edges. Usually this would look something like this:

binned = pd.cut(my_data, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf])
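
To make the "unique edges" point concrete, here is a minimal sketch with made-up sample ages (the `ages` series and the duplicated edge list are my own illustration, not the asker's data). It shows one way the error can be triggered, and how the default right-closed intervals assign boundary ages.

```python
import numpy as np
import pandas as pd

# Made-up sample ages, purely for illustration.
ages = pd.Series([5, 17, 18, 25, 26, 60])

# A repeated edge (25 appears twice) reproduces the error from the question.
try:
    pd.cut(ages, bins=[0, 17, 25, 25, 35])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Strictly increasing edges work. With the default right=True each bin
# includes its upper edge, so 17 lands in (-inf, 17] and 18 in (17, 25].
binned = pd.cut(ages, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf])

# Integer position of the bin each age fell into (0 = first bin).
codes = binned.cat.codes.tolist()
```

Using -np.inf and np.inf as the outer edges means no age can fall outside the bins, so the only NaN results come from values that were already missing.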

Optionally, you can use labels to name your bins however you'd like:

binned = pd.cut(
    my_data,
    bins=[-np.inf, 17, 25, 35, 45, 55, np.inf],
    labels=["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"],
)
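
The question also asks for an "unknown" bin, which pd.cut will not produce by itself because missing values stay NaN. One possible way to put the pieces together is sketched below. The sample frame, the AGE_GROUP column name, and the "56+" spelling of the last label are assumptions for illustration; pd.to_numeric with errors="coerce" is swapped in here for the str.isnumeric() step, since pd.cut needs numeric input rather than strings.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column name 'F1: AGE' comes from the question.
df = pd.DataFrame({"F1: AGE": ["16", "18", "abc", None, "40", "70"]})

# Convert to numbers; anything that cannot be parsed becomes NaN.
ages = pd.to_numeric(df["F1: AGE"], errors="coerce")

# Label spellings are an assumption; adjust them to taste.
labels = ["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"]
binned = pd.cut(
    ages,
    bins=[-np.inf, 17, 25, 35, 45, 55, np.inf],
    labels=labels,
)

# pd.cut leaves NaN unbinned, so register "unknown" as an extra category
# first and then fill the gaps with it.
df["AGE_GROUP"] = binned.cat.add_categories(["unknown"]).fillna("unknown")
```

With this sample, "abc" and the missing entry both end up as "unknown", while the remaining rows get their age-range label. Adding the category before calling fillna matters, because a categorical column rejects fill values that are not among its categories.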