How to bin age data into categories - issues with setting highs and lows

I need to put 'F1: AGE' into categorical bins: unknown, 17 and under, 18-25, 26-35, 36-45, 46-55, 56+.

I have replaced all non-numeric values with missing values. So far my code is:

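import numpy as np
# Flag values that are digit-only strings; non-string entries come back as NaN
age_index = df['F1: AGE'].str.isnumeric()
age_index = age_index.fillna(False)
# Replace the non-numeric entries with NaN
df.loc[~age_index, 'F1: AGE'] = np.nan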

From here, I am not sure where to go. I want to use pd.cut(), but what I have tried gives me:

Error: Bin edges must be unique

Along with about a dozen other errors from overthinking, I am completely stuck. Any help on how to create these bins and labels and have them properly work would be great.

CodePudding user response：

For pd.cut you do need to provide unique bin edges. Usually this would look something like this:

binned = pd.cut(my_data, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf])
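To illustrate where the error in the question likely comes from, here is a minimal sketch. The sample ages and the repeated edge are made up for the example and are not taken from the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample ages, for illustration only.
ages = pd.Series([5, 17, 18, 30, 60])

# A repeated edge (17 appears twice) is rejected by pd.cut.
try:
    pd.cut(ages, bins=[-np.inf, 17, 17, 25, np.inf])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With unique, increasing edges the call succeeds. Bins are right-closed by
# default, so 17 falls in (-inf, 17] and 18 falls in (17, 25].
binned = pd.cut(ages, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf])
```

So each edge should appear once: 17 closes the first bin and the next bin starts just above it.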

Optionally, you can use labels to name your bins however you'd like:

binned = pd.cut(
    my_data,
    bins=[-np.inf, 17, 25, 35, 45, 55, np.inf],
    labels=["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"],
)
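
The question also asks for an "unknown" bin, which pd.cut does not produce on its own because missing inputs stay missing. One possible way to handle that is sketched below. It swaps the str.isnumeric() step for pd.to_numeric(errors="coerce"), which also accepts values such as decimals or negatives, so treat it as an alternative rather than an exact equivalent. The sample DataFrame and the AGE_GROUP column name are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the asker's DataFrame.
df = pd.DataFrame({"F1: AGE": ["16", "18", "abc", None, "42", "70"]})

# Convert to numbers; anything that cannot be parsed becomes NaN.
ages = pd.to_numeric(df["F1: AGE"], errors="coerce")

labels = ["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"]
binned = pd.cut(ages, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf], labels=labels)

# pd.cut leaves NaN inputs as NaN, so register "unknown" as an extra
# category first and then use it to fill the missing entries.
# AGE_GROUP is an assumed column name, not one from the question.
df["AGE_GROUP"] = binned.cat.add_categories("unknown").fillna("unknown")
```

Adding the category before calling fillna matters because a categorical column only accepts values that are already among its categories.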