I need to put 'F1: AGE' into categorical bins: (unknown, 17 and under, 18-25, 26-35, 36-45, 46-55, 56+)
I have replaced all non-numeric values with missing values. So far my code is:
age_index = df['F1: AGE'].str.isnumeric()
age_index = age_index.fillna(False)
df.loc[~age_index, 'F1: AGE'] = np.nan
From here, I am not sure where to go. I want to use pd.cut(), but what I have tried has given me:
Error: Bin edges must be unique
Along with about a dozen other errors from overthinking, I am completely stuck. Any help on how to create these bins and labels and have them properly work would be great.
CodePudding user response:
For pd.cut you do need to provide unique bin edges. Usually this would be something like this:
binned = pd.cut(my_data, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf])
Optionally, you can use labels to name your bins however you'd like:
binned = pd.cut(
    my_data,
    bins=[-np.inf, 17, 25, 35, 45, 55, np.inf],
    labels=["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"],
)
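One thing to watch: after replacing the non-numeric entries with NaN, the column still holds strings, and pd.cut needs numbers. Here is a minimal sketch of the whole thing, including the "unknown" bin from the question. The sample DataFrame and the age_group column name are made up for illustration, and the "56+" label is my guess at what the last bin should be called.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real 'F1: AGE' column.
df = pd.DataFrame({"F1: AGE": ["16", "25", "26", "40", "55", "71", "n/a", None]})

# Convert strings to numbers; anything that is not numeric becomes NaN.
age = pd.to_numeric(df["F1: AGE"], errors="coerce")

labels = ["17 and under", "18-25", "26-35", "36-45", "46-55", "56+"]
# Bins are closed on the right by default, so 25 falls in "18-25".
binned = pd.cut(age, bins=[-np.inf, 17, 25, 35, 45, 55, np.inf], labels=labels)

# pd.cut leaves missing ages as NaN; add an explicit "unknown" category for them.
df["age_group"] = binned.cat.add_categories("unknown").fillna("unknown")

print(df["age_group"].tolist())
```

pd.to_numeric with errors="coerce" also replaces the str.isnumeric() masking step, since it turns every non-numeric entry into NaN in one go.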