I need to write a function that calculates the age category. This is the function:
def age_category(dob_years):
    if dob_years < 0 or pd.isna(dob_years):
        return 'NA'
    elif dob_years < 20:
        return '10-19'
    elif dob_years < 30:
        return '20-29'
    elif dob_years < 40:
        return '30-39'
    elif dob_years < 50:
        return '40-49'
    elif dob_years < 60:
        return '50-59'
    elif dob_years < 70:
        return '60-69'
    else:
        return '70+'
I checked the function and it works, but when I try to create a new column:
credit_scoring['age_group']= credit_scoring.apply(age_category, axis=1)
I get this error:
TypeError: '<' not supported between instances of 'str' and 'int'
I am new to Python and don't know what to do. Please help: what is wrong with the code? Thanks for your time :)
CodePudding user response:
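def age_category(dob_years):
    # Convert strings such as "35" to a number; give up on anything non-numeric
    if not isinstance(dob_years, (float, int)):
        try:
            dob_years = int(dob_years)
        except (TypeError, ValueError):
            return 'NA'
    # Missing (NaN) and negative ages have no category
    if pd.isna(dob_years) or dob_years < 0:
        return 'NA'
    # Cap at 70 so that ages of 80 and above still map to '70+'
    return {
        0: '0-9',
        10: '10-19',
        20: '20-29',
        30: '30-39',
        40: '40-49',
        50: '50-59',
        60: '60-69',
        70: '70+',
    }[10 * int(min(dob_years, 70) // 10)]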
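As for the TypeError itself: with axis=1, DataFrame.apply passes each whole row (a Series) to the function rather than a single age, so the comparison most likely runs against every column in the row, including the string ones. Applying the function to just the age column should avoid that. A minimal sketch, assuming the column is called dob_years (a guess from the parameter name; the real column name and data are not shown in the question), with made-up sample data and a compact stand-in for the function:

```python
import math

import pandas as pd


def age_category(dob_years):
    """Map an age in years to a decade label, or 'NA' if it cannot be used."""
    try:
        years = float(dob_years)  # accepts 35, 35.0, and "35"
    except (TypeError, ValueError):
        return "NA"  # None or a non-numeric string
    if math.isnan(years) or years < 0:
        return "NA"
    if years >= 70:
        return "70+"
    low = 10 * int(years // 10)
    return f"{low}-{low + 9}"


# Hypothetical sample data; the real credit_scoring frame is not shown
credit_scoring = pd.DataFrame(
    {
        "dob_years": [35, "42", None, -1, 81, "n/a"],
        "education": ["a", "b", "c", "d", "e", "f"],
    }
)

# Apply to the single column (a Series), not to whole rows with axis=1
credit_scoring["age_group"] = credit_scoring["dob_years"].apply(age_category)
print(credit_scoring)
```

Series.apply hands the function one value at a time, so it only ever compares an age with a number.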
CodePudding user response:
You can achieve your goal more easily using pd.cut.
First of all, the sample data:
>>> df = pd.DataFrame([0, 18, -3, 73, 17, 88, 60, 1, 20, 14], columns=["age"])
>>> df
   age
0    0
1   18
2   -3
3   73
4   17
5   88
6   60
7    1
8   20
9   14
Then you need to prepare the bins and their labels:
>>> from math import inf
>>> bins = list(range(0, 80, 10))
>>> bins.append(inf)
>>> bins
[0, 10, 20, 30, 40, 50, 60, 70, inf]
>>> labels = [f"{i}-{i + 9}" for i in bins[:-2]]
>>> labels.append(f"{bins[-2]}+")
>>> labels
['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70+']
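Once you have them, use pd.cut with right=False, so that each bin includes its left edge and excludes its right edge. This assigns the labels according to your example. Ages outside the bins (negative values here) become NaN.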
>>> df["age group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)
>>> df
   age age group
0    0       0-9
1   18     10-19
2   -3       NaN
3   73       70+
4   17     10-19
5   88       70+
6   60     60-69
7    1       0-9
8   20     20-29
9   14     10-19
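If the real column contains strings (which the TypeError in the question suggests), pd.cut will probably need numeric input first. One possible approach is to convert with pd.to_numeric and errors="coerce", which turns anything unparseable into NaN. This is only a sketch: the sample values below are invented, and the bins and labels are written out by hand to match the ones built above.

```python
import pandas as pd

# Hypothetical column mixing numbers, numeric strings, and junk values
df = pd.DataFrame({"age": [25, "31", "unknown", None, -3, 73]})

# errors="coerce" replaces unparseable entries with NaN instead of raising
ages = pd.to_numeric(df["age"], errors="coerce")

bins = [0, 10, 20, 30, 40, 50, 60, 70, float("inf")]
labels = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-69", "70+"]

# right=False makes each bin closed on the left: [0, 10), [10, 20), ...
df["age group"] = pd.cut(ages, bins=bins, labels=labels, right=False)
print(df)
```

Rows that could not be converted, and ages outside the bins, end up as NaN in the new column and can be filled or dropped afterwards.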