how to write a function that calculates the age category

Time:11-17

I need to write a function that calculates the age category. This is the function:

def age_category(dob_years):
    if dob_years < 0 or pd.isna(dob_years):
        return 'NA'
    elif dob_years < 20:
        return '10-19'
    elif dob_years < 30:
        return '20-29'
    elif dob_years < 40:
        return '30-39'
    elif dob_years < 50:
        return '40-49'
    elif dob_years < 60:
        return '50-59'
    elif dob_years < 70:
        return '60-69'
    else:
        return '70+'

I checked the function and it works, but when I try to create a new column:

credit_scoring['age_group']= credit_scoring.apply(age_category, axis=1) 

I get this error:

TypeError: '<' not supported between instances of 'str' and 'int'

Actually, I am new to Python and I don't know what to do. Please help: what is wrong with the code? Thanks for your time :)

Answer:

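import math

def age_category(dob_years):
    # Try to convert non-numeric input (e.g. the string '35') to a number;
    # anything that cannot be converted is reported as 'NA'.
    if not isinstance(dob_years, (float, int)):
        try:
            dob_years = float(dob_years)
        except (TypeError, ValueError):
            return 'NA'

    # Missing values (NaN) and negative ages have no category.
    if math.isnan(dob_years) or dob_years < 0:
        return 'NA'

    # Look up the decade; ages of 70 or more are capped at the '70+' key
    # (without min(), an age like 88 would raise a KeyError for key 80).
    return {
        0: '0-9',
        10: '10-19',
        20: '20-29',
        30: '30-39',
        40: '40-49',
        50: '50-59',
        60: '60-69',
        70: '70+',
    }[min(10 * int(dob_years // 10), 70)]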
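A likely cause of the original TypeError is the axis=1 call: DataFrame.apply then passes each whole row (a Series) to age_category, so dob_years < 0 compares every value in that row, including text columns, with 0. Calling apply on the single age column avoids this. A minimal sketch, assuming the ages live in a column named dob_years; the sample frame and its education column are hypothetical:

```python
import math

import pandas as pd


def age_category(dob_years):
    # Non-numeric input that cannot be converted is reported as 'NA'.
    if not isinstance(dob_years, (float, int)):
        try:
            dob_years = float(dob_years)
        except (TypeError, ValueError):
            return 'NA'
    # Missing values (NaN) and negative ages have no category.
    if math.isnan(dob_years) or dob_years < 0:
        return 'NA'
    # Cap the decade at 70 so that every age of 70 or more maps to '70+'.
    decade = min(10 * int(dob_years // 10), 70)
    return '70+' if decade == 70 else f'{decade}-{decade + 9}'


# Hypothetical sample: a numeric age column next to a text column.
credit_scoring = pd.DataFrame({
    'dob_years': [42, 19, -1, 75, None],
    'education': ['secondary', 'bachelor', 'secondary', 'primary', 'bachelor'],
})

# Series.apply calls age_category once per value of the dob_years column,
# instead of once per row as DataFrame.apply(..., axis=1) does.
credit_scoring['age_group'] = credit_scoring['dob_years'].apply(age_category)
print(credit_scoring)
```

Here the None becomes NaN when the frame is built, so it is reported as 'NA' along with the negative age.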

Answer:

You can achieve your goal more easily using pd.cut.

First of all, the sample data:

>>> import pandas as pd
>>> df = pd.DataFrame([0, 18, -3, 73, 17, 88, 60,  1, 20, 14], columns=["age"])
>>> df
    age
0   0
1   18
2   -3
3   73
4   17
5   88
6   60
7   1
8   20
9   14

Then you need to prepare the bins and their labels:

>>> from math import inf
>>> bins = list(range(0, 80, 10))
>>> bins.append(inf)
>>> bins
[0, 10, 20, 30, 40, 50, 60, 70, inf]
>>> labels = [f"{i}-{i + 9}" for i in bins[:-2]]
>>> labels.append(f"{bins[-2]}+")
>>> labels
['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70+']

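Once you have them, use pd.cut with right=False, so that each bin includes its left edge but not its right one ([10, 20), [20, 30) and so on). This assigns labels the same way as the if/elif chain in your example. Values outside all bins, such as -3, get NaN.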

>>> df["age group"] = pd.cut(df["age"], bins=bins, labels=labels, right=False)
>>> df
    age age group
0   0   0-9
1   18  10-19
2   -3  NaN
3   73  70+
4   17  10-19
5   88  70+
6   60  60-69
7   1   0-9
8   20  20-29
9   14  10-19
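If dob_years in the original credit_scoring frame was read in as text (the 'str' in the error message hints at this, although the row-wise apply alone can also produce it), pd.cut would fail on it as well. A sketch, assuming a column named dob_years and hypothetical sample values, that first converts it with pd.to_numeric(..., errors='coerce') so unparseable entries become NaN, then bins it. The last pd.cut call shows why right=False is the setting that matches the question's < 20, < 30 boundaries:

```python
import math

import pandas as pd

# Hypothetical data: ages stored as strings, including one unparseable entry.
credit_scoring = pd.DataFrame({'dob_years': ['20', '9', 'unknown', '70', '-3']})

# Bin edges 0, 10, ..., 70, inf and labels '0-9', ..., '60-69', '70+'.
bins = list(range(0, 80, 10)) + [math.inf]
labels = [f'{i}-{i + 9}' for i in bins[:-2]] + [f'{bins[-2]}+']

# Strings that cannot be parsed become NaN instead of raising an error.
ages = pd.to_numeric(credit_scoring['dob_years'], errors='coerce')

# right=False makes each bin [low, high): 20 lands in '20-29', as in the question.
credit_scoring['age_group'] = pd.cut(ages, bins=bins, labels=labels, right=False)

# With right=True the bins are (low, high]: 20 would fall into '10-19' instead.
right_closed = pd.cut(ages, bins=bins, labels=labels, right=True)

print(credit_scoring)
print(right_closed)
```

Negative ages and unparseable strings both end up as NaN, which can then be filled with a placeholder such as 'NA' if needed.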