pandas dataframe column contains string and int

My DataFrame's age column looks like this:

20 or younger =14

61 or older =45

56-60 = 34

31-35 =30

56 or older =31

21-25 =23

26 30 =56

31 35 =44

36 40 =32

21 25 =26

26-30 =14

46 50 =14

36-40 =15

46-50 =33

41 45 =24

41-45 =29

51-55 =35

So I wrote this function to categorize it better, but I got a TypeError that says: '<' not supported between instances of 'str' and 'int'

def age_buckets(x):
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    elif x >= 70:
        return '70+'
    else:
        return 'other'

Here is a link to what I am doing: https://deepnote.com/workspace/eddie-abfa350f-f15e-43fe-8960-fab53a2def2e/project/Welcome-e6ac66b9-19f2-4973-bbc2-7adfda9366f3//Reasons for resignation analysis.ipynb

CodePudding user response：

You can't compare a string with an int using <; Python does not treat the string as a number. The error means the incoming x value is a string, so x has to be converted to a number before the comparison. If the string really holds an integer, you can cast it with int(), for example int(x) < 30.

Better still, pass age_buckets an int rather than a string: call age_buckets(int(x)) instead of age_buckets(x).
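A small sketch of the point above, in plain Python. The sample strings are made up for illustration. Note that int() only accepts purely numeric text, so a range label such as "56-60" from the question's column would still fail, this time with a ValueError:

```python
# Comparing a str with an int raises the TypeError from the question.
try:
    "25" < 30
except TypeError as err:
    message = str(err)

# Converting the string first makes the comparison valid.
is_under_30 = int("25") < 30

# int() rejects text that is not purely numeric, such as a range label.
try:
    int("56-60")
    range_converts = True
except ValueError:
    range_converts = False
```

So the int() cast is enough only if every value in the column is a plain number stored as text.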

Please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

So writing combined['age'] = combined['age'].apply(age_buckets(int(x))) calls age_buckets immediately instead of handing it to apply. Pass the function itself and let apply call it for each value: combined['age'] = combined['age'].apply(age_buckets)
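As a rough illustration of passing the function itself to apply, here is a minimal sketch. The sample values and the age_group column name are hypothetical, the bucket function is shortened, and every value is assumed to be a purely numeric string:

```python
import pandas as pd

def age_buckets(y):
    x = int(y)  # convert the incoming string before comparing
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    return '40 or older'  # simplified tail for this sketch only

# Hypothetical sample data: numeric strings only.
combined = pd.DataFrame({'age': ['25', '34', '61']})

# Pass the function object (no parentheses); apply calls it once per value.
combined['age_group'] = combined['age'].apply(age_buckets)
```

Writing age_buckets(...) inside apply would run the function once, up front, and hand its return value to apply instead of the function.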

See if this works:

def age_buckets(y):
    x = int(y)
    if x < 30:
        ...
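Since the column in the question holds labels such as "56-60", "26 30" and "20 or older" rather than plain numbers, int(y) alone may still raise a ValueError. One possible workaround, sketched below, is to pull the first number out of each label and bucket on that. Using the lower bound as the representative age is an assumption, first_number is a hypothetical helper, and the '70+' label is a guess at the asker's intended last bucket:

```python
import re

def first_number(label):
    """Return the first integer found in label, or None if there is none."""
    match = re.search(r'\d+', str(label))
    return int(match.group()) if match else None

def age_buckets(y):
    x = first_number(y)  # e.g. '56-60' -> 56, '20 or younger' -> 20
    if x is None:
        return 'other'   # missing values or labels without digits
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    return '70+'         # assumed label for the last bucket

# With pandas, this would be applied the same way as before:
# combined['age'] = combined['age'].apply(age_buckets)
```

With this version, both "26-30" and "26 30" land in '18-29', and a missing value falls through to 'other'.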