My DataFrame's age column looks like this (value = count):
20 or younger =14
61 or older =45
56-60 = 34
31-35 =30
56 or older =31
21-25 =23
26 30 =56
31 35 =44
36 40 =32
21 25 =26
26-30 =14
46 50 =14
36-40 =15
46-50 =33
41 45 =24
41-45 =29
51-55 =35
So I wrote this function to categorize it better, but I got a TypeError that says: '<' not supported between instances of 'str' and 'int'
def age_buckets(x):
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    elif x >= 70:
        return '70 '
    else:
        return 'other'
Here is a link to what I am doing: https://deepnote.com/workspace/eddie-abfa350f-f15e-43fe-8960-fab53a2def2e/project/Welcome-e6ac66b9-19f2-4973-bbc2-7adfda9366f3//Reasons for resignation analysis.ipynb
CodePudding user response:
You can't compare a string with a number using the < check; Python doesn't associate that string with a number. The error says that the incoming x value is a string, so x must be converted to a number before the comparison. If the string holds a plain integer, you can convert it with the int() function, such as int(x) < 30.
What would be better is to pass age_buckets an int rather than a string. So when you call it, use age_buckets(int(x)) rather than just age_buckets(x). Note that int() raises a ValueError for labels such as '26-30' or '61 or older', so values like the ones in your column have to be reduced to a single number first.
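To make the error concrete, here is a minimal sketch that reproduces it and shows the int() conversion. The two-bucket age_buckets below is a shortened stand-in for the function in the question, and the sample value '25' is an assumption for illustration only.

```python
def age_buckets(x):
    # Shortened stand-in for the question's function, to keep the demo small.
    if x < 30:
        return '18-29'
    return '30 or older'

# Passing a string reproduces the TypeError from the question.
try:
    age_buckets('25')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Converting first works, provided the string is a plain integer.
converted = age_buckets(int('25'))

print(error_message)
print(converted)
```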
Please see: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html
So instead of
    combined['age'] = combined['age'].apply(age_buckets(int(x)))
you actually need to pass the function itself and let apply call it on each value:
    combined['age'] = combined['age'].apply(age_buckets)
See if this works:
def age_buckets(y):
    x = int(y)
    if x < 30:
        ...
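Because the column holds labels such as '26-30', '26 30' and '61 or older' rather than bare numbers, calling int() on them directly would raise a ValueError. Below is a rough sketch of one way around that, not necessarily the best fit for your data: first_number is a hypothetical helper, and bucketing each label by its first number (its lower bound) is an assumption you may want to change.

```python
import re

def first_number(label):
    # Hypothetical helper: return the first run of digits in the label
    # as an int, or None when the label contains no digits.
    match = re.search(r'\d+', str(label))
    return int(match.group()) if match else None

def age_buckets(label):
    # Assumption: bucket each label by its first number,
    # e.g. '26-30' is treated as 26 and '61 or older' as 61.
    x = first_number(label)
    if x is None:
        return 'other'
    if x < 30:
        return '18-29'
    elif x < 40:
        return '30-39'
    elif x < 50:
        return '40-49'
    elif x < 60:
        return '50-59'
    elif x < 70:
        return '60-69'
    return '70 '  # label kept exactly as written in the question

for sample in ['20 or younger', '26 30', '36-40', '61 or older']:
    print(sample, '->', age_buckets(sample))
```

With pandas you would then pass the function itself to apply, for example combined['age'] = combined['age'].apply(age_buckets). Missing values contain no digits, so they end up in 'other'.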