Home > Blockchain >  Creating a new column in dataframe with range of values
Creating a new column in dataframe with range of values

Time:04-15

I need to divide range of my passengers age onto 5 parts and create a new column where will be values from 0 to 4 respectively for every part(For 1 range value 0 for 2 range value 1 etc)

a = range(0,17)
b = range(17,34)
c = range(34, 51)
d = range(51, 68)
e = range(68,81)
a1 = titset.query('Age >= 0 & Age < 17')
a2 = titset.query('Age >= 17 & Age < 34')
a3 = titset.query('Age >= 34 & Age < 51')
a4 = titset.query('Age >= 51 & Age < 68')
a5 = titset.query('Age >= 68 & Age < 81')
titset['Age_bin'] = a1.apply(0 for a in  range(a))

Here what i tried to do but it does not work. I also pin dataset picture

DATASET

I expect to get result where i'll see a new column named 'Age_bin' and values 0 in it for Age from 0 to 16 inclusively, values 1 for age from 17 to 33 and other 3 rangers

CodePudding user response:

Binning with pandas cut is appropriate here, try:

titset['Age_bin'] = titset['Age'].cut(bins=[0,17,34,51,68,81], include_lowest=True, labels=False)

CodePudding user response:

First of all, the variable a is a range object, which you are calling range(a) again, which is equivalent to range(range(0, 17)), hence the error.

Secondly, even if you fixed the above problem, you will run into an error again since .apply takes in a callable (i.e., a function be it defined with def or a lambda function).

If your goal is to assign a new column that represents the age group that each row is in, you can just filter with your result and assign them:

titset = pd.DataFrame({'Age': range(1, 81)})


a = range(0,17)
b = range(17,34)
c = range(34, 51)
d = range(51, 68)
e = range(68,81)

a1 = titset.query('Age >= 0 & Age < 17')
a2 = titset.query('Age >= 17 & Age < 34')
a3 = titset.query('Age >= 34 & Age < 51')
a4 = titset.query('Age >= 51 & Age < 68')
a5 = titset.query('Age >= 68 & Age < 81')

titset.loc[a1.index, 'Age_bin'] = 0
titset.loc[a2.index, 'Age_bin'] = 1
titset.loc[a3.index, 'Age_bin'] = 2
titset.loc[a4.index, 'Age_bin'] = 3
titset.loc[a5.index, 'Age_bin'] = 4

Or better yet, use a for loop:

age_groups = [0, 17, 34, 51, 68, 81]

for i in range(len(age_groups) - 1):
    subset = titset.query(f'Age >= {age_groups[i]} & Age < {age_groups[i 1]}')
    titset.loc[subset.index, 'Age_bin'] = i
  • Related