How can I replace pd intervals with integers in Python


import pandas as pd 
df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0,10,20,30,40,50,60,70,80,90]
df['age_bands']= pd.cut(df['age'], bins=age_band, ordered=True)

output:

    age age_bands
0   43  (40, 50]
1   76  (70, 80]
2   27  (20, 30]
3   8   (0, 10]
4   57  (50, 60]
5   32  (30, 40]
6   12  (10, 20]
7   22  (20, 30]

Now I want to add another column that replaces each band with a single integer, but I could not get it to work.

For example, this did not work:

df['age_code']= df['age_bands'].replace({'(40, 50]':4})
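For reference, this replace finds nothing because the age_bands column does not hold the strings shown in the printout: pd.cut stores pd.Interval objects, and an Interval never compares equal to a string. The sketch below rebuilds the same DataFrame and checks this; the names first and codes are only illustrative.

```python
import pandas as pd

df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df['age_bands'] = pd.cut(df['age'], bins=age_band, ordered=True)

first = df['age_bands'].iloc[0]  # the band for age 43

# The stored value is a pd.Interval, not the string shown when printing
is_interval = isinstance(first, pd.Interval)
matches_string = first == '(40, 50]'  # a string never equals an Interval
matches_interval = first == pd.Interval(40, 50, closed='right')

# Lookups therefore need pd.Interval keys, not strings
codes = {pd.Interval(40, 50, closed='right'): 4}
print(is_interval, matches_string, matches_interval, codes[first])
```

So any mapping has to be keyed by pd.Interval objects (or work on the intervals' attributes), not by their string form.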

How can I get a column that looks like this?

    age_bands   age_code
0   (40, 50]      4
1   (70, 80]      7
2   (20, 30]      2
3   (0, 10]       0
4   (50, 60]      5
5   (30, 40]      3
6   (10, 20]      1
7   (20, 30]      2

Answer:

Assuming you want the first digit of every interval, you can use Series.apply on the string form of each band:

df["age_code"] = df["age_bands"].apply(lambda band: str(band)[1])

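Note, however, that this may not be very efficient for a large DataFrame, because apply calls the Python function once per row. It also only works while the lower edge is a single digit: for a band such as (100, 110] it would return "1".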

The result is a column of strings. To convert it to integers, you can use pd.to_numeric:

df["age_code"] = pd.to_numeric(df['age_code'])
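As a vectorized alternative that avoids both the per-row apply and the string parsing, a sketch using the categorical codes that pd.cut already stores. It assumes, as in the question, that the bins start at 0 and are 10 wide, which is why a bin's position happens to equal its tens digit.

```python
import pandas as pd

df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df['age_bands'] = pd.cut(df['age'], bins=age_band, ordered=True)

# pd.cut stores the bands as a categorical column whose categories are all
# nine bins in ascending order; .cat.codes is each row's position in that
# list (a missing value, e.g. an age outside the bins, would get -1)
df['age_code'] = df['age_bands'].cat.codes.astype(int)
print(df[['age_bands', 'age_code']])
```

Because the categories include every bin, including ones with no rows such as (60, 70], (70, 80] still gets 7 here.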

Answer:

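Since the column contains pd.Interval objects, you can use their left attribute (the lower edge of each band) and integer-divide it by 10:

df['age_code'] = df['age_bands'].apply(lambda interval: interval.left // 10).astype(int)  # astype(int) makes it a plain integer column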
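The same left-edge idea can be sketched without calling a Python function per row: take the left edges of the column's categories once and index them with each row's category code. The names left_edges and codes are only illustrative, and the sketch assumes every age falls inside one of the bins.

```python
import pandas as pd

df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
df['age_bands'] = pd.cut(df['age'], bins=age_band, ordered=True)

bands = df['age_bands']
left_edges = bands.cat.categories.left.to_numpy()  # lower edge of every bin: 0, 10, ..., 80
codes = bands.cat.codes.to_numpy()                 # position of each row's bin

# Assumes no missing values: a code of -1 would silently pick the last edge
df['age_code'] = left_edges[codes] // 10
print(df[['age_bands', 'age_code']])
```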

Answer:

You can do that by simply adding a second

Answer:

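You can create a dictionary that maps each bin to its position and map it onto the age_bands column:

# Use all categories of the column, not only the bins present in the data:
# with .unique(), (60, 70] would be missing and (70, 80] would get 6 instead of 7
bins_sorted = df['age_bands'].cat.categories
bins_dict = {key: idx for idx, key in enumerate(bins_sorted)}
df['age_code'] = df['age_bands'].map(bins_dict).astype(int)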
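If only the integer positions are needed, pd.cut can produce them directly with labels=False, so no dictionary is required. A minimal sketch reusing the question's data; as with the dictionary, the position equals the tens digit only because the bins start at 0 and are 10 wide.

```python
import pandas as pd

df = pd.DataFrame()
df['age'] = [43, 76, 27, 8, 57, 32, 12, 22]
age_band = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

df['age_bands'] = pd.cut(df['age'], bins=age_band, ordered=True)
# labels=False makes pd.cut return the integer index of each row's bin
# instead of the Interval itself
df['age_code'] = pd.cut(df['age'], bins=age_band, labels=False)
print(df[['age_bands', 'age_code']])
```

With labels=False, an age outside all bins would come back as NaN, and the column would then be float rather than int.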