I want to replace every value in the age column with its middle value

I have a column that looks like this:

Age
[0-10)
[10-20)
[20-30)
[30-40)
[40-50)
[50-60)
[60-70)
[70-80)

I want to remove the "[", "-" and ")" characters. Instead of showing a range such as 0-10, I would like every row in the column to show the middle value of its range.

CodePudding user response：

Yet another solution:

The dataframe:

import pandas as pd

df = pd.DataFrame({'Age':['[0-10)','[10-20)','[20-30)','[30-40)','[40-50)','[50-60)','[60-70)','[70-80)']})
df
       Age
0    [0-10)
1   [10-20)
2   [20-30)
3   [30-40)
4   [40-50)
5   [50-60)
6   [60-70)
7   [70-80)

The code:

df['Age'] = df.Age.str.extract(r'(\d+)-(\d+)').astype('int').mean(axis=1).astype('int')

The result:

df
   Age
0    5
1   15
2   25
3   35
4   45
5   55
6   65
7   75
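
To make the one-liner easier to follow, here is a rough sketch of the same chain split into separate steps. The shortened three-row frame and the intermediate names `bounds` and `middle` are only for illustration and are not part of the original answer.

```python
import pandas as pd

# Shortened illustrative frame (only three of the ranges).
df = pd.DataFrame({"Age": ["[0-10)", "[10-20)", "[20-30)"]})

# Step 1: capture the two numbers; the result has columns 0 and 1 holding strings.
bounds = df["Age"].str.extract(r"(\d+)-(\d+)")

# Step 2: convert the captured strings to integers.
bounds = bounds.astype(int)

# Step 3: row-wise mean of the lower and upper bound (a float Series).
middle = bounds.mean(axis=1)

# Step 4: back to integers; this truncates if a midpoint is fractional.
df["Age"] = middle.astype(int)
print(df)
```

Splitting the chain like this also makes it easy to stop after step 3 if you would rather keep the midpoints as floats.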

CodePudding user response：

If you want to explode a row into multiple rows where each row carries a value from the range, you can do this:

import pandas as pd

data = '''[0-10)
[10-20)
[20-30)
[30-40)
[40-50)
[50-60)
[60-70)
[70-80)'''

df = pd.DataFrame({'Age': data.splitlines()})

df['Age'] = df['Age'].str.extract(r'\[(\d+)-(\d+)\)').astype(int).apply(lambda r: list(range(r[0], r[1])), axis=1)

df.explode('Age')

Note that I assume your Age column is string typed, so I used extract to get the boundaries of each range and converted them to a real list of integers. Exploding the dataframe on the modified Age column then gives you a new row for each integer in the list. Values in the other columns are copied to every new row.
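
As a small sketch of the "other columns are copied" part, here is the same idea on a made-up two-row frame. The `group` column and the short ranges are hypothetical, chosen only to keep the output small, and the list comprehension is a swapped-in alternative to the `apply` call above.

```python
import pandas as pd

# Hypothetical two-row frame; the "group" column is made up for illustration.
df = pd.DataFrame({"Age": ["[0-3)", "[10-12)"], "group": ["a", "b"]})

# Pull out both bounds as integers (columns 0 and 1 of the extract result).
bounds = df["Age"].str.extract(r"\[(\d+)-(\d+)\)").astype(int)

# Build one list of integers per row; the upper bound is excluded, as in [lo-hi).
df["Age"] = [list(range(lo, hi)) for lo, hi in zip(bounds[0], bounds[1])]

# One output row per list element; "group" and the index label are repeated.
exploded = df.explode("Age")
print(exploded)
```

The original index labels are repeated as well, so call `reset_index(drop=True)` on the result if you need a fresh 0..n-1 index.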

CodePudding user response：

I tried this:

import pandas as pd
import re

data = {
    'age_range': [
        '[0-10)',
        '[10-20)',
        '[20-30)',
        '[30-40)',
        '[40-50)',
        '[50-60)',
        '[60-70)',
        '[70-80)',
    ]
}
df = pd.DataFrame(data=data)

def get_middle_age(age_range):
    pattern = r'(\d+)'
    ages = re.findall(pattern, age_range)

    return int((int(ages[0]) + int(ages[1])) / 2)

df['age'] = df.apply(lambda row: get_middle_age(row['age_range']), axis=1)
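
One thing to watch with this helper: `int()` truncates, so a range whose bounds sum to an odd number loses its fractional midpoint. Below is a minimal sketch of a variant that can keep the fraction. The `keep_fraction` parameter is my own hypothetical addition, and the `[0-5)` range is made up to show the effect.

```python
import re


def get_middle_age(age_range, keep_fraction=False):
    # Expect exactly two numbers in the string; unpacking raises ValueError otherwise.
    lo, hi = (int(n) for n in re.findall(r"\d+", age_range))
    middle = (lo + hi) / 2
    # keep_fraction is a hypothetical switch: return the float, or truncate to int.
    return middle if keep_fraction else int(middle)


print(get_middle_age("[0-10)"))
print(get_middle_age("[0-5)"))
print(get_middle_age("[0-5)", keep_fraction=True))
```

For the ten-wide ranges in the question the two modes agree, so this only matters if your real data has ranges of uneven width.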