I have a column that looks like this:
Age
[0-10)
[10-20)
[20-30)
[30-40)
[40-50)
[50-60)
[60-70)
[70-80)
I want to remove the "[", "-" and ")". Instead of showing the range, such as 0-10, I would like to show the middle value for every row in the column.
CodePudding user response:
Yet another solution:
The dataframe:
import pandas as pd
df = pd.DataFrame({'Age':['[0-10)','[10-20)','[20-30)','[30-40)','[40-50)','[50-60)','[60-70)','[70-80)']})
df
Age
0 [0-10)
1 [10-20)
2 [20-30)
3 [30-40)
4 [40-50)
5 [50-60)
6 [60-70)
7 [70-80)
The code:
df['Age'] = df.Age.str.extract(r'(\d+)-(\d+)').astype('int').mean(axis=1).astype('int')
The result:
df
Age
0 5
1 15
2 25
3 35
4 45
5 55
6 65
7 75
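To see why the one-liner works, the same steps can be split apart. This is only a sketch: the names ages, bounds, low, high and midpoints are made up for illustration, and it assumes every row matches the "[low-high)" shape (a row that does not match would produce NaN and make astype(int) fail).

```python
import pandas as pd

ages = pd.Series(['[0-10)', '[10-20)', '[20-30)'], name='Age')

# str.extract returns one column per capture group, still as strings
bounds = ages.str.extract(r'(\d+)-(\d+)').astype(int)
bounds.columns = ['low', 'high']  # illustrative names, not part of the original answer

# Row-wise mean of the two bounds is the middle of the range;
# the mean is a float, so cast back to int
midpoints = bounds.mean(axis=1).astype(int)
print(midpoints.tolist())
```

Because the mean comes back as a float, the final astype(int) truncates, so a range with an odd width would be rounded down.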
CodePudding user response:
If you want to explode a row into multiple rows where each row carries a value from the range, you can do this:
import pandas as pd
data = '''[0-10)
[10-20)
[20-30)
[30-40)
[40-50)
[50-60)
[60-70)
[70-80)'''
df = pd.DataFrame({'Age': data.splitlines()})
df['Age'] = df['Age'].str.extract(r'\[(\d+)-(\d+)\)').astype(int).apply(lambda r: list(range(r[0], r[1])), axis=1)
df.explode('Age')
Note that I assume your Age column is string typed, so I used extract to get the boundaries of each range and converted them to a list of integers. Finally, exploding the dataframe on the modified Age column gives you a new row for each integer in the list. Values in other columns are copied accordingly.
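A small sketch of how the other columns get copied. The Group column and the short ranges here are made up purely for illustration; the extract and explode calls are the same ones used above.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': ['[0-3)', '[3-5)'],
    'Group': ['a', 'b'],  # hypothetical extra column, not in the question
})

# Extract both bounds as integers (columns 0 and 1)
bounds = df['Age'].str.extract(r'\[(\d+)-(\d+)\)').astype(int)

# One list of integers per row; the upper bound is excluded, like in [low-high)
df['Age'] = pd.Series(
    [list(range(lo, hi)) for lo, hi in zip(bounds[0], bounds[1])],
    index=df.index,
)

# Each list element becomes its own row; Group is repeated alongside it
exploded = df.explode('Age')
print(exploded)
```

The original index is repeated as well, so you may want to call reset_index(drop=True) afterwards if you need a unique index.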
CodePudding user response:
I tried this:
import pandas as pd
import re
data = {
    'age_range': [
        '[0-10)',
        '[10-20)',
        '[20-30)',
        '[30-40)',
        '[40-50)',
        '[50-60)',
        '[60-70)',
        '[70-80)',
    ]
}
df = pd.DataFrame(data=data)
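def get_middle_age(age_range):
    # Pull both bounds out of a string such as '[10-20)'
    pattern = r'(\d+)'
    ages = re.findall(pattern, age_range)
    # Midpoint of the two bounds, truncated to an integer
    return int((int(ages[0]) + int(ages[1])) / 2)
df['age'] = df.apply(lambda row: get_middle_age(row['age_range']), axis=1)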