I have a dataframe, and there are columns (Height, Width and Life Span) with minimal and maximal values. What I want is to create another column with average values (27 for Height etc.), but to do it I need to extract two values - 26 and 28. And I didn't manage to succeed in this task - for example, with regex (df['Height'].replace('\D', '', regex=True)
) I could only extract all numbers together, which isn't right. Or I could use re.findall(r'\d ', df['Height'][0])
which gives me what I need, but it applies only for the first row and not for any others (there are 400 rows in original dataset, I pasted only the first one). So, if you could help me I'll be very glad.
Code for df below.
import pandas as pd
pd.DataFrame({'Dog Breed Group': 'Working Dogs',
'Height': '26 to 28 inches at the shoulder',
'Weight': '80 to 140 pounds',
'Life Span': '10 to 11 years'}, index=[0])
CodePudding user response:
Extract digits from strings then convert them as int and finally apply the mean to get a unique number:
COLS = ['Height', 'Weight', 'Life Span']
get_mean = lambda x: x.str.extractall('(\d )').astype(int).mean()
df[COLS] = df[COLS].apply(get_mean)
print(df)
# Output:
Dog Breed Group Height Weight Life Span
0 Working Dogs 27.0 110.0 10.5
CodePudding user response:
Thanks to the answer, I managed to do this (although not elegant and too complicated):
df2 = df.copy()
e = df2.Height.str.extractall('(\d )').astype(int).reset_index().groupby('level_0').mean()
df2['mean_height'] = e.iloc[:, 1]
df2.head()
Same apply to other columns.