Home > Software design >  How could I extract two numbers from a df column?
How could I extract two numbers from a df column?

Time:02-12

I have a dataframe, and there are columns (Height, Width and Life Span) with minimal and maximal values. What I want is to create another column with average values (27 for Height etc.), but to do it I need to extract two values - 26 and 28. And I didn't manage to succeed in this task - for example, with regex (df['Height'].replace('\D', '', regex=True)) I could only extract all numbers together, which isn't right. Or I could use re.findall(r'\d ', df['Height'][0]) which gives me what I need, but it applies only for the first row and not for any others (there are 400 rows in original dataset, I pasted only the first one). So, if you could help me I'll be very glad.

Code for df below.

import pandas as pd

pd.DataFrame({'Dog Breed Group': 'Working Dogs',
 'Height': '26 to 28 inches at the shoulder',
 'Weight': '80 to 140 pounds',
 'Life Span': '10 to 11 years'}, index=[0])

CodePudding user response:

Extract digits from strings then convert them as int and finally apply the mean to get a unique number:

COLS = ['Height', 'Weight', 'Life Span']
get_mean = lambda x: x.str.extractall('(\d )').astype(int).mean()

df[COLS] = df[COLS].apply(get_mean)
print(df)

# Output:
  Dog Breed Group  Height  Weight  Life Span
0    Working Dogs    27.0   110.0       10.5

CodePudding user response:

Thanks to the answer, I managed to do this (although not elegant and too complicated):

df2 = df.copy()

e = df2.Height.str.extractall('(\d )').astype(int).reset_index().groupby('level_0').mean()

df2['mean_height'] = e.iloc[:, 1]

df2.head()

Same apply to other columns.

  • Related