How do I separate measurement value and unit into their respective columns if they appear together i-CodePudding

I have a DataFrame which contains measurements; e.g., weight, height, etc. However, sometimes the measurement column contains units together with values. Therefore, I would like to separate measurement values and units if they are together in DataFrame. Eg., In the below DataFrame, df, the height value and unit of the first entry are in respective columns. However, the value column of the 2nd and 3rd entries of height contains both value and unit together. In that case, I would like to move height units "m" and "cm" from the value column to the unit column.

measurement name	value	unit
height	160.0	cm
height	1.5 m
height	155cm

The output DataFrame should look like the below -

measurement name	value	unit
height	160.0	cm
height	1.5	m
height	155.0	cm

May I know how I separate values and units into their respective column in DataFrame efficiency in Python?

CodePudding user response：

Use Series.str.extract with regex for get numeric values with . from start of string by ^, optionaly space separator by \s* and non numeric values in end of strings by \D with $ and pass to DataFrame.update for replace only extracted values:

df.update(df['value'].str.extract(r'^(?P<value>\d \.*\d*)\s*(?P<unit>\D )$'))
print (df)
  measurement name  value unit
0           height  160.0   cm
1           height    1.5    m
2           height    155   cm

CodePudding user response：

updating other answer to include the unit:

df['unit'] = df['value'].apply(lambda x: ''.join(l for l in x if l.isalpha()))
df.update(df['value'].str.extract(r'^(?P<value>\d \.*\d*)\s*(?P<unit>\D )$'))

print(df)

will output:

     name  value unit
0  height  160.0     
1  height    1.5    m
2  height    155   cm