I have to extract for each string in over 1000 of rows, the unit of measure and the associated number, like:
25 kg, 1000L
The string looks like this:
ZALTATA MA177-445 IBCC 1000L
The number is before the measurement unit.
CodePudding user response:
Assuming this example input:
col
0 ZALTATA MA177-445 IBCC 1000L
1 ZALTATA MA177 445 kg IBCC 1000
you could use extract
:
import re
df['col'].str.extract(r'\b(?P<value>\d )\s*(?P<unit>kg|l)\b', flags=re.I)
output:
value unit
0 1000 L
1 445 kg
join and save
(df.join(df['col'].str.extract(r'\b(?P<value>\d )\s*(?P<unit>kg|l)\b', flags=re.I))
.to_excel('out.xslx')
)