I need to extract Numbers from String Column.
df:
Product
tld los 16OZ
HSJ14 OZ
hqk 28.3 OZ
rtk .7 OZ
ahdd .92OZ
aje 0.22 OZ
I need to Extract Numbers from column "Product" along with Decimal.
df_Output:
Product Numbers
tld los 16OZ 16
HSJ14 OZ 14
hqk 28.3 OZ 28.3
rtk .7 OZ 0.7
ahdd .92OZ 0.92
aje 0.22 OZ 0.22
what i tried:
df['Numbers'] = df['Product'].str.extract('([0-9] [,./]*[0-9]*)') -- Missing Values like .7
CodePudding user response:
If as simplified as presented, replace every other string except digit and dot
df['Numbers'] =df['Product'].str.replace('[^\d\.]','', regex=True).astype(float)
Product Numbers
0 tld los 16OZ 16.00
1 HSJ14 OZ 14.00
2 hqk 28.3 OZ 28.30
3 rtk .7 OZ 0.70
4 ahdd .92OZ 0.92
5 aje 0.22 OZ 0.22
CodePudding user response:
You can use this regex:
df['Numbers'] = df['Product'].str.extract(r'(\d*\.\d |\d )', expand=False)
Product Numbers
0 tld los 16OZ 16
1 HSJ14 OZ 14
2 hqk 28.3 OZ 28.3
3 rtk .7 OZ .7
4 ahdd .92OZ .92
5 aje 0.22 OZ 0.22
CodePudding user response:
If you want to match the numbers followed by OZ
You could write the pattern as:
(\d*\.?\d )\s*OZ\b
Explanation
(
Capture group 1 (the value will be picked up be str.extract)\d*\.?\d
Match optional digits, optional dot and 1 digits)
Close group 1\s*OZ\b
Match optional whitspace chars and thenOZ
followed by a word boundary
See a regex demo.
import pandas as pd
data= [
"tld los 16OZ",
"HSJ14 OZ",
"hqk 28.3 OZ",
"rtk .7 OZ",
"ahdd .92OZ",
"aje 0.22 OZ"
]
df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] = df['Product'].str.extract(r'(\d*\.?\d )\s*OZ\b')
print(df)
Output
Product Numbers
0 tld los 16OZ 16
1 HSJ14 OZ 14
2 hqk 28.3 OZ 28.3
3 rtk .7 OZ .7
4 ahdd .92OZ .92
5 aje 0.22 OZ 0.22