How to Extract Numbers from String Column in Pandas with decimal?-CodePudding

I need to extract Numbers from String Column.

df:

Product
 tld los 16OZ
 HSJ14 OZ
 hqk 28.3 OZ
 rtk .7 OZ
 ahdd .92OZ
 aje 0.22 OZ

I need to Extract Numbers from column "Product" along with Decimal.

df_Output:

 Product         Numbers
 tld los 16OZ    16
 HSJ14 OZ        14
 hqk 28.3 OZ     28.3
 rtk .7 OZ       0.7
 ahdd .92OZ      0.92
 aje 0.22 OZ     0.22

what i tried:

df['Numbers'] =  df['Product'].str.extract('([0-9] [,./]*[0-9]*)') -- Missing Values like .7

CodePudding user response：

If as simplified as presented, replace every other string except digit and dot

df['Numbers'] =df['Product'].str.replace('[^\d\.]','', regex=True).astype(float)



       Product  Numbers
0  tld los 16OZ    16.00
1      HSJ14 OZ    14.00
2   hqk 28.3 OZ    28.30
3     rtk .7 OZ     0.70
4    ahdd .92OZ     0.92
5   aje 0.22 OZ     0.22

CodePudding user response：

You can use this regex:

df['Numbers'] = df['Product'].str.extract(r'(\d*\.\d |\d )', expand=False)

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22

CodePudding user response：

If you want to match the numbers followed by OZ You could write the pattern as:

(\d*\.?\d )\s*OZ\b

Explanation

( Capture group 1 (the value will be picked up be str.extract)
\d*\.?\d Match optional digits, optional dot and 1 digits
) Close group 1
\s*OZ\b Match optional whitspace chars and then OZ followed by a word boundary

See a regex demo.

import pandas as pd

data= [
    "tld los 16OZ",
    "HSJ14 OZ",
    "hqk 28.3 OZ",
    "rtk .7 OZ",
    "ahdd .92OZ",
    "aje 0.22 OZ"
]

df = pd.DataFrame(data, columns=["Product"])
df['Numbers'] =  df['Product'].str.extract(r'(\d*\.?\d )\s*OZ\b')
print(df)

Output

        Product Numbers
0  tld los 16OZ      16
1      HSJ14 OZ      14
2   hqk 28.3 OZ    28.3
3     rtk .7 OZ      .7
4    ahdd .92OZ     .92
5   aje 0.22 OZ    0.22