I have a dataframe column which contains numbers followed by letters. I want to extract the numbers out of this string using a regex pattern. Here is an example of the column:
a b
1 26.6 km/kg
2 19.67 kmpl
3 18.9 kmpl
I've tried this and other similar variations:
if isinstance(df['b'], str):
    df['b'] = df['b'].str.extract('(\d+)').astype(int)
I get the error "AttributeError: 'str' object has no attribute 'str'".
CodePudding user response:
We can do this in two steps. First, overwrite the b column with just the extracted decimal number, still as a string. Then, convert that column to numeric.
df["b"] = df["b"].str.extract(r'(\d+(?:\.\d+)?)')
df["b"] = pd.to_numeric(df["b"])
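For reference, here is a self-contained sketch of the two steps above, run against the sample rows from the question. The `expand=False` argument is an addition that is not in the answer as posted; it makes `str.extract` return a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"a": [1, 2, 3], "b": ["26.6 km/kg", "19.67 kmpl", "18.9 kmpl"]})

# Step 1: keep only the first decimal number in each value, still as a string.
# expand=False returns a Series instead of a one-column DataFrame.
df["b"] = df["b"].str.extract(r"(\d+(?:\.\d+)?)", expand=False)

# Step 2: convert the extracted strings to a numeric dtype
df["b"] = pd.to_numeric(df["b"])

print(df)
print(df.dtypes)
```

The pattern `\d+(?:\.\d+)?` matches one or more digits, optionally followed by a decimal point and more digits, so it should handle both whole numbers and decimals.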
CodePudding user response:
Remove every character except digits and the decimal point, then cast the result to the dtype of your choice:
df = df.assign(b=df['b'].str.replace(r'[^.0-9]', '', regex=True).str.strip().astype(float))
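One caveat with the character-removal approach: a value with no digits at all is reduced to an empty string, which `astype(float)` would typically reject. The sketch below uses a hypothetical column (the `"n/a"` row is not from the question) and swaps in `pd.to_numeric` with `errors="coerce"` so that such rows become NaN instead of raising.

```python
import pandas as pd

# Hypothetical column: the last value contains no number at all
s = pd.Series(["26.6 km/kg", "19.67 kmpl", "n/a"])

# Strip everything except digits and the decimal point
cleaned = s.str.replace(r"[^.0-9]", "", regex=True)

# "n/a" is now an empty string; errors="coerce" maps values that
# cannot be parsed as numbers to NaN instead of raising an error.
result = pd.to_numeric(cleaned, errors="coerce")

print(result)
```

Whether NaN is the right outcome for such rows depends on the data; dropping or flagging them may be preferable.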
CodePudding user response:
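The error means .str was called on a plain Python string, which has no str attribute; the .str accessor exists only on a Series such as df['b']. Note also that isinstance(df['b'], str) is always False for a column, so the assignment inside the if never runs. Drop the check, call .str.extract on the column directly, and cast to float because the values contain decimals:
df['b'] = df['b'].str.extract(r'(\d+(?:\.\d+)?)', expand=False).astype(float)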
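A small sketch of where this AttributeError comes from, using two of the sample values from the question. It is only an illustration of the Series versus element distinction, not a reconstruction of the exact code that produced the error in the question.

```python
import pandas as pd

df = pd.DataFrame({"b": ["26.6 km/kg", "19.67 kmpl"]})

# A column is a Series, never a str, so this check is always False
column_is_str = isinstance(df["b"], str)
print(column_is_str)

# A single element is a plain Python str, which has no .str accessor
value = df["b"].iloc[0]
message = None
try:
    value.str
except AttributeError as exc:
    message = str(exc)
    print(message)
```

In other words, `.str` belongs on the whole column; once you are holding an individual cell value, ordinary string methods or the `re` module apply instead.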