Extract only numbers from string with python

Time:06-24

I am trying to extract numbers only from a pandas column

I used .str.extract('([?:\s\d ]+)')

and it seemed to work well, but when I checked the data, I found a row that was not fully matched.

Row contains: 86531 86530 86529PIP 91897PIP

Result: 86531 86530 86529

As you can see, the last value, 91897PIP, has not been processed. I do not understand why.

What I need is for every number in the string to be extracted, regardless of whether it has letters, spaces, or special characters before or after it.

CodePudding user response:

Your regex doesn't do what you think it does. What you have is a character class, which matches any one of the characters in the set ?: \t\r\n\f\v0-9. When the regex encounters the first non-matching character (P in your sample data), the match ends, and .str.extract returns only that first match. It's probably simpler to use replace to remove every character that is neither whitespace nor a digit:

import pandas as pd
df = pd.DataFrame({'data':['86531 86530 86529PIP 91897PIP']})
# Remove every character that is neither whitespace nor a digit
df['data'].str.replace(r'[^\s\d]', '', regex=True)

Which for your data will give:

86531 86530 86529 91897
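
The difference between the two approaches can be reproduced with the standard re module on a plain string. This is a minimal sketch, assuming the original pattern was a character class followed by +; the variable names are illustrative only:

```python
import re

row = "86531 86530 86529PIP 91897PIP"

# A character class followed by + matches one contiguous run of allowed
# characters, so the first match ends at the first "P".
first_run = re.search(r"[\s\d]+", row).group()

# Deleting everything that is neither whitespace nor a digit keeps all numbers.
cleaned = re.sub(r"[^\s\d]", "", row)

print(repr(first_run))
print(repr(cleaned))
```

The first result ends where the first letter appears, while the second keeps the digits that follow it.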

CodePudding user response:

You can pass in a regular expression to .extract() that matches one or more digits:

import pandas as pd
df = pd.DataFrame({"data": ["86531", "86530", "86529PIP", "91897PIP"]})
# (\d+) captures the first run of one or more digits in each row
df["data"] = df["data"].str.extract(r"(\d+)")
print(df)

This outputs:

    data
0  86531
1  86530
2  86529
3  91897
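
If all the values sit in a single row, as in the question, .str.extract would return only the first number. One possible alternative, sketched here under the assumption that a list of matches per row is acceptable, is .str.findall; the column names numbers and numbers_joined are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"data": ["86531 86530 86529PIP 91897PIP"]})

# str.findall returns a list of every non-overlapping match in each row
df["numbers"] = df["data"].str.findall(r"\d+")

# Optionally join the matches back into one space-separated string
df["numbers_joined"] = df["numbers"].str.join(" ")

print(df["numbers"].iloc[0])
print(df["numbers_joined"].iloc[0])
```

Keeping the list form may be more convenient if each number needs further processing, for example with .explode().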