Extracting number from string only when string is present in a dataframe

I am trying to extract a chain of digits that may follow a run of other characters in a data frame column. If a cell contains no such characters, nothing needs to be done to it. If it does, I want those characters removed. The end result should be the same column with only the numbers left. See the example.

Before:

ID	Price	Item Code
1	3.60	a/b 80986
2	4.30	45772
3	0.60	fF/6 9778
4	9.78	48989
5	3.44	\ 545
6	3.44	r. 509

Result:

ID	Price	Item Code
1	3.60	80986
2	4.30	45772
3	0.60	9778
4	9.78	48989
5	3.44	545
6	3.44	509

CodePudding user response：

Use Series.str.extract with the regex pattern r'(?:^|\s)(\d+)':

(?:^|\s) matches the beginning of the string ('^') or ('|') any whitespace character ('\s') without capturing it ((?:...))
(\d+) captures one or more digits (greedy)

df['Item Code'] = df['Item Code'].str.extract(r'(?:^|\s)(\d+)', expand=False)
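
To see what the pattern captures, here is a minimal sketch that applies the same regex with Python's re module to the sample values from the question. The names pattern, samples and extracted are only for illustration; Series.str.extract performs the equivalent search on each element.

```python
import re

# Start of string or a whitespace character (not captured), then one or more digits (captured).
pattern = re.compile(r'(?:^|\s)(\d+)')

# Sample values copied from the question's 'Item Code' column.
samples = ['a/b 80986', '45772', 'fF/6 9778', '\\ 545', 'r. 509']

# group(1) is the captured digit run; the '6' in 'fF/6 9778' is skipped
# because it follows '/', not whitespace or the start of the string.
extracted = [pattern.search(s).group(1) for s in samples]
print(extracted)
```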

Note that the values of 'Item Code' are still strings after the extraction. If you want to convert them to integers, use Series.astype:

df['Item Code'] = df['Item Code'].str.extract(r'(?:\s|^)(\d+)', expand=False).astype(int)

Output

>>> df

   ID  Price Item Code
0   1   3.60     80986
1   2   4.30     45772
2   3   0.60      9778
3   4   9.78     48989
4   5   3.44       545
5   6   3.44       509
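
For reference, a self-contained sketch that rebuilds the question's data and runs the extraction end to end. It rests on two assumptions that are not in the original answer: the column may contain cells that were parsed as numbers rather than strings (hence astype(str)), and some cells might not match at all (hence pd.to_numeric with errors='coerce' instead of astype(int), which would raise on a missing value).

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6],
    'Price': [3.60, 4.30, 0.60, 9.78, 3.44, 3.44],
    'Item Code': ['a/b 80986', '45772', 'fF/6 9778', '48989', '\\ 545', 'r. 509'],
})

# astype(str) guards against cells that hold numbers instead of strings (assumption).
codes = df['Item Code'].astype(str).str.extract(r'(?:^|\s)(\d+)', expand=False)

# errors='coerce' turns anything unparseable into NaN instead of raising.
df['Item Code'] = pd.to_numeric(codes, errors='coerce')
print(df)
```

If every cell matches, as in this sample, the column ends up with an integer dtype; a single unmatched cell would make it float because of the NaN.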

CodePudding user response：

I think using a regex is the solution:

import re

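# findall returns every run of digits; [-1] takes the last one ([0] would give 6 for "fF/6 9778")
dt["Item Code"] = list(map(lambda x: int(re.findall(r"\d+", x)[-1]), dt["Item Code"]))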
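
One caveat with re.findall: it returns every run of digits in the string, so which element you index matters for values such as "fF/6 9778". A small sketch on plain Python lists, where last_number is a hypothetical helper (not part of the answer above) that takes the last run and returns None when there are no digits:

```python
import re

def last_number(value):
    """Return the last run of digits in value as an int, or None (hypothetical helper)."""
    runs = re.findall(r"\d+", str(value))
    return int(runs[-1]) if runs else None

# Sample values copied from the question's 'Item Code' column.
codes = ['a/b 80986', '45772', 'fF/6 9778', '48989', '\\ 545', 'r. 509']

all_runs = re.findall(r"\d+", 'fF/6 9778')  # every digit run, in order of appearance
numbers = [last_number(c) for c in codes]
print(all_runs)
print(numbers)
```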