I am trying to extract a chain of numbers that might be preceded by a list of characters within a data frame. If there are no characters, nothing needs to be done to the cell. If there are characters, I want them to be taken out. I want the end result to be the same column but with no characters. See the example.
Before:
ID | Price | Item Code |
---|---|---|
1 | 3.60 | a/b 80986 |
2 | 4.30 | 45772 |
3 | 0.60 | fF/6 9778 |
4 | 9.78 | 48989 |
5 | 3.44 | \ 545 |
6 | 3.44 | r. 509 |
Result:
ID | Price | Item Code |
---|---|---|
1 | 3.60 | 80986 |
2 | 4.30 | 45772 |
3 | 0.60 | 9778 |
4 | 9.78 | 48989 |
5 | 3.44 | 545 |
6 | 3.44 | 509 |
CodePudding user response:
Use Series.str.extract with the regex pattern r'(?:^|\s)(\d+)':
(?:^|\s) matches the beginning of the string ('^') or ('|') any whitespace character ('\s') without capturing it ((?:...))
(\d+) captures one or more digits (greedy)
df['Item Code'] = df['Item Code'].str.extract(r'(?:^|\s)(\d+)', expand=False)
Note that the values of 'Item Code' are still strings after the extraction. If you want to convert them to integers, use Series.astype.
df['Item Code'] = df['Item Code'].str.extract(r'(?:\s|^)(\d+)', expand=False).astype(int)
Output
>>> df
ID Price Item Code
0 1 3.60 80986
1 2 4.30 45772
2 3 0.60 9778
3 4 9.78 48989
4 5 3.44 545
5 6 3.44 509
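For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes every value in 'Item Code' is a string (the question does not say so) and rebuilds the sample table by hand; the intermediate name codes is only for illustration.

```python
import pandas as pd

# Sample data copied from the question; all item codes are assumed to be strings
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Price": [3.60, 4.30, 0.60, 9.78, 3.44, 3.44],
    "Item Code": ["a/b 80986", "45772", "fF/6 9778", "48989", "\\ 545", "r. 509"],
})

# Capture the digits that follow the start of the string or a whitespace character
codes = df["Item Code"].str.extract(r"(?:^|\s)(\d+)", expand=False)

# The extracted values are strings, so convert them to integers
df["Item Code"] = codes.astype(int)

print(df["Item Code"].tolist())
```

If a cell contained no match, extract would return a missing value for it and astype(int) would fail, so this sketch only covers data shaped like the example.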
CodePudding user response:
I think using a regex is the solution:
import re
# Take the last run of digits so that digits inside a prefix such as "fF/6" are ignored
dt["Item Code"] = list(map(lambda x: int(re.findall(r"\d+", x)[-1]), dt["Item Code"]))
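One thing worth checking with re.findall is which match gets picked, because "fF/6 9778" contains two runs of digits. The sketch below uses only the sample values from the question; last_number is a hypothetical helper name, not part of any library.

```python
import re

# Sample values copied from the question
samples = ["a/b 80986", "45772", "fF/6 9778", "48989", "\\ 545", "r. 509"]

def last_number(text):
    """Return the last run of digits in text as an int (hypothetical helper)."""
    return int(re.findall(r"\d+", text)[-1])

# Picking the first match takes the 6 from "fF/6" instead of 9778
first = [int(re.findall(r"\d+", s)[0]) for s in samples]
# Picking the last match gives the trailing item code in every sample
last = [last_number(s) for s in samples]

print(first)
print(last)
```

On this data the first-match version returns 6 for the third row, so the last match seems the safer choice, assuming the item code always comes at the end of the cell.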