Extracting number from string only when string is present in a dataframe

Time:11-16

I am trying to extract a run of digits that may be preceded by a string of other characters in a data frame column. If a cell contains no such characters, nothing needs to be done to it. If it does, I want the characters removed. The end result should be the same column, but containing only the digits. See the example below.

Before:

ID Price Item Code
1 3.60 a/b 80986
2 4.30 45772
3 0.60 fF/6 9778
4 9.78 48989
5 3.44 \ 545
6 3.44 r. 509

Result:

ID Price Item Code
1 3.60 80986
2 4.30 45772
3 0.60 9778
4 9.78 48989
5 3.44 545
6 3.44 509

CodePudding user response:

Use Series.str.extract with the regex pattern r'(?:^|\s)(\d+)':

  • (?:^|\s) matches the beginning of the string ('^') or ('|') any whitespace character ('\s') without capturing it ((?:...))
  • (\d+) captures one or more digits (greedy)
df['Item Code'] = df['Item Code'].str.extract(r'(?:^|\s)(\d+)', expand=False)

Note that the values of 'Item Code' are still strings after the extraction. If you want to convert them to integers, use Series.astype.

df['Item Code'] = df['Item Code'].str.extract(r'(?:\s|^)(\d+)', expand=False).astype(int)

Output

>>> df

   ID  Price Item Code
0   1   3.60     80986
1   2   4.30     45772
2   3   0.60      9778
3   4   9.78     48989
4   5   3.44       545
5   6   3.44       509
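
To reproduce this output end to end, here is a minimal self-contained sketch. The DataFrame is retyped by hand from the question's "Before" table, so the exact cell contents (in particular the single space between the characters and the digits) are an assumption. The conversion goes through pandas' nullable Int64 dtype rather than plain int, which is a substitution on my part: it keeps working if some cell has no match and the extraction returns NaN.

```python
import pandas as pd

# Sample data retyped from the question; the exact whitespace is an assumption.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Price": [3.60, 4.30, 0.60, 9.78, 3.44, 3.44],
    "Item Code": ["a/b 80986", "45772", "fF/6 9778", "48989", "\\ 545", "r. 509"],
})

# Capture the first run of digits that sits at the start of the cell
# or directly after a whitespace character.
codes = df["Item Code"].str.extract(r"(?:^|\s)(\d+)", expand=False)

# Cells without a match come back as NaN, so convert through the nullable
# Int64 dtype instead of plain int, which would raise on missing values.
df["Item Code"] = pd.to_numeric(codes).astype("Int64")

print(df)
```

With the sample data every row matches, so the column ends up holding the six integer codes shown above.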

CodePudding user response:

I think using a regex is the solution:

import re

# Take the last run of digits so that digits inside the prefix (e.g. 'fF/6') are ignored
dt["Item Code"] = list(map(lambda x: int(re.findall(r"\d+", x)[-1]), dt["Item Code"]))
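
Which match is taken matters with a findall-based approach: a cell such as fF/6 9778 has a digit inside its prefix, so the first match would be 6, and a cell with no digits at all makes the indexing raise IndexError. A minimal sketch of a more defensive helper follows. The name last_digit_run and the sample values are hypothetical, chosen only for illustration, and returning None for a cell without digits is an assumption about the desired behaviour.

```python
import re


def last_digit_run(value):
    """Return the last run of digits in value as an int, or None if there is none."""
    runs = re.findall(r"\d+", str(value))
    return int(runs[-1]) if runs else None


# Sample cells modelled on the question, plus one hypothetical cell without digits.
samples = ["a/b 80986", "45772", "fF/6 9778", "\\ 545", "r. 509", "n/a"]
cleaned = [last_digit_run(s) for s in samples]
print(cleaned)
```

Applied to a column, this could be used as dt["Item Code"].map(last_digit_run), assuming dt is the data frame from the question.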