Extract the first 6- or 7-digit number from a string when the numbers contain commas

Time:01-07

I am looking to extract the first 6- or 7-digit number from each string. The numbers can contain commas as well.

**Salary**

60000083 annually  
172829 annually  
2,50,000 annually  
2,02,000 annually  
27,00,000 annually  

and I am looking for the following output (two columns: 1) Salary, 2) 6_or_7 Digit):

**Salary**               **6_or_7 Digit**  

60000083 annually        6000008  
172829 annually          172829  
2,50,000 annually        250000
2,02,000 annually        202000  
27,00,000 annually       2700000  

I am trying

Test['6_or_7 Digit'] = Test['Salary'].apply(lambda x: re.findall('[0-9]{1,6}', x)[0] if re.findall('[0-9]{1,6}', x) else '0').str.zfill(6)

The above extracts six digits only for the first two rows and does not work for numbers containing commas (2,50,000, 2,02,000, 27,00,000).

CodePudding user response:

How about this?

As mentioned by @Zacchaeus, remove commas & use regex to extract the digits.

df['Salary'].str.replace(",", "", regex=False).str.extract(r"(\d{6,7})")

         0
0  6000008
1   172829
2   250000
3   202000
4  2700000
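
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the question's data, and `expand=False` is an addition (not in the answer above) so that the result is a Series that can be assigned straight to a new column.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Salary": [
    "60000083 annually",
    "172829 annually",
    "2,50,000 annually",
    "2,02,000 annually",
    "27,00,000 annually",
]})

# Remove commas as literal characters, then capture the first run of
# 6 or 7 digits. The quantifier is greedy, so 7 digits are taken when available.
df["6_or_7 Digit"] = (
    df["Salary"]
    .str.replace(",", "", regex=False)
    .str.extract(r"(\d{6,7})", expand=False)
)

print(df)
```

The extracted values are strings; rows with fewer than six digits after comma removal would come back as NaN.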

CodePudding user response:

Remove the commas and extract the leading run of digits. I guess you don't need to worry about counting digits with this.

Test['6_or_7 Digit'] = Test['Salary'].replace(',', '', regex=True).str.extract(r'(\d+)')

Resulting Test dataframe:

               Salary 6_or_7 Digit
0   60000083 annually     60000083
1     172829 annually       172829
2   2,50,000 annually       250000
3   2,02,000 annually       202000
4  27,00,000 annually      2700000
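
Note that the output above keeps all eight digits of the first row (60000083), whereas the question asks for 6000008. If the cap at seven digits matters, one option is to bound the match. Below is a rough sketch using only the standard `re` module; the helper name `first_6_or_7_digits` is hypothetical and chosen just for illustration.

```python
import re

def first_6_or_7_digits(text):
    """Return the first run of 6 or 7 digits after removing commas, or None."""
    # Drop thousands/lakh separators, then search for a bounded digit run.
    match = re.search(r"\d{6,7}", text.replace(",", ""))
    return match.group(0) if match else None

salaries = [
    "60000083 annually",
    "172829 annually",
    "2,50,000 annually",
    "2,02,000 annually",
    "27,00,000 annually",
]

for s in salaries:
    print(s, "->", first_6_or_7_digits(s))
```

The same function could be applied to a column with `Test['Salary'].apply(first_6_or_7_digits)`, assuming every entry is a string.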