I want to extract the first 6 or 7 digits of the number in each row. Some of the numbers contain commas.
**Salary**
60000083 annually
172829 annually
2,50,000 annually
2,02,000 annually
27,00,000 annually
I am looking for the following output (two columns: Salary and 6_or_7 Digit):
**Salary** **6_or_7 Digit**
60000083 annually 6000008
172829 annually 172829
2,50,000 annually 250000
2,02,000 annually 202000
27,00,000 annually 2700000
This is what I have tried:
Test['6_or_7 Digit'] = Test['Salary'].apply(lambda x: re.findall('[0-9]{1,6}', x)[0] if re.findall('[0-9]{1,6}', x) else '0').str.zfill(6)
The above extracts 6 digits, and only for the first two rows. It does not work for numbers that contain commas (2,50,000, 2,02,000, 27,00,000).
CodePudding user response:
How about this?
As mentioned by @Zacchaeus, remove the commas and use a regex to extract the digits.
df['Salary'].str.replace(',', '', regex=False).str.extract(r'(\d{6,7})')
0
0 6000008
1 172829
2 250000
3 202000
4 2700000
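For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same two steps. The `expand=False` argument is my addition so that `str.extract` returns a Series rather than a one-column DataFrame; the rest follows the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Salary": [
    "60000083 annually",
    "172829 annually",
    "2,50,000 annually",
    "2,02,000 annually",
    "27,00,000 annually",
]})

# Step 1: drop the commas as literal characters (no regex needed).
no_commas = df["Salary"].str.replace(",", "", regex=False)

# Step 2: take the first run of 6 or 7 digits. The quantifier is greedy,
# so 7 digits are taken whenever at least 7 are available.
df["6_or_7 Digit"] = no_commas.str.extract(r"(\d{6,7})", expand=False)

print(df)
```

The extracted values are strings; if numbers are needed, `pd.to_numeric` can be applied to the new column afterwards.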
CodePudding user response:
Remove the commas and extract the leading run of digits. This pattern does not limit the number of digits, so the first row returns all eight digits (60000083).
Test['6_or_7 Digit'] = Test['Salary'].replace(',','',regex=True).str.extract(r'(\d+)')
Test dataframe
Salary 6_or_7 Digit
0 60000083 annually 60000083
1 172829 annually 172829
2 2,50,000 annually 250000
3 2,02,000 annually 202000
4 27,00,000 annually 2700000
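The output above keeps all eight digits of the first salary, while the question asks for 6000008. If the cap matters, one option is a small helper used with `apply`, in the style of the original attempt. This is only a sketch: `first_digits` and its `min_len`/`max_len` parameters are hypothetical names, and it assumes the number sits at the very start of the string.

```python
import re

def first_digits(text, min_len=6, max_len=7):
    """Return the leading 6 to 7 digits of text (commas ignored), or None."""
    cleaned = text.replace(",", "")
    # re.match only matches at the start of the string; the greedy
    # quantifier takes max_len digits when that many are available.
    match = re.match(rf"\d{{{min_len},{max_len}}}", cleaned)
    return match.group(0) if match else None

salaries = [
    "60000083 annually",
    "172829 annually",
    "2,50,000 annually",
    "2,02,000 annually",
    "27,00,000 annually",
]
results = [first_digits(s) for s in salaries]
print(results)
```

With the DataFrame from the question this could be used as `Test['6_or_7 Digit'] = Test['Salary'].apply(first_digits)`. Rows with fewer than 6 leading digits come back as `None` instead of a padded value.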