I have data that looks like this:
0 504189219
1 500618053
2 0537533477
3 966581566618
4 00536079946
I want the output to look like this:
504189219
500618053
537533477
581566618
536079946
CodePudding user response:
Use str.extract with a raw-string pattern that captures a 5 followed by eight more digits:
df['Col'] = df['Col'].str.extract(r'(5\d{8})')
print(df)
# Output
Col
0 504189219
1 500618053
2 537533477
3 581566618
4 536079946
Setup:
import pandas as pd
df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})
print(df)
# Output
Col
0 504189219
1 500618053
2 0537533477
3 966581566618
4 00536079946
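As a hedged variant of the str.extract answer above, the pattern can be anchored to the end of the string so that only the last nine digits are considered, and rows that do not match can fall back to their original value instead of becoming NaN. The extra row '0404189219' is a hypothetical input added for illustration; it is not part of the question's data.

```python
import pandas as pd

# Sample data from the question, plus one hypothetical row without a '5'.
df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946', '0404189219']})

# expand=False returns a Series; the trailing $ pins the match
# to the last nine digits of each value.
extracted = df['Col'].str.extract(r'(5\d{8})$', expand=False)

# Rows without a match come back as NaN; keep the original value there.
df['Col'] = extracted.fillna(df['Col'])
print(df)
```

Whether non-matching rows should be kept, dropped, or flagged depends on the data; fillna is just one option.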
CodePudding user response:
There is a library called phonenumbers that can help with this job; see this post.
CodePudding user response:
Using the same setup as Corralien, this approach also works:
import pandas as pd
df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})
def getNumber(n):
    # Take nine characters starting at the first '5'.
    return n[n.find('5'):n.find('5') + 9]
df['Col'] = df['Col'].apply(getNumber)
print(df)
The same result can be achieved with a lambda expression.
Other answers originally did not take the nine-digit constraint into account.
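The lambda version mentioned in this answer might look like the sketch below. It assumes every value contains a '5'; str.find returns -1 when it does not, and the slice would then give an unintended result.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

# Slice nine characters starting at the first '5' in each string.
# Assumption: each value contains at least one '5'.
df['Col'] = df['Col'].apply(lambda n: n[n.find('5'):n.find('5') + 9])
print(df)
```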
CodePudding user response:
This may be a more robust approach:
import pandas as pd
def fix(col):
    return col[-9:] if len(col) > 8 and col[-9] == '5' else col
df = pd.DataFrame({'Col': ['0404189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})
df['Col'] = df['Col'].apply(fix)
print(df)
Output:
Col
0 0404189219
1 500618053
2 537533477
3 581566618
4 536079946
Note how, when the ninth character from the end is not '5', the original value remains intact.
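To illustrate why checking from the end may be more robust, here is a small comparison of the two slicing strategies from this thread on a prefix that itself contains a '5'. The helper names first_five and last_nine and the sample value are hypothetical and chosen only for this sketch.

```python
def first_five(n):
    # Slice nine characters starting at the first '5' in the string.
    i = n.find('5')
    return n[i:i + 9]

def last_nine(col):
    # Keep the last nine characters only if they start with '5'.
    return col[-9:] if len(col) > 8 and col[-9] == '5' else col

sample = '955504189219'  # hypothetical input, not from the question
print(first_five(sample))  # 555041892 (starts inside the prefix)
print(last_nine(sample))   # 504189219
```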
CodePudding user response:
for r in range(len(df.Col)): df.loc[r, 'Col'] = df.Col[r][df.Col[r].find("5"):]