How to get the first 9 characters starting from number 5?

I have data that looks like this

0            504189219
1            500618053
2            0537533477
3            966581566618
4            00536079946

I want the output to be something like this

CodePudding user response：

Use str.extract:

# Raw string keeps \d intact; expand=False returns a Series instead of a DataFrame
df['Col'] = df['Col'].str.extract(r'(5\d{8})', expand=False)
print(df)

# Output
         Col
0  504189219
1  500618053
2  537533477
3  581566618
4  536079946

Setup:

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})
print(df)

# Output
            Col
0     504189219
1     500618053
2    0537533477
3  966581566618
4   00536079946
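
To see what the pattern picks out of each value, here is a minimal sketch using only the standard `re` module. The helper name `first_match` and the extra sample `'0404189219'` (borrowed from a later answer) are only for illustration. A value with no match gives `None` here; with `str.extract` the corresponding row becomes NaN.

```python
import re

# Same pattern as in the answer: a literal 5 followed by exactly eight digits.
PATTERN = re.compile(r'5\d{8}')

samples = ['504189219', '500618053', '0537533477',
           '966581566618', '00536079946', '0404189219']


def first_match(text):
    """Return the first 9-digit run that starts with 5, or None if there is none."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


results = [first_match(s) for s in samples]
for s, r in zip(samples, results):
    print(s, '->', r)
```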

CodePudding user response：

There is a library called phonenumbers that can help with this; see this post.

CodePudding user response：

Using the same setup as Corralien, this approach also works:

df = pd.DataFrame({'Col': ['504189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

def getNumber(n):
    # Slice nine characters starting at the first '5'
    return n[n.find('5'):n.find('5') + 9]

df['Col'] = df['Col'].apply(getNumber)

print(df)

The same result can be achieved with a lambda expression as well.

Other answers originally did not take the 9-character constraint into account.
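
A sketch of the lambda variant mentioned above, applied to a plain list so it runs without pandas; the same callable could be passed to `df['Col'].apply(...)`. The guarded helper `get_number_safe` is hypothetical and not part of the answer. It is included because `str.find` returns -1 when there is no '5', which makes the plain slice return something unexpected.

```python
# The slice from the answer, written as a lambda.
get_number = lambda n: n[n.find('5'):n.find('5') + 9]

values = ['504189219', '500618053', '0537533477',
          '966581566618', '00536079946']
trimmed = [get_number(v) for v in values]
print(trimmed)


# Hypothetical guarded variant: keep the original value when no '5' is found.
def get_number_safe(n):
    start = n.find('5')
    return n[start:start + 9] if start != -1 else n


# Without a '5', find() gives -1 and the slice becomes n[-1:8],
# which is an empty string for this 10-character value.
print(repr(get_number('0404189219')))
print(get_number_safe('0404189219'))
```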

CodePudding user response：

This may be a more robust approach:

import pandas as pd

def fix(col):
    return col[-9:] if len(col) > 8 and col[-9] == '5' else col


df = pd.DataFrame({'Col': ['0404189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

df['Col'] = df['Col'].apply(fix)
print(df)

Output:

          Col
0  0404189219
1   500618053
2   537533477
3   581566618
4   536079946

Note that when the ninth character from the end is not '5' (as in row 0), the original value remains intact.
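
If the regex from the first answer is preferred but unmatched rows should keep their original value, one possible combination is `str.extract` followed by `fillna`. This is only a sketch and not taken from any of the answers. It also behaves differently from `fix`: it takes the first '5' followed by eight digits anywhere in the string, instead of checking the ninth character from the end.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['0404189219', '500618053', '0537533477',
                           '966581566618', '00536079946']})

# Rows without a match come back as NaN from str.extract ...
extracted = df['Col'].str.extract(r'(5\d{8})', expand=False)

# ... and fillna puts the original value back for those rows.
result = extracted.fillna(df['Col'])
print(result)
```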

CodePudding user response：

for r in range(len(df.Col)):
    df.loc[r, 'Col'] = df.Col[r][df.Col[r].find("5"):][:9]  # store the 9 characters starting at the first '5'