Formatting Phone number with +1 with pandas.Series.replace

I can't find a solution online, and I know this should be easy, but I can't figure out what is wrong with my regex:

Here is my code:

import pandas as pd

df = pd.DataFrame({'Company phone number': ['+1-541-296-2271', '+1-542-296-2271', '+1-543-296-2271'],
                   'Contact phone number': ['15112962271', None, '15312962271'],
                   'num_specimen_seen': [10, 2, 3]},
                  index=['falcon', 'dog', 'cat'])

df['Contact phone number'] = df['Contact phone number'].str.replace(r'^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}')

Desired output of df['Contact phone number']:

falcon    +1-511-296-2271
dog       None
cat       +1-531-296-2271

It is always 11 digits with no spaces or special characters. Thanks!
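A minimal sketch (plain `re`, outside pandas) of why the attempt above cannot work: a replacement string may contain literal text and backreferences such as `\1`, but not regex syntax such as `\d{3}`, so Python's `re` module rejects the template. The fix is to capture the digit groups in the pattern and refer back to them. Note also that since pandas 2.0, `Series.str.replace` treats the pattern as a literal string unless `regex=True` is passed. The sample value is taken from the question.

```python
import re

sample = '15112962271'  # one value from the question's 'Contact phone number' column

# The question's replacement uses regex syntax (\d{3}), which is not allowed in a
# replacement template; re raises re.error for the unknown escape \d.
bad_template = r'\+1-\d{3}-\d{3}-\d{4}'
try:
    re.sub(r'^\d{11}$', bad_template, sample)
    error_message = None
except re.error as exc:
    error_message = str(exc)
print('rejected:', error_message)

# Working version: capture the digit groups, then reuse them with backreferences.
fixed = re.sub(r'^(\d)(\d{3})(\d{3})(\d{4})$', r'+\1-\2-\3-\4', sample)
print(fixed)  # +1-511-296-2271
```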

Answer:

You can use

 df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r'+\1-\2-\3-\4', regex=True)

Details:

^ - start of string
(\d) - Group 1 (\1): one digit
(\d{3}) - Group 2 (\2): three digits
(\d{3}) - Group 3 (\3): three digits
(\d+) - Group 4 (\4): one or more digits (use \d{4} if you need to match exactly four digits)
$ - end of string.

In the replacement, + is a literal character and \1 to \4 insert the captured groups, separated by hyphens.

Output:

>>> df['Contact phone number']
falcon    +1-511-296-2271
dog                  None
cat       +1-531-296-2271

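If only numbers that already start with the country code 1 should be rewritten (an assumption; the question only says the values are always 11 digits), the 1 can be written literally in the pattern so that any other value is left unchanged. A minimal sketch; the extra `'alt'` row and its value are hypothetical, added only to show a non-matching value.

```python
import pandas as pd

# Contact numbers from the question, plus a hypothetical 'alt' row whose value
# does not start with 1 (it should come through unchanged).
s = pd.Series(['15112962271', None, '15312962271', '25112962271'],
              index=['falcon', 'dog', 'cat', 'alt'], name='Contact phone number')

# Match exactly 11 digits beginning with 1. regex=True is required on pandas >= 2.0,
# where str.replace treats the pattern literally by default.
formatted = s.str.replace(r'^1(\d{3})(\d{3})(\d{4})$', r'+1-\1-\2-\3', regex=True)
print(formatted)
```

Missing values are skipped by the `.str` accessor, so `dog` stays missing.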

Answer:

You can use .str.extract, convert each row of results to a list, and then use .str.join (and, of course, concatenate a + at the beginning):

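df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')

The last group must be \d{4}; with \d{3} the final digit is dropped. Rows removed by dropna() (here dog) come back as NaN because the assignment aligns on df's index.

Output:

>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon      +1-541-296-2271      +1-511-296-2271                 10
dog         +1-542-296-2271                  NaN                  2
cat         +1-543-296-2271      +1-531-296-2271                  3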
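A sketch of a variant of the same extract-based approach, joining the captured columns with `Series.str.cat` instead of turning each row into a list. The pattern is anchored and the last group is `\d{4}` so all 11 digits are kept; rows with a missing number are dropped before joining, and index alignment on assignment restores them as `NaN`. Only the contact column from the question is reproduced here.

```python
import pandas as pd

df = pd.DataFrame({'Contact phone number': ['15112962271', None, '15312962271']},
                  index=['falcon', 'dog', 'cat'])

# One column (0..3) per digit group; str.extract yields NaN in every column for
# missing values, and dropna() removes those rows.
parts = df['Contact phone number'].str.extract(r'^(\d)(\d{3})(\d{3})(\d{4})$').dropna()

# Join the four groups with '-' and prefix '+'; assignment aligns on the index,
# so 'dog' (absent from parts) becomes NaN.
df['Contact phone number'] = '+' + parts[0].str.cat([parts[1], parts[2], parts[3]], sep='-')
print(df)
```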