I can't find a solution online, and I know this should be easy, but I can't figure out what is wrong with my regex.
Here is my code:
import pandas as pd

df = pd.DataFrame({'Company phone number': ['+1-541-296-2271', '+1-542-296-2271', '+1-543-296-2271'],
                   'Contact phone number': ['15112962271', None, '15312962271'],
                   'num_specimen_seen': [10, 2, 3]},
                  index=['falcon', 'dog', 'cat'])
df['Contact phone number'] = df['Contact phone number'].str.replace('^\d{11}$', r'\+1-\d{3}-\d{3}-\d{4}')
Desired output of df['Contact phone number']:
falcon    +1-511-296-2271
dog                  None
cat       +1-531-296-2271
It is always 11 digits with no spaces or special characters. Thanks!
CodePudding user response:
You can use
df['Contact phone number'] = df['Contact phone number'].str.replace(r'^(\d)(\d{3})(\d{3})(\d+)$', r'+\1-\2-\3-\4', regex=True)
Details:
^ - start of string
(\d) - Group 1 (\1): a digit
(\d{3}) - Group 2 (\2): three digits
(\d{3}) - Group 3 (\3): three digits
(\d+) - Group 4 (\4): one or more digits (use \d{4} if you need to match exactly four digits)
$ - end of string.
The replacement starts with a literal + and then reuses the four groups, so the leading 1 comes from Group 1 and must not be repeated in the replacement.
Output:
>>> df['Contact phone number']
falcon    +1-511-296-2271
dog                  None
cat       +1-531-296-2271
See the regex demo.
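To see the group-and-backreference idea in isolation, here is a minimal sketch using only the standard `re` module on plain strings. The helper name `format_phone` is hypothetical, and the last group uses `\d{4}` on the assumption that the input is always exactly 11 digits, as stated in the question. It also illustrates why the original attempt fails: the replacement string can only refer back to captured groups, it cannot contain new patterns such as `\d{3}`.

```python
import re

# Pattern modeled on the answer above; \d{4} in the last group is an
# assumption that the number is always exactly 11 digits long.
PHONE_RE = re.compile(r'^(\d)(\d{3})(\d{3})(\d{4})$')


def format_phone(raw):
    """Hypothetical helper: format an 11-digit string, leave anything else unchanged."""
    if raw is None:
        return None  # missing values pass through untouched
    # \1..\4 refer to the captured groups; '+' is a literal character here.
    return PHONE_RE.sub(r'+\1-\2-\3-\4', raw)


print(format_phone('15112962271'))  # +1-511-296-2271
print(format_phone('12345'))        # 12345 (no match, so nothing is replaced)
```

Strings that do not match the pattern are returned as they are, which mirrors how `.str.replace` leaves non-matching rows alone.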
CodePudding user response:
You can use .str.extract, convert each row of results to a list, and then use .str.join (and of course concatenate a + at the beginning):
df['Contact phone number'] = '+' + df['Contact phone number'].dropna().astype(str).str.extract(r'(\d)(\d{3})(\d{3})(\d{4})').apply(list, axis=1).str.join('-')
Output:
>>> df
       Company phone number Contact phone number  num_specimen_seen
falcon      +1-541-296-2271      +1-511-296-2271                 10
dog         +1-542-296-2271                  NaN                  2
cat         +1-543-296-2271      +1-531-296-2271                  3
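The extract-then-join idea can also be sketched without pandas, which may make the individual steps easier to follow. This is only an illustration: `format_contact_number` is a hypothetical name, and returning unexpected input unchanged is an assumption rather than something the answer specifies.

```python
import re


def format_contact_number(raw):
    """Hypothetical helper: split 11 digits into four groups and join them with '-'."""
    if raw is None:
        return None  # keep missing values as they are
    # fullmatch requires the whole string to be exactly 1 + 3 + 3 + 4 digits.
    match = re.fullmatch(r'(\d)(\d{3})(\d{3})(\d{4})', str(raw))
    if match is None:
        return raw  # assumption: leave unexpected input untouched
    # match.groups() is a tuple such as ('1', '511', '296', '2271').
    return '+' + '-'.join(match.groups())


numbers = ['15112962271', None, '15312962271']
print([format_contact_number(n) for n in numbers])
# ['+1-511-296-2271', None, '+1-531-296-2271']
```

With pandas, a function like this could be applied to the column through `Series.map`, at the cost of being slower than the vectorized string methods on large frames.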