In a DataFrame, how can I remove the unnecessary characters from a contact number column?
df
Id Phone
1 (+1)123-456-7890
2 (123)-(456)-(7890)
3 123-456-7890
Final Output
Id Phone
1 1234567890
2 1234567890
3 1234567890
CodePudding user response:
I would use a regex with str.replace:
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+\d+\))|\D', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
regex:
^(?:\(\+\d+\)) # match a leading (+0)-style identifier
| # OR
\D # match a non-digit
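As a quick check of the breakdown above, here is a minimal sketch that writes the same alternation with re.VERBOSE so the comments live inside the pattern. It assumes the leading identifier is written as "(+N)"; the name PHONE_CLEANUP and the sample list are illustrative, not part of the original answer.

```python
import re

# Hypothetical name; the pattern mirrors the one explained above.
PHONE_CLEANUP = re.compile(
    r"""
    ^\(\+\d+\)   # a leading "(+N)" country-code group, e.g. "(+1)"
    |            # OR
    \D           # any single non-digit character
    """,
    re.VERBOSE,
)

# Sample values taken from the question (with the "+" prefix assumed).
samples = ["(+1)123-456-7890", "(123)-(456)-(7890)", "123-456-7890"]
cleaned = [PHONE_CLEANUP.sub("", s) for s in samples]
print(cleaned)  # ['1234567890', '1234567890', '1234567890']
```

Because the first branch is anchored with ^, only a prefix at the very start of the string is dropped as a whole; every other non-digit is removed one character at a time by \D.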
Notes on the international prefix:
The prefix might be important to keep, since it identifies the country.
Keep the prefixes:
df['Phone2'] = df['Phone'].str.replace(r'[^+\d]', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 +11234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
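For reference, a self-contained sketch that reproduces the prefix-keeping variant end to end. The DataFrame is rebuilt here from the values shown in this thread (including the extra Id 4 row that only appears in the outputs), so treat the sample data as an assumption.

```python
import pandas as pd

# Sample data assumed from the tables in this thread.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "Phone": [
        "(+1)123-456-7890",
        "(123)-(456)-(7890)",
        "123-456-7890",
        "(+380)123-456-7890",
    ],
})

# Remove every character that is neither "+" nor a digit.
df["Phone2"] = df["Phone"].str.replace(r"[^+\d]", "", regex=True)
print(df["Phone2"].tolist())
# ['+11234567890', '1234567890', '1234567890', '+3801234567890']
```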
Only drop a specific prefix (here +1):
df['Phone2'] = df['Phone'].str.replace(r'^(?:\(\+1\))|[^+\d]', '', regex=True)
# or, more flexible
df['Phone2'] = df['Phone'].str.replace(r'(?:\+1\D)|[^+\d]', '', regex=True)
output:
Id Phone Phone2
0 1 (+1)123-456-7890 1234567890
1 2 (123)-(456)-(7890) 1234567890
2 3 123-456-7890 1234567890
3 4 (+380)123-456-7890 +3801234567890
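If the prefix to drop varies, the "more flexible" pattern can be built from a parameter. The helper below is a hypothetical sketch (drop_prefix is not from the original answer); it uses re.escape so the "+" is treated literally, and it assumes the prefix is always followed by a non-digit such as ")" or a space.

```python
import re

def drop_prefix(phone: str, prefix: str = "+1") -> str:
    """Remove one specific country prefix, keep any other, strip formatting."""
    # The \D after the prefix requires a non-digit, so "+1" does not
    # accidentally match the start of a longer prefix such as "+12".
    pattern = rf"{re.escape(prefix)}\D|[^+\d]"
    return re.sub(pattern, "", phone)

print(drop_prefix("(+1)123-456-7890"))    # 1234567890
print(drop_prefix("(+380)123-456-7890"))  # +3801234567890
```

The same pattern string can be passed to df['Phone'].str.replace(pattern, '', regex=True) to apply it to a whole column.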