I have a python dataframe (df) created from a csv. I want to take every column name that contains 'PHONE' (or 'phone' or 'Phone') and change all their rows to be in the format of 5555555555. So:
(555) 555-5555 would be 5555555555,
555-555-5555 would be 5555555555,
and so on.
I tried the following, but got a syntax error. Hopefully I was at least kinda-sorta close:
phone_format = df.loc[:, df.columns.str.contains('PHONE')]
for col in phone_format:
df['col'] = df.['col'].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))
CodePudding user response:
Use filter
to select the columns with "phone" (in a case-insensitive way using the (?i)phone
regex) and apply
with str.replace
to remove the non-digits, finally update
the DataFrame in place.
df.update(df.filter(regex='(?i)phone').apply(lambda s: s.str.replace(r'\D ', '', regex=True)))
Example:
# before
pHoNe other Phone other col
0 (555) 55 5555 555-555-555 (55-55)
# after
pHoNe other Phone other col
0 555555555 555555555 (55-55)
Reproducible input:
df = pd.DataFrame({'pHoNe': ['(555) 55 5555'], 'other Phone': ['555-555-555'], 'other col': ['(55-55)']})
CodePudding user response:
phone_format = df.loc[:, df.columns.str.contains('PHONE')]
for col in phone_format:
df[col] = df[col].str.replace(r"\D ", "", regex=True)
CodePudding user response:
From using your code as a starting point for a minimal working example:
df = pd.DataFrame([['(555) 555-5555', '555-555-5555']], columns=['phone', 'Phone'])
phone_format = df.columns[df.columns.str.contains(pat='PHONE', case=False)]
for col in phone_format:
df[col] = df[col].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))
df