For every pandas dataframe column with 'phone', remove non-numerals-CodePudding

I have a python dataframe (df) created from a csv. I want to take every column name that contains 'PHONE' (or 'phone' or 'Phone') and change all their rows to be in the format of 5555555555. So:

(555) 555-5555 would be 5555555555,

555-555-5555 would be 5555555555,

and so on.

I tried the following, but got a syntax error. Hopefully I was at least kinda-sorta close:

phone_format = df.loc[:, df.columns.str.contains('PHONE')]
for col in phone_format:
    df['col'] = df.['col'].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))

CodePudding user response：

Use filter to select the columns with "phone" (in a case-insensitive way using the (?i)phone regex) and apply with str.replace to remove the non-digits, finally update the DataFrame in place.

df.update(df.filter(regex='(?i)phone').apply(lambda s: s.str.replace(r'\D ', '', regex=True)))

Example:

# before
           pHoNe  other Phone  other col
0  (555) 55 5555  555-555-555    (55-55)

# after
       pHoNe  other Phone  other col
0  555555555    555555555    (55-55)

Reproducible input:

df = pd.DataFrame({'pHoNe': ['(555) 55 5555'], 'other Phone': ['555-555-555'], 'other col': ['(55-55)']})

CodePudding user response：

phone_format = df.loc[:, df.columns.str.contains('PHONE')]
for col in phone_format:    
    df[col] = df[col].str.replace(r"\D ", "", regex=True)

CodePudding user response：

From using your code as a starting point for a minimal working example:

df = pd.DataFrame([['(555) 555-5555', '555-555-5555']], columns=['phone', 'Phone'])
phone_format = df.columns[df.columns.str.contains(pat='PHONE', case=False)]
for col in phone_format:
    df[col] = df[col].map(lambda x: x.replace('.', '').replace(' ', '').replace('-', '').replace('(', '').replace(')', ''))
df