How do I force a blank for rows in a dataframe that have any str or character apart from numerics?

Time:05-06

I have a dataframe:

>temp
    Age Rank PhoneNumber State City
    10   1    99-22344-1 Ga    abc
    15   12    No        Ma    xyz

For the PhoneNumber column, I want to strip characters like - so that only the digits of a full phone number remain. If the value is No, or any word rather than a number, I want it to be blank. How can I do this?

My attempt handles special characters but not words like 'No':

temp['PhoneNumber'] = temp['PhoneNumber'].str.replace(r'[^\d]+', '')

Desired Output df -

>temp
    Age Rank PhoneNumber State City
    10   1    99223441    Ga    abc
    15   12               Ma    xyz

CodePudding user response:

This does the job.

import pandas as pd
import re

data = [
    [10, 1, '99-223344-1', 'GA', 'Abc'],
    [15, 12, "No", 'MA', 'Xyz']
]

df = pd.DataFrame(data, columns=['Age Rank PhoneNumber State City'.split()])
print(df)

def valphone(p):
    p = p['PhoneNumber']
    if re.match(r'[0-9-]+$', p):  # only digits and hyphens, start to end
        return p
    else:
        return ""

print(df['PhoneNumber'])
df['PhoneNumber'] = df['PhoneNumber'].apply(valphone, axis=1)
print(df)

Output:

  Age Rank  PhoneNumber State City
0  10    1  99-223344-1    GA  Abc
1  15   12           No    MA  Xyz
  Age Rank  PhoneNumber State City
0  10    1  99-223344-1    GA  Abc
1  15   12                 MA  Xyz

I do have to admit to a bit of frustration with this. I EXPECTED to be able to do

df['PhoneNumber'] = df['PhoneNumber'].apply(valphone)

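because df['PhoneNumber'] should return a Series, and Series.apply should pass me one value at a time. However, that's not what happens here: df['PhoneNumber'] returns a DataFrame instead of a Series, so I have to apply the function row by row with axis=1 and use the column reference inside the function. The likely cause is the extra pair of square brackets around the column list passed to pd.DataFrame: a list nested inside a list makes pandas build MultiIndex columns, and selecting a name from MultiIndex columns returns a DataFrame.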

Thus, YOU may need to do some experimentation. If df['PhoneNumber'] returns a Series for you, then you don't need the axis=1, and you don't need the p = p['PhoneNumber'] line in the function.
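
For reference, here is a minimal sketch of the simpler form described above, assuming the columns are built from a flat list (no extra brackets) so that df['PhoneNumber'] is a Series. Stripping the hyphens from valid values is my assumption, added to match the desired output in the question; it is not part of the code above.

```python
import re
import pandas as pd

data = [
    [10, 1, '99-223344-1', 'GA', 'Abc'],
    [15, 12, 'No', 'MA', 'Xyz'],
]

# Flat list of column names, so df['PhoneNumber'] is a Series
df = pd.DataFrame(data, columns='Age Rank PhoneNumber State City'.split())

def valphone(p):
    # Keep values made only of digits and hyphens (hyphens removed);
    # anything else becomes an empty string
    if re.fullmatch(r'[0-9-]+', p):
        return p.replace('-', '')
    return ''

df['PhoneNumber'] = df['PhoneNumber'].apply(valphone)
print(df)
```

With a Series, apply hands valphone one string at a time, so neither axis=1 nor the column lookup inside the function is needed.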

Followup

OK, assuming the presence of a "phone number validation" module, as is mentioned in the comments, this becomes:

import phonenumbers
...
def valphone(p):
    p = p['PhoneNumber'] # May not be required
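    try:
        # A default region is needed when the number has no leading "+"; "US" is an assumption
        n = phonenumbers.parse(p, "US")
    except phonenumbers.NumberParseException:
        # Input such as "No" cannot be parsed at all
        return ''
    if phonenumbers.is_possible_number(n):
        return p
    else:
        return ''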
...

CodePudding user response:

temp['PhoneNumber'] = temp['PhoneNumber'].str.findall(r'\d').str.join('')
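
To see what this one-liner does on the sample data: findall collects the digits of each value into a list and join glues them back together, so a value with no digits at all ends up as an empty string. Note that a value mixing letters and digits (the made-up 'ext 12' below) keeps its digits rather than becoming blank. The second variant, which blanks anything that is not purely digits and hyphens, is only a sketch of the stricter behaviour the question seems to ask for.

```python
import pandas as pd

# 'ext 12' is a hypothetical value added for illustration
temp = pd.DataFrame({'PhoneNumber': ['99-22344-1', 'No', 'ext 12']})

# Digits only: 'No' has no digits, so it becomes ''
digits_only = temp['PhoneNumber'].str.findall(r'\d').str.join('')

# Stricter variant: blank out anything that is not purely digits and hyphens
is_numeric = temp['PhoneNumber'].str.fullmatch(r'[\d-]+')
strict = temp['PhoneNumber'].str.replace('-', '', regex=False).where(is_numeric, '')

print(digits_only.tolist())
print(strict.tolist())
```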