Pandas APPLY - Dataframe has NULL values

Time:07-18

I am trying to clean phone numbers using the phonenumbers library. I wrote a function that gets the country code and national number and stores them in the columns 'country_code' and 'national_number'.

I am trying to use apply() on a dataframe that contains noisy numbers. I chose apply() over a loop for the performance gain. The code is below:

import phonenumbers
import pandas as pd
df_phone = pd.read_csv(r'D:\Code\Address-Nominatim\Address-Nominatim\Phone_Valid.csv',encoding='utf8')
df_phone['country_code'] = ''
df_phone['national_number'] = ''
df_phone['valid']=''

def phone_valid(phone):
    try:
        #print(phone['PHONE'] + " " + phone['COUNTRY'])
        x = phonenumbers.parse(phone['PHONE'],phone['COUNTRY'])
        df_phone['country_code'] = x.country_code
        df_phone['national_number'] = x.national_number
        df_phone['valid']=phonenumbers.is_possible_number(x)
    except:
        df_phone['country_code'] = "Error"
        df_phone['national_number'] = "Error"


df_phone=df_phone.apply(phone_valid,axis=1)

print(df_phone)

But afterwards the dataframe df_phone contains only None values. Below is a sample of the output of df_phone:

none none
1 none
2 none

Can someone tell me what mistake I am making?

Regards,

CodePudding user response:

You aren't supposed to assign into the dataframe when you use apply. (Think of the case where you didn't actually even have access to the df_phone (global) variable.)
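To see where the None values come from, here is a minimal sketch (the frame and the function name no_return are made up for illustration). A function that has no return statement returns None, so apply() collects one None per row, and assigning that result back to df_phone replaces the whole dataframe with it:

```python
import pandas as pd

# Small made-up frame standing in for the CSV from the question.
df = pd.DataFrame({"PHONE": ["100", "200"], "COUNTRY": ["FI", "US"]})


def no_return(row):
    # Does some work on the row but never returns anything,
    # so Python implicitly returns None for every row.
    _ = row["PHONE"] + row["COUNTRY"]


# apply() builds its result from the return values: one None per row.
result = df.apply(no_return, axis=1)
print(result)
```

The original frame df is untouched here; only the value returned by apply() is all None.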

Instead, return the new values from the function and let pandas assign them. Since you need several columns, return a tuple and pass result_type="expand" so that each element of the tuple becomes its own column (self-contained example; replace phone_valid with your implementation):

import pandas as pd

df_phone = pd.DataFrame({
    'PHONE': ['100', '200', '300', '400', '500'],
    'COUNTRY': ['FI', 'US', 'SV', 'DE', 'FR'],
})


def parse(phone, country):
    return (phone * 3, country[::-1])


def phone_valid(phone):
    national, country = parse(phone['PHONE'], phone['COUNTRY'])
    return (national, country, True)


df_phone[['national', 'country', 'valid']] = df_phone.apply(phone_valid, axis=1, result_type="expand")

print(df_phone)

The output is

  PHONE COUNTRY   national country  valid
0   100      FI  100100100      IF   True
1   200      US  200200200      SU   True
2   300      SV  300300300      VS   True
3   400      DE  400400400      ED   True
4   500      FR  500500500      RF   True
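
The same pattern also covers the error handling from the question. The following is only a sketch: fake_parse is a hypothetical stand-in for phonenumbers.parse() so the example runs without the library, and the None in the PHONE column is made-up data that simulates a missing value. The key point is that the except branch returns a tuple of the same length as the success branch, so every row expands to the same three columns:

```python
import pandas as pd

df_phone = pd.DataFrame({
    "PHONE": ["100", None, "300"],  # None simulates a missing phone number
    "COUNTRY": ["FI", "US", "SV"],
})


def fake_parse(phone, country):
    # Hypothetical stand-in for phonenumbers.parse(); rejects missing input.
    if not isinstance(phone, str):
        raise ValueError("not a phone number")
    return (country, phone * 3)


def phone_valid(row):
    try:
        country_code, national_number = fake_parse(row["PHONE"], row["COUNTRY"])
        return (country_code, national_number, True)
    except ValueError:
        # Same number of values as the success case, so the row still
        # expands to three columns.
        return ("Error", "Error", False)


df_phone[["country_code", "national_number", "valid"]] = df_phone.apply(
    phone_valid, axis=1, result_type="expand"
)
print(df_phone)
```

With the real library, the try block would presumably call phonenumbers.parse() and phonenumbers.is_possible_number() as in the question, and the except clause would catch phonenumbers.NumberParseException (and possibly other exceptions raised for missing values) instead of ValueError.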