I am trying to clean phone numbers using the phonenumbers library. I wrote a function that gets the country code and national number and stores them in the columns 'country_code' and 'national_number'.
I want to run it with apply() on a dataframe that contains noisy numbers, because I expect apply() to be faster than an explicit loop. Below is the code:
    import phonenumbers
    import pandas as pd

    df_phone = pd.read_csv(r'D:\Code\Address-Nominatim\Address-Nominatim\Phone_Valid.csv',encoding='utf8')
    df_phone['country_code'] = ''
    df_phone['national_number'] = ''
    df_phone['valid']=''

    def phone_valid(phone):
        try:
            #print(phone['PHONE'] " " phone['COUNTRY'])
            x = phonenumbers.parse(phone['PHONE'],phone['COUNTRY'])
            df_phone['country_code'] = x.country_code
            df_phone['national_number'] = x.national_number
            df_phone['valid']=phonenumbers.is_possible_number(x)
        except:
            df_phone['country_code'] = "Error"
            df_phone['national_number'] = "Error"

    df_phone=df_phone.apply(phone_valid,axis=1)
    print(df_phone)
But the resulting df_phone contains only None values. Below is sample output of df_phone:
none | none |
---|---|
1 | none |
2 | none |
Can someone tell me what mistake I am making?
Regards,
CodePudding user response:
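You aren't supposed to assign into the dataframe from inside the function you pass to apply. (Think of the case where you didn't even have access to the global df_phone variable.) Your phone_valid has no return statement, so it returns None for every row, and df_phone = df_phone.apply(...) then replaces the dataframe with those None values.
Instead, just return the new values from the function so that pandas can assign them. Since you need to return multiple columns, you'll need something like this (self-contained example; replace phone_valid with your implementation):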
    import pandas as pd

    df_phone = pd.DataFrame({
        'PHONE': ['100', '200', '300', '400', '500'],
        'COUNTRY': ['FI', 'US', 'SV', 'DE', 'FR'],
    })

    def parse(phone, country):
        return (phone * 3, country[::-1])

    def phone_valid(phone):
        national, country = parse(phone['PHONE'], phone['COUNTRY'])
        return (national, country, True)

    df_phone[['national', 'country', 'valid']] = df_phone.apply(phone_valid, axis=1, result_type="expand")
    print(df_phone)
The output is:

      PHONE COUNTRY   national country  valid
    0   100      FI  100100100      IF   True
    1   200      US  200200200      SU   True
    2   300      SV  300300300      VS   True
    3   400      DE  400400400      ED   True
    4   500      FR  500500500      RF   True