First off, I know that a regular expression is not the best way to validate email addresses, but this is a preliminary step; a more thorough validation comes later.
I want to create a function that checks whether an email address is valid, but I am not sure how to reference only one column in a data frame.
import pandas as pd
d=[['Automotive','testgmail.com','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])
filename='temp'
I want to keep the code in a function (def) like the one below:
def Prospect(colname,errors):
    wrong=[]
    if #reference to column.str.match(r"^.+@.+\..{2,}$"):
        return
    else:
        error='this is an invalid email'
        wrong.append(error)
        return wrong

print(Prospect(colname,errors))
How do I create a function that references only a specific column in a data frame, runs only that column through the function, and prints a statement saying that the email is invalid?
P.S.: Speed is not a big concern, since the datasets are not large.
Desired output:
This is an invalid email
Answer:
I believe you might want:
def Prospect(colname, errors, df=df):
    m = df[colname].str.match(r"^.+@.+\..{2,}$")
    if m.all():
        pass
    else:
        error='this is an invalid email'
        errors.append(error)

errors = []
Prospect('email', errors, df=df)
print(errors)
Output: ['this is an invalid email']
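The boolean mask m can also tell you which rows failed, not just whether any row failed. A minimal sketch, assuming the same column layout as the question; the helper name invalid_emails and the second sample address are made up for illustration:

```python
import pandas as pd

# Sample data: the question's row plus one hypothetical well-formed address
d = [['Automotive', 'testgmail.com', 'bob', 'smith'],
     ['Automotive', 'bob.smith@example.com', 'bob', 'smith']]
df = pd.DataFrame(d, columns=['industry', 'email', 'first', 'last'])

def invalid_emails(df, colname, pattern=r"^.+@.+\..{2,}$"):
    # Hypothetical helper: returns the values in df[colname] that do NOT match pattern.
    # astype(str) turns missing values into the string 'nan', so they are reported
    # as invalid instead of producing NaN in the mask.
    mask = df[colname].astype(str).str.match(pattern)
    return df.loc[~mask, colname].tolist()

bad = invalid_emails(df, 'email')
for email in bad:
    print(f'{email} is an invalid email')
```

Because the mask is computed for the whole column at once, this prints one message per failing row rather than a single message for the column.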
Answer:
import pandas as pd
import re

d=[['Automotive','testgmail.com','bob','smith'],
   ['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])
email_regex = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"
df["email"].apply(lambda email: print("This is a valid email: " + email if re.search(email_regex,email) else "This is an invalid email: " + email))
Results in:
This is an invalid email: testgmail.com
This is a valid email: [email protected]
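Calling print inside apply works, but the messages are then lost once they are printed. A sketch of the same check that stores the message in a new column instead, so it can be filtered or exported later; the column name email_status, the function name describe, and the second sample address are hypothetical:

```python
import re
import pandas as pd

# Sample data with one hypothetical well-formed address
d = [['Automotive', 'testgmail.com', 'bob', 'smith'],
     ['Automotive', 'bob.smith@example.com', 'bob', 'smith']]
df = pd.DataFrame(d, columns=['industry', 'email', 'first', 'last'])

# Same pattern as the answer above, as a raw string so the backslash is kept literally
email_regex = r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*$"

def describe(email):
    # Return the message instead of printing it, so apply produces a reusable Series
    if re.search(email_regex, email):
        return f"This is a valid email: {email}"
    return f"This is an invalid email: {email}"

df['email_status'] = df['email'].apply(describe)
print("\n".join(df['email_status']))
```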
Answer:
Ok, here's my take on your question (I've removed the "errors" argument until I understand what it's supposed to be/do):
import pandas as pd
import re

d=[['Automotive','testgmail.com','bob','smith'],
   ['Automotive','[email protected]','bob','smith']]
df=pd.DataFrame(d,columns=['industry','email','first','last'])

def Prospect(colname):
    email_regex = r"^.+@.+\..{2,}$"
    wrong=[]
    for i in range(len(df)):
        this_email = df[colname][i]
        if re.search(email_regex,this_email):
            continue
        else:
            error=f'{this_email} is an invalid email'
            wrong.append(error)
    return wrong

print(Prospect('email'))
# ['testgmail.com is an invalid email']
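One caveat with the loop above: df[colname][i] looks rows up by index label, so if the DataFrame's index is not 0..n-1 (for example after dropping or filtering rows) it can raise a KeyError. A sketch of a variant that iterates over the column values directly and takes the DataFrame as a parameter (the extra df parameter and the third sample row are assumptions for illustration, not part of the original answer):

```python
import re
import pandas as pd

def Prospect(df, colname, pattern=r"^.+@.+\..{2,}$"):
    wrong = []
    # Iterate over the column's values, so the DataFrame's index labels don't matter
    for this_email in df[colname]:
        if not re.search(pattern, str(this_email)):
            wrong.append(f'{this_email} is an invalid email')
    return wrong

# Hypothetical sample data
d = [['Automotive', 'testgmail.com', 'bob', 'smith'],
     ['Automotive', 'bob.smith@example.com', 'bob', 'smith'],
     ['Retail', 'no-at-sign.org', 'ann', 'lee']]
df = pd.DataFrame(d, columns=['industry', 'email', 'first', 'last'])

# After dropping the first row the index is [1, 2]; a range(len(df)) loop using
# df[colname][i] would ask for the missing label 0, but this version still works
subset = df.iloc[1:]
print(Prospect(subset, 'email'))
```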