Python: Possible to pass a list of values to an if statement?-CodePudding

I am merging multiple excel sheets into a pandas DataFrame and extracting the email address from a Report_Name column. I have been able to execute what I need but the hardship is that there are over 5000 email addresses that I would have to run checks against (in this example only five). While I could type out all the conditions for these, it's not practical.

So my question is:
Can I pass a single list that includes these 5000 email addresses as a condition (somehow) in an if statement? Below is my code of how I am currently doing it.

| **Report_Name**                                    |
|--------------------------------------------------- |
|SYN-Laptops-Nov10 (002)                             |
|something_offer [email protected]                     |
|another thing [email protected]                    |
|my offer is attached ooo 12-31 [email protected] |
|copy of offer [email protected]                    |
|private offering copy [email protected]             |

#----------- extract the email address from the Report Name column -----------#

#blank list to collect and store the email addresses
collected_emails = []

#----------- For Loop to iterate through the values under the ['Report_Name'] column -----------#
for report_name_value in excel_df['Report_Name']:
    if '[email protected]' in report_name_value:
        collected_emails.append('[email protected]')
    elif '[email protected]' in report_name_value:
        collected_emails.append('[email protected]')
    elif '[email protected]' in report_name_value:
        collected_emails.append('[email protected]')
    elif '[email protected]' in report_name_value:
        collected_emails.append('[email protected]')
    elif '[email protected]' in report_name_value:
        collected_emails.append('[email protected]')
    else:
        collected_emails.append('No Email Address')

#create DataFrame for the collected emails
collected_emails_df = pd.DataFrame(collected_emails, columns = ['Email_Address'])

#create master_df to concat both the excel_df and collected_emails_df together
master_df = pd.concat([excel_df, collected_emails_df,], axis = 1)     

#export master DataFrame to an excel file and save it on the SharePoint directory
master_df.to_excel(output_local 'Offers.xlsx', index=False)}

RESULT

| **Report_Name**                                     |    **Email_Address**       |
| SYN-Laptops-Nov10 (002)                             |    No Email Address        |
| something_offer [email protected]                     |    [email protected]         |
| another thing [email protected]                    |    [email protected]      |
| my offer is attached ooo 12-31 [email protected] |    [email protected]    |
| copy of offer [email protected]                    |    [email protected]      |
| private offering copy [email protected]             |    [email protected]       |

I am a beginner with python and was unable to pull up any references of how to tackle this problem specifically. Hence the post. Thanks for your time and I appreciate any advice you can offer.

CodePudding user response：

Sounds like a good case for lambda and filter!

list_to_check = ['[email protected]', ...] # Pass this in to your function

is_match_func = lambda x: x in excel_df['Report_Name']

all_matches = list(filter(is_match_func, list_to_check))

for email in all_matches:
    collected_emails.append(email)

CodePudding user response：

Given:

# df
                                         Report_Name
0                            SYN-Laptops-Nov10 (002)
1                    something_offer [email protected]
2                   another thing [email protected]
3  my offer is attached ooo 12-31 rocksteps@domai...
4                   copy of offer [email protected]
5            private offering copy [email protected]

Doing:

df['Emails'] = df.Report_Name.str.extract('(\w @\w \.\w )')
print(df)

Regex Explanation - basically, we know that there aren't spaces in an email, and that they all contain [chars @ chars . chars]

Output:

                                         Report_Name                Emails
0                            SYN-Laptops-Nov10 (002)                   NaN
1                    something_offer [email protected]       [email protected]
2                   another thing [email protected]    [email protected]
3  my offer is attached ooo 12-31 rocksteps@domai...  [email protected]
4                   copy of offer [email protected]    [email protected]
5            private offering copy [email protected]     [email protected]

It's not clear what you want to do after this point...

CodePudding user response：

You can simply use pandas.Series.str.extract with a regular expression to extract all the emails.

Try this :

collected_emails = (
                    excel_df['Report_Name']
                     .str.extract(r'([a-zA-Z0-9.-_] @. \.com)')
                     .dropna()
                     .squeeze()
                     .tolist()
                    )

# Output :

print(collected_emails)

['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']