Let's say I have a CSV invites.csv
:
Email Invite Email Denied
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected]
[email protected]
[email protected]
I want to compare both columns and have a new column made, Emails Left
, with just the emails that are not in the Email Denied
column. An output like this:
Email Invite Email Denied Emails Left
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected] [email protected]
[email protected] [email protected]
[email protected]
[email protected]
[email protected]
This is my code so far:
import pandas as pd
Dir='invites.csv'
df = pd.read_csv(Dir)
df = pd.DataFrame(df)
a = len(df['Email invite'])
aList = []
for i in range(a):
if df['Email invite'][i] != df['Email Denied'][i]:
aList.append(df['Email Invite'][i])
#place list as third column df['Emails Left']
CodePudding user response:
I figured it out.
I had to make the column rows match each other first before doing the IF statement.
df = pd.read_csv(Dir)
df = pd.DataFrame(df)
df = pd.merge(df[["Email Invite"]],
df[['Email Denied']],
left_on='Email Invite',
right_on='Email Denied',
how='left')
This then leaves the DataFrame looking like this:
Email Invite Email Denied
[email protected] [email protected]
[email protected]
[email protected]
[email protected] [email protected]
[email protected]
[email protected] [email protected]
I continue to my FOR loop and IF statement:
a = len(df['Email invite'])
aList = []
for i in range(a):
if df['Email invite'][i] != df['Email Denied'][i]:
aList.append(df['Email Invite'][i])
df['Emails Left'] = pd.Series(aList)
Now I have my extra column:
Email Invite Email Denied Emails Left
[email protected] [email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected]
[email protected] [email protected]
And I can now transfer it to a new CSV.
df.to_csv("NewInvite.csv", index=False)
Now the program is running good.
CodePudding user response:
It is not recommended to use dataframes with columns of different lengths. You have to pad out remaining elements with NaN or ' ' or something to make them equal lengths. List would work better here:
import pandas as pd
df = pd.DataFrame({'Email Invite': ['[email protected]', '[email protected]', '[email protected]','[email protected]' ,'[email protected]' ,'[email protected]'], 'Email Denied': ['[email protected]', '[email protected]' ,'[email protected]', '','','']})
email_invited = list(df['Email Invite'])
email_denied = list(df[df['Email Denied']!='']['Email Denied'])
email_left = [email for email in email_invited if email not in email_denied]
print(email_left)