If column's items doesn't equal another column's items, append to a new column-CodePudding

Let's say I have a CSV invites.csv:

Email Invite                    Email Denied                                                                                       
[email protected]                [email protected]                       
[email protected]             [email protected]             
[email protected]              [email protected]           
[email protected]                           
[email protected]                             
[email protected]

I want to compare both columns and have a new column made, Emails Left, with just the emails that are not in the Email Denied column. An output like this:

Email Invite                    Email Denied               Emails Left                                                                                 
[email protected]                [email protected]           [email protected]               
[email protected]             [email protected]          [email protected]   
[email protected]              [email protected]        [email protected]   
[email protected]                                          [email protected]
[email protected]                             
[email protected]
[email protected]

This is my code so far:

import pandas as pd

Dir='invites.csv'

df = pd.read_csv(Dir)
df = pd.DataFrame(df)

a = len(df['Email invite'])
aList = []

for i in range(a):
    if df['Email invite'][i] != df['Email Denied'][i]:
        aList.append(df['Email Invite'][i])  

#place list as third column df['Emails Left']

CodePudding user response：

I figured it out.

I had to make the column rows match each other first before doing the IF statement.

df = pd.read_csv(Dir)
df = pd.DataFrame(df)

df = pd.merge(df[["Email Invite"]],
              df[['Email Denied']],
              left_on='Email Invite',
              right_on='Email Denied',
              how='left')

This then leaves the DataFrame looking like this:

Email Invite                    Email Denied                                                                                       
[email protected]                [email protected]                       
[email protected]                          
[email protected]                         
[email protected]               [email protected]             
[email protected]                             
[email protected]             [email protected]

I continue to my FOR loop and IF statement:

a = len(df['Email invite'])
aList = []

for i in range(a):
    if df['Email invite'][i] != df['Email Denied'][i]:
        aList.append(df['Email Invite'][i])  

df['Emails Left'] = pd.Series(aList)

Now I have my extra column:

Email Invite                  Email Denied           Emails Left                                                                                      
[email protected]              [email protected]       [email protected]                
[email protected]                                  [email protected]  
[email protected]                                   [email protected]
[email protected]             [email protected]              
[email protected]                             
[email protected]           [email protected]

And I can now transfer it to a new CSV.

df.to_csv("NewInvite.csv", index=False)

Now the program is running good.

CodePudding user response：

It is not recommended to use dataframes with columns of different lengths. You have to pad out remaining elements with NaN or ' ' or something to make them equal lengths. List would work better here:

import pandas as pd

df = pd.DataFrame({'Email Invite': ['[email protected]', '[email protected]', '[email protected]','[email protected]' ,'[email protected]' ,'[email protected]'], 'Email Denied': ['[email protected]', '[email protected]' ,'[email protected]', '','','']})


email_invited = list(df['Email Invite'])
email_denied = list(df[df['Email Denied']!='']['Email Denied'])

email_left = [email for email in email_invited if email not in email_denied]
print(email_left)