Home > Net >  Python, Excel and Pandas: drop rows based on multiple criterias
Python, Excel and Pandas: drop rows based on multiple criterias

Time:10-30

First of all, I know similar questions have been asked in the past, but none of them apply to my use case, not I know Pandas enough to get there on my own. So, I have 2 spreadsheets (xlsx), let's call them file 1 and file 2

File1:

ID IP Name Comments OS Title Status
1 127.0.0.1 Something Windows 3.11 Title 1 Active
2 127.0.0.1 Something else Windows 3.11 Title 1 Fixed
3 127.0.0.1 New one Windows 3.11 Title 1 Fixed
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Active

File2:

ID IP Name Comments OS Title Status
1 127.0.0.1 Something Windows 3.11 Title 1 Fixed
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Fixed
5 127.0.0.1 Uh-oh Windows 3.11 Title 1 Active

I'm merging these 2 files in Python, with Pandas:

df1 = pd.read_excel('file1.xlsx')
df2 = pd.read_excel('file2.xlsx')
#I just need a column with the file name, hence these variables
file1 = "file1.xlsx"
file2 = "file2.xlsx"
df1 = df1.sort_values('ID')
#Adds a new column with the file name
df1["File"] = file1
df2 = df2.sort_values('ID')
#Adds a new column with the file name
df2["File"] = file2
dfs = [df1, df2]
#Black magic to merge the 2 files into 1
df_diff = pd.concat(dfs).drop_duplicates(keep=False, subset=['IP', 'Name', 'Comments', 'ID', 'Title', 'Status'])
df_diff = df_diff.drop(df_diff[])
df_diff.to_excel(r'file3.xlsx')

And this is the resulting file:

ID IP Name Comments OS Title Status File
1 127.0.0.1 Something Windows 3.11 Title 1 Active File 1
1 127.0.0.1 Something Windows 3.11 Title 1 Fixed File 2
2 127.0.0.1 Something else Windows 3.11 Title 1 Fixed File 1
3 127.0.0.1 New one Windows 3.11 Title 1 Fixed File 1
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Active File 1
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Fixed File 2
5 127.0.0.1 Uh-oh Windows 3.11 Title 1 Active File 2

As you can see, in the new file there is a new column indicating what file that row comes from (row File). Now, I would like Pandas to check for the following condition:

any row present in file 1 only, with the Status set to fixed.

I would like to drop from file 3 any rows matching the above condition, so that I end up with the following table:

ID IP Name Comments OS Title Status File
1 127.0.0.1 Something Windows 3.11 Title 1 Active File 1
1 127.0.0.1 Something Windows 3.11 Title 1 Fixed File 2
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Active File 1
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Fixed File 2
5 127.0.0.1 Uh-oh Windows 3.11 Title 1 Active File 2

How could I achieve this? As an addendum, I would also like the following condition to be met:

any row present in both file, with the only difference being in file 1 Status is Active and in file 2 Status is Fixed

CodePudding user response:

The first condition is actually pretty simple but a bit unintuitive. Selecting based on multiple criteria works pretty much the same as with a single criterium, but you just add a binary & and put every criterium between parentheses:

df[ (df['File'] == 'File 1') & (df['Status'] == 'Fixed') ]

It has to be '&' and not 'and'. The second condition can be met in a similar way:

IDs_file1_active = df[ (df['File'] == 'File 1') & (df['Status'] == 'Active') ]['ID']

selects the ID of every Active entry from file 1. Now you can selects the rows with their IDs from the original table:

file1_active = df[ df['ID'].isin(IDs_file1_active) ]

and apply the second condition:

file1_active [ (file1_active ['File'] == 'File 2') & (file1_active ['Status'] == 'Fixed') ]

This should do what you're asking.

CodePudding user response:

Use conditionals for selecting. To exclude use tilde

cond=(df['File']=='File 1') & (df['Status'] =='Fixed')
df[~cond]

CodePudding user response:

Let's suppose your last dataset is call de df

So, you can try the following code

mask = (df["File"]=="File 1") & (df["Status"]=="Fixed")
df = df[~mask]

df
ID IP Name Comments OS Title Status File
1 127.0.0.1 Something Windows 3.11 Title 1 Active File 1
1 127.0.0.1 Something Windows 3.11 Title 1 Fixed File 2
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Active File 1
4 127.0.0.1 Yet another one Windows 3.11 Title 1 Fixed File 2
5 127.0.0.1 Uh-oh Windows 3.11 Title 1 Active File 2
  • Related