So I've looked online at a few examples, but they all seem to assume the data is in order, i.e. Row 1 in both files holds the same information.
In my case, Row 1 of File X has an IP and a DNS name. The idea is to check whether this IP address can be found in any of the rows in File Y.
Ideally I'd get a list of the IP addresses not found in File Y.
I tried to import the files into Pandas, but that's about where my knowledge ends.
Edit: Sample
File 1
dns,ip
what.dont.cz.,12.34.21.90
........
File 2
ip,dns
1.32.20.25, sea.ocean.cz
........
12.34.21.90,what.dont.cz
..........
CodePudding user response:
Try this:
df_file1.loc[~df_file1.ip.isin(df_file2.ip)]
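For context, a minimal self-contained sketch of the one-liner above. The DataFrame names come from the answer, and the sample rows are taken from the question; the in-memory `io.StringIO` buffers just stand in for your two CSV files:

```python
import io
import pandas as pd

# Stand-ins for File X and File Y, using the sample data from the question
# (plus one extra hypothetical row in File X so something is actually missing)
file_x = io.StringIO("dns,ip\nwhat.dont.cz.,12.34.21.90\nfoo.bar.cz.,10.0.0.1\n")
file_y = io.StringIO("ip,dns\n1.32.20.25,sea.ocean.cz\n12.34.21.90,what.dont.cz\n")

df_file1 = pd.read_csv(file_x)
df_file2 = pd.read_csv(file_y)

# Keep the rows of File X whose ip does not appear anywhere in File Y's ip column
missing = df_file1.loc[~df_file1.ip.isin(df_file2.ip)]
print(missing.ip.tolist())  # → ['10.0.0.1']
```

`isin` tests membership row by row, so the two files do not need to be in the same order or even the same length.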
CodePudding user response:
You can use the csv module to do this. Load both files and perform a linear search with nested loops, as below. This works, but if your CSV files are considerably large, it would be better to import them into a SQLite table and perform the query there.
import csv

file_x = "File X.csv"
file_y = "File Y.csv"

not_found = []
with open(file_x, newline="") as file_x_csv:
    for row_x in csv.DictReader(file_x_csv):
        ip_x = row_x["ip"]
        # Re-scan File Y from the top for every ip in File X
        with open(file_y, newline="") as file_y_csv:
            file_y_read = csv.DictReader(file_y_csv)
            running = True
            while running:
                try:
                    row_y = next(file_y_read)
                    if ip_x == row_y["ip"]:
                        running = False
                except StopIteration:  # iterator exhausted, ip not found
                    not_found.append(ip_x)
                    running = False
print(not_found)