So I've looked online at a few examples, but they all seem to assume the data is in order, i.e. Row 1 in both files holds the same information.
In my case, Row 1 of File X has an IP and a DNS name. The idea is to check whether this IP address can be found in any of the rows in File Y.
Ideally I'd get a list of the IP addresses not found in File Y.
I tried to import the files into Pandas, but that's about where my knowledge ends.
Edit: Sample
File 1
dns,ip
what.dont.cz.,12.34.21.90
........
File 2
ip,dns
1.32.20.25, sea.ocean.cz
........
12.34.21.90,what.dont.cz
..........
CodePudding user response:
Try this:
df_file1.loc[~df_file1.ip.isin(df_file2.ip)]
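For context, a minimal self-contained sketch of the one-liner above. The DataFrame names come from the answer, and the sample rows are taken from the question; the in-memory `io.StringIO` buffers just stand in for your two CSV files:

```python
import io
import pandas as pd

# Stand-ins for File X and File Y, using the sample data from the question
# (plus one extra hypothetical row in File X so something is actually missing)
file_x = io.StringIO("dns,ip\nwhat.dont.cz.,12.34.21.90\nfoo.bar.cz.,10.0.0.1\n")
file_y = io.StringIO("ip,dns\n1.32.20.25,sea.ocean.cz\n12.34.21.90,what.dont.cz\n")

df_file1 = pd.read_csv(file_x)
df_file2 = pd.read_csv(file_y)

# Keep the rows of File X whose ip does not appear anywhere in File Y's ip column
missing = df_file1.loc[~df_file1.ip.isin(df_file2.ip)]
print(missing.ip.tolist())  # → ['10.0.0.1']
```

`isin` tests membership row by row, so the two files do not need to be in the same order or even the same length.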
CodePudding user response:
You can use the csv module to do this. Load both files and perform a linear search with nested loops, as below. This works, but if your CSV files are considerably large, it would be better to import them into a SQLite table and perform the query there.
import csv

file_x = "File X.csv"
file_y = "File Y.csv"

not_found = []
with open(file_x, newline="") as file_x_csv:
    for row_x in csv.DictReader(file_x_csv):
        ip_x = row_x["ip"]
        # Re-scan File Y from the top for every ip in File X
        with open(file_y, newline="") as file_y_csv:
            file_y_read = csv.DictReader(file_y_csv)
            running = True
            while running:
                try:
                    row_y = next(file_y_read)
                    if ip_x == row_y["ip"]:
                        running = False
                except StopIteration:  # iterator exhausted, ip not found
                    not_found.append(ip_x)
                    running = False
print(not_found)