Read each line from two files and print the lines that do not exist in the other

Team, I have two files with some duplicates. I want to print, or create a new list with, the unique lines. However, my list is printed empty, and I'm not sure why.

f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        if lineA != lineB:
            print("lineA not equal to lineB", lineA, lineB)
        else:
            unique.append(lineB)
print(unique)

Output:

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[]

Expected:

lineA not equal to lineB  node789
  node321

lineA not equal to lineB  node789
 node12345

[node321,node12345]

Second approach: after looking at the comments, the list is now getting populated, but every entry is empty and the actual strings are not recognized.

 [~] $ cat  ~/backup/2strings.log
restr1
restr2

 [~] $ cat ~/backup/4strings.log 
restr1
restr2
restr3
restr4

import os

file2 = os.environ.get('HOME') + '/backup/2strings.log'
file1 = os.environ.get('HOME') + '/backup/4strings.log'
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        # if lineA.rstrip() != lineB.rstrip():
        if lineA.strip() != lineB.strip():
            print("lineA not equal to lineB", lineA, lineB)
        else:
            print("found uniq")
    unique.append(lineB.rstrip())
print(unique)
print(len(unique))

Output:

found uniq
lineA not equal to lineB restr1
 restr2

lineA not equal to lineB restr1
 

['', '', '', '', '']
5

Answer:

I recommend a different, simpler approach: use the set data structure. Link - https://docs.python.org/3/tutorial/datastructures.html#sets

Code (file1 and file2 are the paths from the question):

unique = []
with open(file1) as f1, open(file2) as f2:
    items01 = set(line.strip() for line in f1)
    items02 = set(line.strip() for line in f2)

# unique items not present in file2
print(list(items01 - items02))
unique += list(items01 - items02)

# unique items not present in file1
print(list(items02 - items01))
unique += list(items02 - items01)

# all unique items (same elements as items01 ^ items02)
print(unique)

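In your code, you only use file1 as the reference and check each of its lines against file2. You need to do the reverse as well, to find the lines that appear only in file2. The second problem is time complexity: the nested loops compare every line of one file with every line of the other. Python sets are hash tables, so a membership test takes constant time on average and a set difference is roughly linear in the size of the sets. So use sets.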
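The same set approach as a self-contained sketch. To make it runnable on its own, it writes two temporary files with the restr1..restr4 contents shown in the question (the temporary files and the helper name `read_lines` are assumptions for illustration). It skips blank lines so a trailing empty line does not count as a unique item, and uses the symmetric difference operator `^`, which yields the items present in exactly one of the two sets. Sets are unordered, so the result is sorted before printing.

```python
import os
import tempfile


def read_lines(path):
    """Return the set of stripped, non-empty lines in a file (hypothetical helper)."""
    with open(path) as f:
        return {line.strip() for line in f if line.strip()}


# Recreate the two files from the question in a temporary directory.
tmpdir = tempfile.mkdtemp()
file1 = os.path.join(tmpdir, "4strings.log")
file2 = os.path.join(tmpdir, "2strings.log")
with open(file1, "w") as f:
    f.write("restr1\nrestr2\nrestr3\nrestr4\n\n")
with open(file2, "w") as f:
    f.write("restr1\nrestr2\n\n")

items01 = read_lines(file1)
items02 = read_lines(file2)

only_in_file1 = items01 - items02   # lines missing from file2
only_in_file2 = items02 - items01   # lines missing from file1
unique = sorted(items01 ^ items02)  # lines present in exactly one file

print(unique)  # ['restr3', 'restr4']
```

Note that duplicate lines within a single file collapse into one set element; if you need to keep duplicates or the original line order, filter the lines of one file against the other file's set instead.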

Answer:

From what you posted, the only difference between your expected output and your actual output is that node321 and node12345 are not added to the list unique, which is printed at the end. That is hardly surprising: your code appends lineB to unique only when lineA and lineB match, because the append sits in the else branch of if lineA != lineB:.
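A minimal sketch of a corrected nested-loop version. Besides the append sitting in the wrong branch, the question's code has a second problem: f2.readlines() is called inside the outer loop, and once a file object has been read to the end, further readlines() calls return an empty list. So the inner loop only runs for the first line of file1, which is also why the second attempt fills unique with the last (empty) value of lineB. Reading the reference file once, before looping, avoids that. The function name `lines_not_in` is hypothetical, and io.StringIO stands in for real files, with assumed contents based on the node names in the question.

```python
import io

# Demonstration: a file object is exhausted after the first readlines().
demo = io.StringIO("a\nb\n")
print(demo.readlines())  # ['a\n', 'b\n']
print(demo.readlines())  # []


def lines_not_in(candidate_file, reference_file):
    """Return lines of candidate_file that do not appear in reference_file (hypothetical helper)."""
    # Read the reference file once, outside the loop.
    reference = [line.strip() for line in reference_file.readlines()]
    unique = []
    for line in candidate_file.readlines():
        line = line.strip()
        # Append only when the line is NOT found in the reference file.
        if line and line not in reference:
            unique.append(line)
    return unique


# Stand-ins for the question's files (assumed contents).
f1 = io.StringIO("node789\n")
f2 = io.StringIO("node321\nnode12345\n")

print(lines_not_in(f2, f1))  # ['node321', 'node12345']
```

This keeps the original one-pass-per-line comparison, so it is still O(n·m); turning reference into a set makes each lookup constant time on average, which is the idea behind the set-based answer.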