Team, I have two files that share some duplicate lines. I want to print, or collect into a new list, the lines that are unique. However, my list is printed empty and I am not sure why.
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        if lineA != lineB:
            print("lineA not equal to lineB", lineA, lineB)
        else:
            unique.append(lineB)
print(unique)
output
lineA not equal to lineB node789
node321
lineA not equal to lineB node789
node12345
[]
expected
lineA not equal to lineB node789
node321
lineA not equal to lineB node789
node12345
[node321,node12345]
Second approach: after looking at the comments, the list is now populated, but only with empty strings; the actual strings are not recognized.
[~] $ cat ~/backup/2strings.log
restr1
restr2
[~] $ cat ~/backup/4strings.log
restr1
restr2
restr3
restr4
import os

file2 = os.environ.get('HOME') + '/backup/2strings.log'
file1 = os.environ.get('HOME') + '/backup/4strings.log'
f1 = open(file1, 'r')
f2 = open(file2, 'r')
unique = []
for lineA in f1.readlines():
    for lineB in f2.readlines():
        # if lineA.rstrip() != lineB.rstrip():
        if lineA.strip() != lineB.strip():
            print("lineA not equal to lineB", lineA, lineB)
        else:
            print("found uniq")
            unique.append(lineB.rstrip())
print(unique)
print(len(unique))
output
found uniq
lineA not equal to lineB restr1
restr2
lineA not equal to lineB restr1
['', '', '', '', '']
5
CodePudding user response:
I recommend a different, simpler approach: use the set data structure.
Link - https://docs.python.org/3/tutorial/datastructures.html#sets
Example code
# read each file once into a set of stripped lines
with open(file1) as f1, open(file2) as f2:
    items01 = {line.strip() for line in f1}
    items02 = {line.strip() for line in f2}
# items in file1 that are not present in file2
only_in_file1 = items01 - items02
print(list(only_in_file1))
# items in file2 that are not present in file1
only_in_file2 = items02 - items01
print(list(only_in_file2))
# all items that appear in exactly one of the two files
unique = list(only_in_file1 | only_in_file2)
print(unique)
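In your code, you use file1 as the reference and check the lines of file2 against it. You need to do the reverse as well. The second problem is time complexity: the nested loops compare every line of one file with every line of the other. Python sets use hashing internally, so membership tests and set differences are fast on average. Use sets.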
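The "both directions" point can also be written as one set operation. Below is a minimal sketch, assuming the lines are already in memory as lists; the function name lines_in_exactly_one is hypothetical, and the sample data mirrors the 2strings.log and 4strings.log files from the question.

```python
def lines_in_exactly_one(lines_a, lines_b):
    """Return the stripped lines that occur in only one of the two inputs."""
    items_a = {line.strip() for line in lines_a}
    items_b = {line.strip() for line in lines_b}
    # ^ is the symmetric difference: (a - b) | (b - a)
    return sorted(items_a ^ items_b)


# Sample data standing in for the two files from the question
two_strings = ["restr1\n", "restr2\n"]
four_strings = ["restr1\n", "restr2\n", "restr3\n", "restr4\n"]

print(lines_in_exactly_one(four_strings, two_strings))  # ['restr3', 'restr4']
```

Sets do not keep the original line order, which is why the result is sorted before it is returned.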
CodePudding user response:
As I see it from what you posted, the only way your expected output deviates from your actual output is that node321 and node12345 are not added to the list unique, which is printed at the end. That is hardly surprising, because in your code you append lineB to unique only in those cases where lineA and lineB match (the append takes place in the else branch after if lineA != lineB:).
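One way to act on this is to append a line only when it matches none of the reference lines. The sketch below is an illustration under assumptions, not the asker's code: the function name lines_not_in_reference is hypothetical, and the sample lists are guessed from the output shown in the question. It also takes the lines as lists that were read once, because a second call to readlines() on the same open file returns an empty list.

```python
def lines_not_in_reference(reference_lines, candidate_lines):
    """Return stripped candidate lines that match no reference line, in order."""
    reference = [line.strip() for line in reference_lines]
    unique = []
    for line in candidate_lines:
        item = line.strip()
        # Append on a mismatch with every reference line, skipping blanks and repeats
        if item and item not in reference and item not in unique:
            unique.append(item)
    return unique


# Assumed file contents, inferred from the question's printed output
file1_lines = ["node789\n"]
file2_lines = ["node321\n", "node12345\n"]

print(lines_not_in_reference(file1_lines, file2_lines))  # ['node321', 'node12345']
```

With real files, the two lists could come from calling readlines() once per file before the loop starts.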