How to sort the output of difference between two sets?

Time:07-27

I have two sets, a and b, and I print the difference between them as shown below. The problem is that the output is not sorted, as shown under "Actual output". I need the lines that differ between the two files to appear next to each other, as shown under "Expected output".

a = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353',
    'Jim Smth|123 Any Street|Boston|US|02134',
    'Name|Address|City|Country|Pinode'
}

b = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353',
    'Jim Smith|123 Any Stret|Boston|US|02134',
    'Name|Address|Cty|Country|Pincode'
}

# Prefix each differing line with the name of the file it came from.
diff = [('ua.csv,' + i) if i in a else ('pr.csv,' + i) if i in b else ''
        for i in list(a ^ b)]

if diff == []:
    print('Great!!! There are no differences')
else:
    print('\n'.join(diff))

Actual output:

ua.csv,Name|Address|City|Country|Pinode
ua.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353
pr.csv,Name|Address|Cty|Country|Pincode
pr.csv,Jim Smith|123 Any Stret|Boston|US|02134
pr.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353
ua.csv,Jim Smth|123 Any Street|Boston|US|02134

Expected output: lines that differ between the two files should appear next to each other, like this:

ua.csv,Name|Address|City|Country|Pinode
pr.csv,Name|Address|Cty|Country|Pincode

ua.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353
pr.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353

pr.csv,Jim Smith|123 Any Stret|Boston|US|02134
ua.csv,Jim Smth|123 Any Street|Boston|US|02134

CodePudding user response:

In Python, sets are unordered. They do not store information about which element is first, second, etc. You can picture it as a heap of fruits - you can't tell which fruit is in which position, you can just say if a banana is there or compare the heap with another heap to check if they contain different fruits.

For this reason, an implementation that relies on sets alone won't work. You will have to use an ordered structure, one that keeps track of the order of its elements (e.g. a list).
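As a minimal sketch of that point (the sample values here are made up for illustration), the symmetric difference of two sets is itself a set, so it only gets a defined order once it is turned into a list, for example with sorted():

```python
# Hypothetical sample data, much shorter than the sets in the question.
a = {'x|1', 'y|2', 'z|3'}
b = {'x|1', 'y|20', 'z|3'}

# a ^ b is still a set, so its iteration order carries no meaning.
changed = a ^ b

# sorted() accepts any iterable and always returns a list in a defined order.
ordered = sorted(changed)
print(ordered)  # ['y|2', 'y|20']
```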

CodePudding user response:

You can sort the output by calling the built-in sorted() function and passing it a function, via the key= keyword argument, that extracts a comparison key from each element. In this case, the key function splits each string on its first comma and returns the substring that follows it.

a = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353',
    'Jim Smth|123 Any Street|Boston|US|02134',
    'Name|Address|City|Country|Pinode'
}

b = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353',
    'Jim Smith|123 Any Stret|Boston|US|02134',
    'Name|Address|Cty|Country|Pincode'
}

diff = [('ua.csv,' + i) if i in a else ('pr.csv,' + i) if i in b else ''
        for i in list(a ^ b)]

if diff == []:
    print('Great!!! There are no differences')
else:
    diff = sorted(diff, key=lambda v: v.split(',', maxsplit=1)[1])
    print('\n'.join(diff))

Output:

ua.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353
pr.csv,Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353
pr.csv,Jim Smith|123 Any Stret|Boston|US|02134
ua.csv,Jim Smth|123 Any Street|Boston|US|02134
ua.csv,Name|Address|City|Country|Pinode
pr.csv,Name|Address|Cty|Country|Pincode
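
Note that sorting only puts matching lines next to each other when they happen to be neighbours in sort order, and it does not add the blank line between pairs shown in the expected output. One possible way to pair the lines explicitly is sketched below, assuming each changed line in a has a reasonably similar counterpart in b. The function name paired_diff and the 0.6 similarity cutoff are illustrative choices, not part of the answer above; difflib.get_close_matches comes from the standard library.

```python
import difflib

a = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.35353535353',
    'Jim Smth|123 Any Street|Boston|US|02134',
    'Name|Address|City|Country|Pinode'
}

b = {
    'Jane Lee|248 Another St.|Boston|US|02130|4535353535.3535355353',
    'Jim Smith|123 Any Stret|Boston|US|02134',
    'Name|Address|Cty|Country|Pincode'
}


def paired_diff(a, b, a_name='ua.csv', b_name='pr.csv'):
    """Group each line found only in `a` with its closest match found only in `b`."""
    only_a = sorted(a - b)
    only_b = sorted(b - a)
    groups = []
    for line in only_a:
        # Pick the single most similar remaining line from b, if any is close enough.
        match = difflib.get_close_matches(line, only_b, n=1, cutoff=0.6)
        group = [f'{a_name},{line}']
        if match:
            group.append(f'{b_name},{match[0]}')
            only_b.remove(match[0])  # each line from b is paired at most once
        groups.append(group)
    # Lines from b that were not paired with anything form their own groups.
    groups.extend([f'{b_name},{line}'] for line in only_b)
    return groups


groups = paired_diff(a, b)
if not groups:
    print('Great!!! There are no differences')
else:
    # Separate the groups with a blank line.
    print('\n\n'.join('\n'.join(group) for group in groups))
```

The groups come out ordered by the lines from a, so the header pair is printed last here rather than first. If the header should lead, it would have to be handled separately, since a set does not remember which line was first in the file.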