Home > Mobile >  Remove duplicates from nested nested list
Remove duplicates from nested nested list

Time:07-26

My data is a If a whole sublist in a nested list belong to another bigger list we delete it.

 [[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
 [942, 945, 950, 952, 953, 956, 957, 959],
 [943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
 [944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
 [945, 949, 950, 951, 953, 956, 957, 959],
 [946, 947, 948, 949, 951, 953, 954, 956],
 [947, 948, 949, 951, 953, 954, 956],
 [948, 949, 951, 954],
 [949, 950, 951, 953, 956],
 [950, 951, 953, 956, 957, 959],
 [951, 953, 954, 956],
 [952, 955, 957, 959, 960],
 [951, 953, 954, 956],
 [952, 955, 957, 959, 960],
 [953, 956],
 [954],
 [955, 960, 961],
 [956],
 [957, 959],
 [958, 960, 961],
 [959],
 [960, 961],
 [961],
 [962, 964, 965, 966]]

Output:

 [[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959], 
 [943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956], 
 [952, 955, 957, 959, 960], 
 [955, 960, 961], 
 [958, 960, 961],
 [962, 964, 965, 966]]

My code does not work

CodePudding user response:

test_list = [[1, 0, -1], [-1, 0, 1], [-1, 0, 1], [1, 2, 3], [3, 4, 1]]

print("The original list : " str(test_list))

res = list(set(tuple(sorted(sub)) for sub in test_list))

print("The list after duplicate removal : " str(res))

CodePudding user response:

Assuming, L the input list, that the sublists have unique items, that there is no duplicated sublists, and that order does not matter, you can first compute a dictionary of which sublists contain which item:

d = {}
for i,l in enumerate(L):
    for v in l:
        d.setdefault(v, set()).add(i)
# {941: {0},
#  942: {0, 1},
#  943: {0, 2},
#  945: {0, 1, 2, 3, 4},
#  949: {0, 2, 3, 4, 5, 6, 7, 8},
#  ...
# }

Then for each list, compute the intersection of all sets, remove the current index and check whether the output is empty. If it is, then you have a duplicate/superset and should remove it.

out = [l for i,l in enumerate(L)
       if not set.intersection(*(d[x] for x in l)).difference({i})]

output:

[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
 [942, 945, 950, 952, 953, 956, 957, 959],
 [943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
 [955, 960, 961],
 [958, 960, 961],
 [962, 964, 965, 966]]

alternative

Other approach to handle strict < comparison, and remove the duplicates. This is based on computing the sets of all list and comparing with the others by decreasing order of size (the complexity is not optimal):

S = list(map(frozenset, L))
S2 = sorted(S, key=len, reverse=True)
seen = set()
[l for l, s in zip(L, S)
 if all(len(s)>len(s2) or not s<s2 for s2 in S2)
 and not s in seen and not seen.add(s)
]

output:

[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
 [942, 945, 950, 952, 953, 956, 957, 959],
 [943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
 [952, 955, 957, 959, 960],
 [955, 960, 961],
 [958, 960, 961],
 [962, 964, 965, 966]]
  • Related