My data is a If a whole sublist in a nested list belong to another bigger list we delete it.
[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
[942, 945, 950, 952, 953, 956, 957, 959],
[943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
[944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
[945, 949, 950, 951, 953, 956, 957, 959],
[946, 947, 948, 949, 951, 953, 954, 956],
[947, 948, 949, 951, 953, 954, 956],
[948, 949, 951, 954],
[949, 950, 951, 953, 956],
[950, 951, 953, 956, 957, 959],
[951, 953, 954, 956],
[952, 955, 957, 959, 960],
[951, 953, 954, 956],
[952, 955, 957, 959, 960],
[953, 956],
[954],
[955, 960, 961],
[956],
[957, 959],
[958, 960, 961],
[959],
[960, 961],
[961],
[962, 964, 965, 966]]
Output:
[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
[943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
[952, 955, 957, 959, 960],
[955, 960, 961],
[958, 960, 961],
[962, 964, 965, 966]]
My code does not work
CodePudding user response:
test_list = [[1, 0, -1], [-1, 0, 1], [-1, 0, 1], [1, 2, 3], [3, 4, 1]]
print("The original list : " str(test_list))
res = list(set(tuple(sorted(sub)) for sub in test_list))
print("The list after duplicate removal : " str(res))
CodePudding user response:
Assuming, L
the input list, that the sublists have unique items, that there is no duplicated sublists, and that order does not matter, you can first compute a dictionary of which sublists contain which item:
d = {}
for i,l in enumerate(L):
for v in l:
d.setdefault(v, set()).add(i)
# {941: {0},
# 942: {0, 1},
# 943: {0, 2},
# 945: {0, 1, 2, 3, 4},
# 949: {0, 2, 3, 4, 5, 6, 7, 8},
# ...
# }
Then for each list, compute the intersection of all sets, remove the current index and check whether the output is empty. If it is, then you have a duplicate/superset and should remove it.
out = [l for i,l in enumerate(L)
if not set.intersection(*(d[x] for x in l)).difference({i})]
output:
[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
[942, 945, 950, 952, 953, 956, 957, 959],
[943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
[955, 960, 961],
[958, 960, 961],
[962, 964, 965, 966]]
alternative
Other approach to handle strict < comparison, and remove the duplicates. This is based on computing the sets of all list and comparing with the others by decreasing order of size (the complexity is not optimal):
S = list(map(frozenset, L))
S2 = sorted(S, key=len, reverse=True)
seen = set()
[l for l, s in zip(L, S)
if all(len(s)>len(s2) or not s<s2 for s2 in S2)
and not s in seen and not seen.add(s)
]
output:
[[941, 942, 943, 945, 949, 950, 951, 953, 956, 957, 959],
[942, 945, 950, 952, 953, 956, 957, 959],
[943, 944, 945, 946, 947, 948, 949, 950, 951, 953, 954, 956],
[952, 955, 957, 959, 960],
[955, 960, 961],
[958, 960, 961],
[962, 964, 965, 966]]