Find duplicates in an array and find their values (deep search) with Python

Time:03-20

I have an array of lines, with each line being represented by:

{
  'ms': int,
  'e_up': bool,
  'e_down': bool,
  'f_up': bool,
  'f_down': bool,
  'l_up': bool,
  'l_down': bool,
  'r_up': bool,
  'r_down': bool,
  'b': int,
  'a': int,
  'c': int,
  'd': int
}

I want to loop through all lines (a list of dictionaries like the one above) and find all duplicates, that is, lines that match in every field except ms, and collect their ms values.

For example, if I have:

(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)

(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(234, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

(0, False, False, False, False, True, False, False, False, 0, 13, -13, 0)

I want the output to be:

[
  [
    1843,
    1968,
    234,
    0
  ]
]

I want to find every such group of duplicates. Running time isn't an issue here; if it takes extra time, that doesn't really matter to me. How can I accomplish this with Python? (No external libraries, please.)

CodePudding user response:

You can take advantage of the fact that tuples can be used as keys in a dictionary. The following code uses the tuple of all values other than 'ms' as the dictionary key and collects the matching 'ms' values in a list under that key. Any list with two or more values indicates duplicates:

itemlist = list()
itemlist.append((1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20))
itemlist.append((1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((234, False, False, False, False, True, False, False, False, 0, 13, -13, 0))
itemlist.append((0, False, False, False, False, True, False, False, False, 0, 13, -13, 0))

itemdict = dict()
# create dictionary with lists of items according to signature
for item in itemlist:
    key = item[1:]
    if key in itemdict:
        itemdict[key].append(item[0])
    else:
        itemdict[key] = [item[0]]

# iterate over the dictionary and collect the 'ms' lists with more than one occurrence
duplicates = []
for value in itemdict.values():
    if len(value) > 1:
        # append (not extend) so each group of duplicates stays a separate list
        duplicates.append(value)

print(duplicates)
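
Since the question stores each line as a dictionary rather than a tuple, the same grouping idea can be sketched for that layout. This is only a minimal sketch: the names SIGNATURE_FIELDS and group_duplicates are made up for illustration, and it assumes every line has exactly the keys listed in the question. collections.defaultdict is part of the standard library, so no external packages are needed.

```python
from collections import defaultdict

# Hypothetical helper names; the field order follows the question's layout.
# 'ms' is left out on purpose: it is the value we report, not part of the key.
SIGNATURE_FIELDS = ('e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
                    'r_up', 'r_down', 'b', 'a', 'c', 'd')


def group_duplicates(lines):
    """Return one list of 'ms' values per signature that occurs at least twice."""
    groups = defaultdict(list)
    for line in lines:
        # Tuples are hashable, so the signature can be used as a dictionary key.
        signature = tuple(line[field] for field in SIGNATURE_FIELDS)
        groups[signature].append(line['ms'])
    return [ms_values for ms_values in groups.values() if len(ms_values) > 1]


# Rebuild the sample rows from the question as dictionaries.
rows = [
    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20),
    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1968, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (234, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (0, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
]
lines = [dict(zip(('ms',) + SIGNATURE_FIELDS, row)) for row in rows]

print(group_duplicates(lines))  # [[1843, 1968, 234, 0]]
```

Each line is visited once, so this stays fast even for long inputs, although the question says speed is not a concern.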

CodePudding user response:

I solved the problem by comparing each line against every other line that has not been checked yet and collecting the matches.

def find_duplicate(lines, line, duplicates, checked):
    if (line['ms'] in checked):
        return duplicates, checked

    duplicate = list()
    duplicate.append(line['ms'])

    checked.append(line['ms'])
    for i in range(len(lines)):
        new_line = lines[i]
        if new_line['ms'] in checked: continue
        # compare every field except 'ms'
        if all(new_line[key] == line[key] for key in line if key != 'ms'):
            duplicate.append(new_line['ms'])
            checked.append(new_line['ms'])

    duplicates.append(duplicate)

    return duplicates, checked

I then call this function for every line in the lines array; lines that have already been checked return immediately.

duplicates = list()
checked = list()

for i in range(len(lines)):
    duplicates, checked = find_duplicate(lines, lines[i], duplicates, checked)

print(duplicates)

Input to the code (each line's values, in the field order from the question):

(1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20)
(1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1932, False, False, False, False, True, False, False, False, 0, 13, -13, 0)
(1847, False, True, False, False, True, False, False, False, 0, 13, -13, 0)
(1869, False, True, False, False, True, False, False, False, 0, 13, -13, 0)

Output: [[1902], [1843, 1932], [1847, 1869]]
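
The input above is written as plain tuples, while find_duplicate reads fields by name, and the output also contains groups with a single entry such as [1902]. As a rough sketch, assuming the tuple positions follow the field order from the question, the rows could be turned into dictionaries and the single-entry groups dropped like this (FIELDS, rows, groups and only_duplicates are illustrative names, and groups is simply the output copied from above):

```python
# Assumed field order, taken from the dictionary layout in the question.
FIELDS = ('ms', 'e_up', 'e_down', 'f_up', 'f_down', 'l_up', 'l_down',
          'r_up', 'r_down', 'b', 'a', 'c', 'd')

rows = [
    (1902, False, False, False, False, False, False, True, False, 128, -37, -127, -20),
    (1843, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1932, False, False, False, False, True, False, False, False, 0, 13, -13, 0),
    (1847, False, True, False, False, True, False, False, False, 0, 13, -13, 0),
    (1869, False, True, False, False, True, False, False, False, 0, 13, -13, 0),
]

# Pair each field name with the value at the same position in the row.
lines = [dict(zip(FIELDS, row)) for row in rows]

# The output shown above, copied here so the snippet runs on its own.
groups = [[1902], [1843, 1932], [1847, 1869]]

# Keep only the groups that contain real duplicates (two or more entries).
only_duplicates = [group for group in groups if len(group) > 1]
print(only_duplicates)  # [[1843, 1932], [1847, 1869]]
```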
