How to remove duplicates from a list with respect to another list?

Time:07-20

I have 2 lists:

a=[1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,5]

b=['v1','v2','v2','',
   'v1','v2','v2','',
   'v1','v2','v3','v3','v3','v3',
    'v1','v2',
    'v1','v2','v2','v2','']

Both lists have the same number of elements. List a consists of groups of equal elements, and I want to remove duplicates in list b within each of those groups. For example, if a=[1,1,1,1,2,2,2,2], then duplicates in b should be removed separately for the elements that line up with the 1s and for the elements that line up with the 2s.

List b = ['v1','v2','v2','','v1','v2','v2','']. Going by index, the first 4 elements of b (the ones matching [1,1,1,1] in a) contain a duplicate v2.

The next 4 elements of b (the ones matching [2,2,2,2] in a) also contain v2 twice.

Whenever a value such as v2 or v3 occurs more than once within a group, I want to replace every occurrence after the first with '' (an empty string), so the output for b should look something like this:

b=['v1','v2','','',
   'v1','v2','','',]

Similar pattern expected for further duplicates like v3.

Expected Output

b=['v1','v2','','',
   'v1','v2','','',
   'v1','v2','v3','','','',
    'v1','v2',
    'v1','v2','','','']

I want to change list b according to the groups of elements in list a. Please suggest any approach. Maybe turn b into a 2-dimensional list based on a and then solve the problem?
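
The 2-dimensional idea can be sketched roughly as follows. This is only an illustration, assuming equal values in a are always adjacent; the helper names `split_by_runs` and `blank_repeats_in_row` are hypothetical and do not come from any of the answers.

```python
def split_by_runs(a, b):
    """Split b into sublists, one per run of equal adjacent values in a."""
    rows = []
    prev = object()  # sentinel that compares unequal to every value in a
    for key, value in zip(a, b):
        if key != prev:
            rows.append([])  # a new run in a starts a new row
            prev = key
        rows[-1].append(value)
    return rows


def blank_repeats_in_row(row):
    """Keep the first occurrence of each value; replace later repeats with ''."""
    seen = set()
    out = []
    for v in row:
        out.append('' if v in seen else v)
        seen.add(v)
    return out


a = [1, 1, 1, 1, 2, 2, 2, 2]
b = ['v1', 'v2', 'v2', '', 'v1', 'v2', 'v2', '']

rows = split_by_runs(a, b)  # [['v1', 'v2', 'v2', ''], ['v1', 'v2', 'v2', '']]
cleaned = [blank_repeats_in_row(r) for r in rows]
flat = [v for row in cleaned for v in row]  # back to a flat list
print(flat)  # ['v1', 'v2', '', '', 'v1', 'v2', '', '']
```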

CodePudding user response:

Try:

from itertools import groupby

out = []
for _, g in groupby(zip(a, b), lambda k: k[0]):
    seen = set()
    for _, v in g:
        if v not in seen:
            out.append(v)
            seen.add(v)
        else:
            out.append("")

print(out)

Prints:

[
    "v1","v2","","",
    "v1","v2","","",
    "v1","v2","v3","","","",
    "v1","v2",
    "v1","v2","","","",
]

CodePudding user response:

a=[1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,5]

b=['v1','v2','v2','',
   'v1','v2','v2','',
   'v1','v2','v3','v3','v3','v3',
    'v1','v2',
    'v1','v2','v2','v2','']

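# One dictionary of already-seen values per group key in a
hashmap = {1:{}, 2:{}, 3:{}, 4:{}, 5:{}}

for position, value in enumerate(b):
    row = a[position]
    if value in hashmap[row]:
        # Repeat within this group: blank it out (b is modified in place)
        b[position] = ''
    else:
        hashmap[row][value] = True

print(b)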
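
The dictionary above lists the group keys 1 to 5 by hand. A minimal sketch of the same idea that discovers the keys as it goes, using `collections.defaultdict` from the standard library; the function name `blank_repeats` is hypothetical, and unlike the loop above it returns a new list instead of modifying b in place:

```python
from collections import defaultdict


def blank_repeats(a, b):
    """Return a copy of b where repeats within the same a-group become ''."""
    seen = defaultdict(set)  # group key from a -> values of b already seen
    out = []
    for key, value in zip(a, b):
        if value in seen[key]:
            out.append('')  # repeat within this group
        else:
            seen[key].add(value)
            out.append(value)
    return out


a = [1, 1, 1, 1, 2, 2, 2, 2]
b = ['v1', 'v2', 'v2', '', 'v1', 'v2', 'v2', '']
print(blank_repeats(a, b))  # ['v1', 'v2', '', '', 'v1', 'v2', '', '']
```

Because the seen-sets are keyed by the value in a, this sketch does not require equal values in a to be adjacent.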

CodePudding user response:

Another attempt using lists:

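def group(lst):
    # lst holds (a, b) pairs; only repeats that directly follow each other are blanked
    if not lst:
        return []
    out, last, tmp = [], lst[0], [lst[0]]
    for k in lst[1:]:
        if last == k:
            # Same pair as the last distinct one: replace the repeat with ''
            if k in tmp:
                tmp.append((k[0], ''))
            else:
                tmp.append(k)
        else:
            # The pair changed: close the current chunk and start a new one
            out.append(tmp)
            tmp = [k]
            last = k
    if tmp:
        out.append(tmp)
    return out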


a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5]

b = [
    'v1', 'v2', 'v2','',
    'v1', 'v2', 'v2','',
    'v1', 'v2', 'v3', 'v3', 'v3', 'v3',
    'v1', 'v2',
    'v1', 'v2', 'v2', 'v2', ''
]

groupped = group(list(zip(a, b)))
final =  [j for k in groupped for _,j in k]
print(final)

Output:

[
    'v1', 'v2', '', '',
    'v1', 'v2', '', '',
    'v1', 'v2', 'v3', '', '', '',
    'v1', 'v2',
    'v1', 'v2', '', '', ''
]

Edit: A variant of the first attempt that solves the problem in a single loop (it builds the flat result directly instead of flattening in a second pass, so it should be faster):

def group(lst):
    if not lst:
        return []
    last, tmp = lst[0], [lst[0]]
    tmp2, final = [lst[0][1]], []
    for k in lst[1:]:
        if last == k:
            if k in tmp:
                tmp.append((k[0], ''))
                tmp2.append('')
            else:
                tmp.append(k)
                tmp2.append(k[1])
        else:
            final.extend(tmp2)
            tmp = [k]
            tmp2 = [k[1]]
            last = k
    if tmp2:
        final.extend(tmp2)
    return final

final =  group(list(zip(a, b)))
print(final)

Output:

[
    'v1', 'v2', '', '',
    'v1', 'v2', '', '',
    'v1', 'v2', 'v3', '', '', '',
    'v1', 'v2',
    'v1', 'v2', '', '', ''
]

CodePudding user response:

Another way with slicing, set and sorting:

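i = 1
c, d = [], []
while a.count(i) > 0:
    # Unique values of b within the slice that lines up with group i of a
    c = sorted(set(b[a.index(i):a.index(i) + a.count(i)]))
    if '' in c:
        c.remove('')
    # Pad with empty strings back to the group's original length
    c = list(c) + [''] * (a.count(i) - len(c))
    d += c
    i += 1
b = d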

Output, b =

['v1', 'v2', '', '', 
 'v1', 'v2', '', '', 
 'v1', 'v2', 'v3', '', '', '', 
 'v1', 'v2', 
 'v1', 'v2', '', '', '']