I have 2 lists:
a=[1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,5]
b=['v1','v2','v2','',
'v1','v2','v2','',
'v1','v2','v3','v3','v3','v3',
'v1','v2',
'v1','v2','v2','v2','']
Both lists have the same number of elements. List a contains groups of equal values, and I want to remove duplicates in list b within each of those groups. For example, take a=[1,1,1,1,2,2,2,2] and
b=['v1','v2','v2','','v1','v2','v2','']. Matching the two lists by index, the first 4 elements of b (where a is [1,1,1,1]) contain a duplicate v2.
The next 4 elements of b (where a is [2,2,2,2]) also contain v2 twice.
Whenever a value such as v2 or v3 occurs more than once within a group, I want to keep the first occurrence and replace every later one with '' (an empty string), so the output for b should look like this:
b=['v1','v2','','',
'v1','v2','','',]
The same pattern is expected for other duplicates such as v3.
Expected output:
b=['v1','v2','','',
'v1','v2','','',
'v1','v2','v3','','','',
'v1','v2',
'v1','v2','','','']
I want to change list b according to the groups of elements in list a. Any suggested approach is welcome. Would it help to turn b into a two-dimensional list based on a and solve the problem from there?
CodePudding user response:
Try:
from itertools import groupby

out = []
for _, g in groupby(zip(a, b), lambda k: k[0]):
    seen = set()
    for _, v in g:
        if v not in seen:
            out.append(v)
            seen.add(v)
        else:
            out.append("")

print(out)
Prints:
[
"v1","v2","","",
"v1","v2","","",
"v1","v2","v3","","","",
"v1","v2",
"v1","v2","","","",
]
CodePudding user response:
a=[1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,5,5,5,5,5]
b=['v1','v2','v2','',
'v1','v2','v2','',
'v1','v2','v3','v3','v3','v3',
'v1','v2',
'v1','v2','v2','v2','']
# one dict of already-seen values per label that occurs in a
hashmap = {1:{}, 2:{}, 3:{}, 4:{}, 5:{}}
for position, value in enumerate(b):
    row = a[position]
    if value in hashmap[row]:
        b[position] = ''  # repeat within this group: blank it in place
    else:
        hashmap[row][value] = True
print(b)
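The hashmap above lists the keys 1 to 5 by hand. If the labels in a are not known in advance, the same idea can probably be written with collections.defaultdict. This is only a sketch: the function name blank_duplicates is made up for illustration, and it returns a new list instead of changing b in place.

```python
from collections import defaultdict

def blank_duplicates(a, b):
    """Return a copy of b where repeats under the same label in a become ''."""
    seen = defaultdict(set)  # label from a -> values of b already kept
    result = []
    for label, value in zip(a, b):
        if value in seen[label]:
            result.append('')        # later occurrence within this label
        else:
            seen[label].add(value)   # first occurrence: keep it
            result.append(value)
    return result

a = [1, 1, 1, 1, 2, 2, 2, 2]
b = ['v1', 'v2', 'v2', '', 'v1', 'v2', 'v2', '']
print(blank_duplicates(a, b))
# ['v1', 'v2', '', '', 'v1', 'v2', '', '']
```

Because the seen-values are stored per label, this should also work when the elements of one group are not next to each other in a.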
CodePudding user response:
Another attempt using lists:
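def group(lst):
    # lst is a list of (a, b) pairs; returns one sublist of pairs per group
    if not lst:
        return []
    out, last, tmp = [], lst[0], [lst[0]]
    for k in lst[1:]:
        if last[0] == k[0]:  # same group: compare the values from a only
            if k in tmp:
                tmp.append((k[0], ''))  # value already seen in this group
            else:
                tmp.append(k)
        else:
            out.append(tmp)  # group finished, start a new one
            tmp = [k]
        last = k
    if tmp:
        out.append(tmp)
    return out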
a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5]
b = [
'v1', 'v2', 'v2','',
'v1', 'v2', 'v2','',
'v1', 'v2', 'v3', 'v3', 'v3', 'v3',
'v1', 'v2',
'v1', 'v2', 'v2', 'v2', ''
]
groupped = group(list(zip(a, b)))
final = [j for k in groupped for _,j in k]
print(final)
Output:
[
'v1', 'v2', '', '',
'v1', 'v2', '', '',
'v1', 'v2', 'v3', '', '', '',
'v1', 'v2',
'v1', 'v2', '', '', ''
]
Edit: A variant of the first attempt that builds the flat result inside the same loop, so no separate flattening step is needed:
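def group(lst):
    if not lst:
        return []
    last, tmp = lst[0], [lst[0]]
    # tmp holds the (a, b) pairs of the current group, tmp2 only their b values
    tmp2, final = [lst[0][1]], []
    for k in lst[1:]:
        if last[0] == k[0]:  # same group: compare the values from a only
            if k in tmp:
                tmp.append((k[0], ''))
                tmp2.append('')
            else:
                tmp.append(k)
                tmp2.append(k[1])
        else:
            final.extend(tmp2)  # group finished, flush its values
            tmp = [k]
            tmp2 = [k[1]]
        last = k
    if tmp2:
        final.extend(tmp2)
    return final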
final = group(list(zip(a, b)))
print(final)
Output:
[
'v1', 'v2', '', '',
'v1', 'v2', '', '',
'v1', 'v2', 'v3', '', '', '',
'v1', 'v2',
'v1', 'v2', '', '', ''
]
CodePudding user response:
Another way with slicing, set and sorting:
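i = 1  # assumes the labels in a are 1, 2, 3, ... and each group is contiguous
c, d = [], []
while a.count(i) > 0:
    # unique values of the slice of b that belongs to group i, in sorted order
    c = sorted(set(b[a.index(i):a.index(i) + a.count(i)]))
    if '' in c:
        c.remove('')
    # pad with empty strings back to the group's original length
    c = list(c) + [''] * (a.count(i) - len(c))
    d += c
    i += 1
b = d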
Output, b =
['v1', 'v2', '', '',
'v1', 'v2', '', '',
'v1', 'v2', 'v3', '', '', '',
'v1', 'v2',
'v1', 'v2', '', '', '']
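The question also asks whether b could be turned into a two-dimensional list based on a. A minimal sketch of that idea follows, assuming each group sits in one contiguous run in a. The helper names split_by_group and blank_row are made up for illustration.

```python
from itertools import groupby

def split_by_group(a, b):
    """Split b into sublists, one per run of equal labels in a."""
    pairs = zip(a, b)
    return [[v for _, v in run] for _, run in groupby(pairs, key=lambda p: p[0])]

def blank_row(row):
    """Keep the first occurrence of each value in row, blank the later ones."""
    seen = set()
    out = []
    for v in row:
        out.append('' if v in seen else v)
        seen.add(v)
    return out

a = [1, 1, 1, 1, 2, 2, 2, 2]
b = ['v1', 'v2', 'v2', '', 'v1', 'v2', 'v2', '']

rows = split_by_group(a, b)              # the two-dimensional form of b
cleaned = [blank_row(r) for r in rows]   # handle each group on its own
flat = [v for row in cleaned for v in row]
print(flat)
# ['v1', 'v2', '', '', 'v1', 'v2', '', '']
```

Unlike the sorting approach, this keeps every value at its original position inside the group.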