I have the following list:
poly = ['rs1', 'rs1', 'rs2', 'rs2', 'rs3', 'rs3', 'rs4', 'rs4', 'rs5', 'rs5']
All I need is to add an underscore to duplicate values, in order to get something like this:
['rs1', 'rs1_', 'rs2', 'rs2_', 'rs3', 'rs3_', 'rs4', 'rs4_', 'rs5', 'rs5_']
I've checked some similar public questions but only managed to add a number after each value, which gives something like:
['rs11', 'rs12', 'rs21', 'rs22', 'rs31', 'rs32', 'rs41', 'rs42', 'rs51', 'rs52']
That is definitely far from my aim.
Any ideas?
Thank you.
CodePudding user response:
By looping through your poly list and keeping track of the elements you've seen before, you can identify each duplicate and append an _ to it in the original list:
poly = ['rs1', 'rs1', 'rs2', 'rs2', 'rs3', 'rs3', 'rs4', 'rs4', 'rs5', 'rs5']
already_encountered = []
for x in range(len(poly)):
    if poly[x] in already_encountered:
        # seen before: mark this occurrence as a duplicate
        poly[x] = f'{poly[x]}_'
    else:
        already_encountered.append(poly[x])
print(poly)
Output:
['rs1', 'rs1_', 'rs2', 'rs2_', 'rs3', 'rs3_', 'rs4', 'rs4_', 'rs5', 'rs5_']
CodePudding user response:
One-liner for your decoding pleasure :)
[poly[i] + '_' if poly[i] in poly[:i] else poly[i] for i in range(len(poly))]
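For anyone decoding it: each element is compared against the slice of everything before it (poly[:i]), and gets a trailing underscore if it already occurs there. A minimal runnable sketch of the same idea, using enumerate instead of range(len(...)); the name result is only for illustration:

```python
poly = ['rs1', 'rs1', 'rs2', 'rs2', 'rs3', 'rs3', 'rs4', 'rs4', 'rs5', 'rs5']

# For each position i, look for the same value earlier in the list.
# poly[:i] is a fresh slice on every iteration, so this is quadratic
# in the length of the list; fine for short lists.
result = [v + '_' if v in poly[:i] else v for i, v in enumerate(poly)]
print(result)
```

Unlike the loop version, this builds a new list and leaves poly unchanged.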
CodePudding user response:
A more idiomatic version of PangolinPaws's answer would be:
poly = ['rs1', 'rs1', 'rs2', 'rs2', 'rs3', 'rs3', 'rs4', 'rs4', 'rs5', 'rs5']
seen = set()
for idx, v in enumerate(poly):
    if v in seen:
        poly[idx] = f'{v}_'
    seen.add(v)  # adding an existing element to a set is a no-op, so no else is needed
print(poly)
with the same overall outcome but more performant:
- using set() makes the "in" checks O(1) on average, instead of O(n) for a list
- using enumerate(...) removes the need to index into the list multiple times
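If you would rather not modify the list in place, the same set-based idea can be wrapped in a small function. This is only a sketch: the name mark_duplicates and the suffix parameter are made up for illustration and do not come from any library.

```python
def mark_duplicates(values, suffix='_'):
    """Return a new list where every repeat of a value gets `suffix` appended."""
    seen = set()
    marked = []
    for v in values:
        # First occurrence stays as-is; every later occurrence is suffixed.
        marked.append(v + suffix if v in seen else v)
        seen.add(v)
    return marked


poly = ['rs1', 'rs1', 'rs2', 'rs2', 'rs3', 'rs3', 'rs4', 'rs4', 'rs5', 'rs5']
print(mark_duplicates(poly))
```

Note that with this approach (and with the loops above) a value occurring three times ends up as 'x', 'x_', 'x_', so the repeats are marked but not made unique.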