I wonder if anyone could please help. I have a Python list consisting of antibody names:
['anti-human CD86',
'anti-human CD274 (B7-H1, PD-L1)',
'anti-human CD270 (HVEM, TR2)',
...
'anti-human CD155 (PVR)',
'anti-human CD112 (Nectin-2)',
'anti-human CD47']
I want to remove the 'anti-human ' part so I just have a list of the actual protein targets e.g. [CD86, CD274 ... CD47].
I've tried multiple methods, including:
for i in parsed_protein_names:
    i.split('anti-human ')
But I don't seem to be getting anywhere. Could anyone please advise?
CodePudding user response:
A simple list comprehension with replace() will do:
>>> antibodies
['anti-human CD86', 'anti-human CD274 (B7-H1, PD-L1)', 'anti-human CD270 (HVEM, TR2)']
>>> [e.replace("anti-human ", "") for e in antibodies]
['CD86', 'CD274 (B7-H1, PD-L1)', 'CD270 (HVEM, TR2)']
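For context on why the loop in the question appears to do nothing: str.split() returns a new list and leaves the original string untouched, and the loop throws that return value away. A minimal sketch, using a shortened copy of the list from the question:

```python
parsed_protein_names = ['anti-human CD86', 'anti-human CD47']

# split() returns a new list; the original string is not modified.
parts = parsed_protein_names[0].split('anti-human ')
print(parts)  # ['', 'CD86']

# Calling split() in a loop without storing the result changes nothing.
for name in parsed_protein_names:
    name.split('anti-human ')
print(parsed_protein_names)  # ['anti-human CD86', 'anti-human CD47']

# Keeping the last piece of each split gives the stripped names.
stripped = [name.split('anti-human ')[-1] for name in parsed_protein_names]
print(stripped)  # ['CD86', 'CD47']
```

Strings are immutable in Python, so any approach has to build new strings and collect them, which is what the list comprehension does.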
CodePudding user response:
Assuming your list is defined as follows:
parsed_protein_names = ['anti-human CD86',
'anti-human CD274 (B7-H1, PD-L1)',
'anti-human CD270 (HVEM, TR2)',
'...',
'anti-human CD155 (PVR)',
'anti-human CD112 (Nectin-2)',
'anti-human CD47']
You have a few different options that you can use with a list comprehension.
str.replace
result_list = [n.replace('anti-human ', '', 1) for n in parsed_protein_names]
print(result_list)
str.split
result_list = [n.split('anti-human', 1)[-1].lstrip() for n in parsed_protein_names]
print(result_list)
Here is the output, in any case:
['CD86', 'CD274 (B7-H1, PD-L1)', 'CD270 (HVEM, TR2)', '...', 'CD155 (PVR)', 'CD112 (Nectin-2)', 'CD47']
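If only the CD identifier is wanted, as the example output in the question ([CD86, CD274 ... CD47]) suggests, the parenthetical aliases can be dropped too. This is a sketch under the assumption that the identifier is always the first whitespace-separated token after the prefix; the name cd_ids is made up for illustration.

```python
parsed_protein_names = [
    'anti-human CD86',
    'anti-human CD274 (B7-H1, PD-L1)',
    'anti-human CD112 (Nectin-2)',
]

# Remove the prefix, then keep only the first whitespace-separated token.
# Assumption: every entry is non-empty once the prefix has been removed.
cd_ids = [n.replace('anti-human ', '', 1).split()[0] for n in parsed_protein_names]
print(cd_ids)  # ['CD86', 'CD274', 'CD112']
```

An entry that is empty after the prefix is removed would raise an IndexError here, so real data may need a guard.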
CodePudding user response:
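The method you are looking for is str.removeprefix() (Python 3.9+) rather than split(). Note that lstrip() is not suitable here: it strips any leading characters found in its argument rather than a literal prefix, so lstrip('anti-human') would leave the leading space in place.
Here is code that should work:
mylist = ['anti-human CD86','anti-human CD274 (B7-H1, PD-L1)','anti-human CD270 (HVEM, TR2)','anti-human CD155 (PVR)','anti-human CD112 (Nectin-2)','anti-human CD47']
my_output_list = []
for i in mylist:
    # removeprefix() removes the exact prefix only if the string starts with it
    a = i.removeprefix('anti-human ')
    my_output_list.append(a)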
print(my_output_list)
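A short demonstration of how lstrip() treats its argument, since it is easy to mistake it for prefix removal. The antibody name below is hypothetical, chosen because the target starts with letters that also occur in 'anti-human '; strip_prefix is an illustrative helper that also works on Python versions before 3.9.

```python
# Hypothetical name, not taken from the list in the question.
name = 'anti-human mannose receptor'

# lstrip() removes leading characters that appear anywhere in its argument.
print(repr(name.lstrip('anti-human')))   # ' mannose receptor'
print(repr(name.lstrip('anti-human ')))  # 'ose receptor'


def strip_prefix(s, prefix='anti-human '):
    """Remove prefix from s only if s really starts with it."""
    return s[len(prefix):] if s.startswith(prefix) else s


print(strip_prefix(name))               # mannose receptor
print(strip_prefix('anti-human CD86'))  # CD86
```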
CodePudding user response:
If you know the length of the piece you want to remove, you can just use:
parsed_protein_names=[string[11:] for string in parsed_protein_names]
Otherwise, it gets more complicated. Note that the following algorithm removes the longest prefix shared by all strings, so it will also remove the CD part.
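# the shortest string bounds the length of the common prefix
minlen=len(sorted(parsed_protein_names,key=len)[0])
for x in range(minlen):
    # stop at the first position where the strings differ
    if len(set([string[x] for string in parsed_protein_names]))!=1:
        break
else:
    # no difference found: the whole shortest string is the common prefix
    x=minlen
parsed_protein_names=[string[x:] for string in parsed_protein_names]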
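The same common-prefix idea can be sketched with os.path.commonprefix from the standard library, which compares strings character by character. To keep the CD part, the sketch trims the detected prefix back to its last space; that step rests on the assumption that the part to remove always ends at a space.

```python
import os

parsed_protein_names = [
    'anti-human CD86',
    'anti-human CD274 (B7-H1, PD-L1)',
    'anti-human CD47',
]

# Longest prefix shared by all strings.
prefix = os.path.commonprefix(parsed_protein_names)
print(repr(prefix))  # 'anti-human CD'

# Trim the prefix back to just after its last space so that 'CD' is kept.
# If the prefix contains no space, this leaves an empty prefix.
prefix = prefix[:prefix.rfind(' ') + 1]
print(repr(prefix))  # 'anti-human '

result = [s[len(prefix):] for s in parsed_protein_names]
print(result)  # ['CD86', 'CD274 (B7-H1, PD-L1)', 'CD47']
```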