How to remove characters from a list in Python

I have created a list of sequence names and sequences from a FASTA file. Does anybody know how I can remove the '>' character from the sequence names? I have tried using strip, replace and map. Printing the names gives the following output:

>chrI
>chrII
>chrIII

where it should be:

chrI
chrII
chrIII

def read_fasta(fp):
    # Collect (header, sequence) pairs; a header line starts with '>'
    sequence_names, sequences = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # New header: emit the previous record before starting a new one
            if sequence_names:
                yield (sequence_names, ''.join(sequences))
            sequence_names, sequences = line, []
        else:
            sequences.append(line)
    # Emit the last record in the file
    if sequence_names:
        yield (sequence_names, ''.join(sequences))

with open('demo_fasta_file_2022.fas') as fp:
    for sequence_names, sequences in read_fasta(fp):
        print(sequence_names)

Answer:

This is called string slicing. There are several ways to do it; this page might help: https://www.w3schools.com/python/gloss_python_string_slice.asp
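A minimal sketch of slicing applied to the whole set of names, assuming they have already been collected into a Python list called `names` (a hypothetical variable; the question's code prints them one at a time instead):

```python
# Hypothetical list of FASTA header names, as shown in the question's output
names = [">chrI", ">chrII", ">chrIII"]

# name[1:] is a slice starting at index 1, i.e. it drops the first character.
# Slicing returns a new string, so the result has to be stored somewhere.
clean_names = [name[1:] for name in names]

print(clean_names)  # ['chrI', 'chrII', 'chrIII']
```

Strings are immutable, so string methods such as strip and replace also return new strings rather than changing the original; their result must be assigned as well.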

Answer:

Just slice off the first character:

print(line[1:])
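In the question's read_fasta, the slice belongs at the point where the header line is stored as the name. A sketch of that change, reading from an `io.StringIO` with made-up sequences in place of the real demo_fasta_file_2022.fas:

```python
import io

def read_fasta(fp):
    """Yield (name, sequence) pairs from a FASTA file object, without the '>'."""
    name, seq_lines = None, []
    for line in fp:
        line = line.rstrip()
        if line.startswith(">"):
            # New header: emit the previous record, if there is one
            if name is not None:
                yield name, "".join(seq_lines)
            # Slice off the leading '>' before storing the name
            name, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    # Emit the final record
    if name is not None:
        yield name, "".join(seq_lines)

# Hypothetical in-memory FASTA data standing in for the real file
sample = io.StringIO(">chrI\nACGT\nTTGA\n>chrII\nGGCC\n>chrIII\nAATT\n")

records = list(read_fasta(sample))
for name, seq in records:
    print(name, seq)
```

The check is `name is not None` rather than `if name:` because, once the '>' is sliced off, a bare '>' header would become the empty string, which is falsy and would otherwise make the record disappear silently.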

If you are not sure the '>' is present, check for it first:

if line.startswith(">"):
    print(line[1:])
else:
    print(line)
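On Python 3.9 and later, `str.removeprefix` performs the same check-then-slice as the if/else above in a single call. A short sketch with a hypothetical list in which one name lacks the '>':

```python
# str.removeprefix requires Python 3.9 or later
headers = [">chrI", ">chrII", "chrIII"]  # hypothetical: the last one has no '>'

# removeprefix drops '>' only if the string starts with it;
# otherwise it returns the string unchanged
clean = [h.removeprefix(">") for h in headers]
print(clean)  # ['chrI', 'chrII', 'chrIII']
```

Unlike `lstrip(">")`, which strips every leading '>' character (so `">>x"` becomes `"x"`), `removeprefix(">")` removes exactly one occurrence (`">>x"` becomes `">x"`).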

Answer:

You can also use a regex, which is a little safer than line[1:] because it only removes a '>' that is actually at the start of the line:

import re
# ... 
line = re.sub(r'^>', '', line, flags=re.MULTILINE)

Here ^ anchors the pattern to the start of the line, and the call signature is re.sub(pattern, repl, string).

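re.MULTILINE makes ^ and $ match at the start and end of each line, not just of the whole string. It matters when the input contains several lines; for a single header line it makes no difference.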
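A small sketch of where re.MULTILINE makes a difference, assuming the whole file has been read into one string (the sample text is hypothetical, e.g. the result of `fp.read()`):

```python
import re

# Hypothetical FASTA text read in one piece
text = ">chrI\nACGT\n>chrII\nGGCC\n"

# Without re.MULTILINE, ^ only matches at the very start of the string,
# so only the first '>' is removed
only_first = re.sub(r"^>", "", text)

# With re.MULTILINE, ^ also matches right after each newline,
# so every header line loses its '>'
all_headers = re.sub(r"^>", "", text, flags=re.MULTILINE)

print(repr(only_first))   # 'chrI\nACGT\n>chrII\nGGCC\n'
print(repr(all_headers))  # 'chrI\nACGT\nchrII\nGGCC\n'
```

Passing the flag by keyword, as the answer does, is important: the fourth positional parameter of re.sub is count, not flags, so `re.sub(r"^>", "", text, re.MULTILINE)` would not enable multiline mode.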