How to remove \t, \n, and \r from a list in Python when web scraping?

How can I remove \n, \r, and \t from the strings in this list? ['Company Name', 'Headquarters Location', 'Company Type\n\r\n\t\t\t\t\t\t\t\t?', 'Fleet Size']

for i in head:
    c = i.text.strip()
    a.append(c)
    print(a)

**Output**
['Company Name',
 'Headquarters Location',
 'Company Type\n\r\n\t\t\t\t\t\t\t\t?',
 'Fleet Size']

CodePudding user response：

Please don't flame my variable names.

a=['Company Name',
 'Headquarters Location',
 'Company Type\n\r\n\t\t\t\t\t\t\t\t?',
 'Fleet Size'] 

b = []
unwanted = ["\n","\t","\r"]

for i in a:
    to_add = ""
    for char in i:
        if char not in unwanted:
            to_add += char  # keep this character
    b.append(to_add)

print(b)
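
The same character filtering can also be written with str.translate, which drops characters in a single pass instead of building each string one character at a time. This is only a sketch using the sample list from the question; the name drop_table is made up for illustration.

```python
# Sample data copied from the question.
a = ['Company Name',
     'Headquarters Location',
     'Company Type\n\r\n\t\t\t\t\t\t\t\t?',
     'Fleet Size']

# The third argument of str.maketrans lists characters to delete.
# drop_table is a hypothetical name, not from the original answer.
drop_table = str.maketrans('', '', '\n\t\r')

# Apply the deletion table to every string in the list.
b = [s.translate(drop_table) for s in a]

print(b)  # ['Company Name', 'Headquarters Location', 'Company Type?', 'Fleet Size']
```

Unlike strip(), which only trims the ends of a string, translate() also removes the unwanted characters from the middle.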

CodePudding user response：

I think the regex module (re) would be effective here.

import re

for i in head:
    c = i.text.strip()
    # Remove every carriage return, newline, and tab character
    c = re.sub(r'[\r\n\t]', '', c)
    a.append(c)

print(a)

**Output**
['Company Name',
 'Headquarters Location',
 'Company Type?',
 'Fleet Size']

CodePudding user response：

Using re.sub:

import re

lst = ['Company Name', 'Headquarters Location', 'Company Type\n\r\n\t\t\t\t\t\t\t\t?', 'Fleet Size']

output = [re.sub(r'[\r\n\t]', '', x) for x in lst]
print(output) # ['Company Name', 'Headquarters Location', 'Company Type?', 'Fleet Size']
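
If the stray whitespace sometimes separates real words, deleting it outright would glue those words together. A possible variation, offered only as a sketch, is to collapse each run of whitespace into a single space with split() and join(). The helper name collapse_whitespace is hypothetical. Note that the result differs slightly from the answers above: a space remains before the "?".

```python
def collapse_whitespace(text):
    """Replace every run of whitespace with one space and trim the ends."""
    # split() with no arguments splits on any whitespace run
    # (spaces, tabs, newlines, carriage returns) and drops empty pieces.
    return " ".join(text.split())


# Sample data copied from the question.
lst = ['Company Name', 'Headquarters Location',
       'Company Type\n\r\n\t\t\t\t\t\t\t\t?', 'Fleet Size']

cleaned = [collapse_whitespace(x) for x in lst]
print(cleaned)  # ['Company Name', 'Headquarters Location', 'Company Type ?', 'Fleet Size']
```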