How do I replace each newline with a space and replace two other strings with a space in Python?

This is the code that scrapes a particular section of an article on a website.

soup.find("div", {"id": "content_wrapper"}).text

I am supposed to replace each newline ('\n') in the body text with a space (' '). I have done this with: soup.find("div", {"id": "content_wrapper"}).text.replace("\n", " ").strip()

But I still need to replace each '\xa0' and '\u200a' character in the body text with a space (' ') and strip all leading and trailing whitespace.

How do I do this, please?

Thank you!

CodePudding user response:

You can simply chain additional replace calls after the first one.

text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', ' ').replace('\xa0', ' ').replace('\u200a', ' ').strip()
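To try the chained calls without a live page, here is a minimal sketch that runs on a plain string. The helper name clean_text and the sample string are made up for illustration; in the real script the input would be the .text of the scraped div.

```python
def clean_text(text):
    """Replace newlines, no-break spaces and hair spaces with a plain space, then strip."""
    return (
        text.replace('\n', ' ')
            .replace('\xa0', ' ')     # no-break space
            .replace('\u200a', ' ')   # hair space
            .strip()                  # drop leading and trailing whitespace
    )


# Hypothetical stand-in for soup.find(...).text
sample = '\n  First line\nSecond\xa0line\u200awith odd spaces \n'
cleaned = clean_text(sample)
print(repr(cleaned))
```

Each replace returns a new string, so the calls can be chained in any order; strip goes last so that spaces created by the replacements at either end are removed too.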

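If I understood correctly, you want to remove these characters entirely. In that case, don't replace them with a space (" "); replace them with an empty string ("") instead. Note that replacing '\n' with an empty string joins the last word of one line to the first word of the next.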

text = soup.find('div', {'id': 'content_wrapper'}).text
modified_text = text.replace('\n', '').replace('\xa0', '').replace('\u200a', '').strip()
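The difference between the two variants is easiest to see side by side. This is a small sketch on a made-up sample string, not on the scraped page:

```python
# Hypothetical sample: a newline and a no-break space between words
sample = 'first line\nsecond\xa0line'

# Variant 1: replace each character with a space
with_space = sample.replace('\n', ' ').replace('\xa0', ' ').replace('\u200a', ' ').strip()

# Variant 2: remove each character entirely
without_space = sample.replace('\n', '').replace('\xa0', '').replace('\u200a', '').strip()

print(repr(with_space))
print(repr(without_space))
```

With the empty string, words that were separated only by one of these characters run together, so the space version is usually what you want for body text.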

CodePudding user response:

All you need to do is check each character in the text and overwrite it if it is one of the unwanted ones, like this:

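string = soup.find('div', {'id': 'content_wrapper'}).text
write = []
for i in string:
    # i is a single character, so compare it with the real characters
    # ('\xa0', '\u200a'), not with escaped literals such as '\\xa0'
    if i in ('\n', '\xa0', '\u200a'):
        i = ' '  # the question asks for a space; use '' to drop the character instead
    write.append(i)
# Join the characters back into one string and trim both ends
result = ''.join(write).strip()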
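A per-character loop like the one above can also be written with str.translate, which maps characters through a lookup table in a single pass. This is an alternative sketch, not part of the original answer; the names SPACE_TABLE and normalize_spaces are invented for the example.

```python
# Translation table: each listed character maps to a plain space
SPACE_TABLE = str.maketrans({'\n': ' ', '\xa0': ' ', '\u200a': ' '})


def normalize_spaces(text):
    """Map newline, no-break space and hair space to ' ', then strip both ends."""
    return text.translate(SPACE_TABLE).strip()


# Hypothetical stand-in for soup.find(...).text
normalized = normalize_spaces('\u200a a\nb\xa0c \n')
print(repr(normalized))
```

Adding another character to clean up only requires a new entry in the table, which is handy if more odd whitespace turns up in the scraped text.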