How to remove big spaces in my scraped texts?

I am trying to remove the big runs of whitespace from the output of this code:

from bs4 import BeautifulSoup
import requests


url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_ = 'character-table table table-bordered')
print(table.get_text())

Result after running the code:

Character Information




Name
Something


Level
28


Last online

                    about 6 years ago



Born
September 03, 2016

string() is not working; I think it's because of BeautifulSoup.

CodePudding user response：

One line answer:

print("\n".join([s for s in table.get_text().split("\n") if s]))

Output:

Character Information
Name
Something
Level
28
Last online
                    about 6 years ago
Born
September 03, 2016

And to also remove the leading and trailing spaces from each line:

print("\n".join([s.strip() for s in table.get_text().split("\n") if s.strip()]))

Output:

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
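
One thing to watch for: a line that contains only spaces is still a non-empty string, so it is safest to filter after stripping rather than before. Here is a minimal sketch of that idea. The raw_text string is a made-up stand-in for table.get_text(), and clean_lines is a hypothetical helper name, not part of BeautifulSoup:

```python
# Hypothetical stand-in for table.get_text(): it contains empty lines,
# one whitespace-only line, and one heavily indented value.
raw_text = (
    "Character Information\n\n\n\n"
    "Name\nSomething\n\n   \n"
    "Last online\n\n                    about 6 years ago\n\n"
)


def clean_lines(text):
    """Return the lines of text, stripped, with blank lines dropped."""
    # Strip first, then filter, so whitespace-only lines are removed too.
    stripped = (line.strip() for line in text.splitlines())
    return [line for line in stripped if line]


cleaned = "\n".join(clean_lines(raw_text))
print(cleaned)
```

This needs no extra libraries, so it works on any string, not only on BeautifulSoup output.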

CodePudding user response：

Since you are using BeautifulSoup, you can do this:

table_values = [item.text.strip() for item in table.find_all('tr')]
for item in table_values:
    print(item.replace('\n', ''))

Output

Character Information
NameSomething
Level28
Last online                    about 6 years ago
BornSeptember 03, 2016
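
In that output the label and the value run together ("NameSomething") and the indentation before "about 6 years ago" is still there. If that is not wanted, one possible variation is to pass a separator and strip=True to get_text() for each row. This is only a sketch: the html string below is made-up markup that imitates the table, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the character table; the real page may differ.
html = """
<table class="character-table table table-bordered">
  <tr><td>Name</td><td>Something</td></tr>
  <tr><td>Level</td><td>28</td></tr>
  <tr><td>Last online</td><td>
                    about 6 years ago
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="character-table table table-bordered")

# The one-space separator keeps label and value apart; strip=True trims
# every text fragment and skips the fragments that are only whitespace.
rows = [tr.get_text(" ", strip=True) for tr in table.find_all("tr")]
for row in rows:
    print(row)
```

Each row then comes out as a single line with one space between the label and the value.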

CodePudding user response：

There is no need for regex or for joining list comprehension results. Simply use the standard parameters of get_text():

table.get_text('\n',strip=True)

Example

from bs4 import BeautifulSoup
import requests

url = 'https://www.rucoyonline.com/characters/Something' 
response = requests.get(url)
print(response.status_code)

soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table', class_ = 'character-table table table-bordered')
print(table.get_text('\n',strip=True))

Output

Character Information
Name
Something
Level
28
Last online
about 6 years ago
Born
September 03, 2016
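
If you want to try this without hitting the site, here is a small offline sketch. The html string is made-up markup that only imitates the table, so treat it as an assumption about the page structure. It also shows stripped_strings, which yields the same cleaned fragments one by one, in case you want a list instead of a single string:

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the character table; the real page may differ.
html = """
<table class="character-table table table-bordered">
  <tr><th>Character Information</th></tr>
  <tr><td>Name</td><td>Something</td></tr>
  <tr><td>Last online</td><td>
                    about 6 years ago
  </td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")

# First argument is the separator placed between text fragments;
# strip=True trims each fragment and skips whitespace-only ones.
text = table.get_text("\n", strip=True)

# stripped_strings is a generator over the same stripped fragments.
values = list(table.stripped_strings)

print(text)
print(values)
```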