How to load each column of table separately with beautiful soup?

Time:04-23

How can I load each column of a table separately with Beautiful Soup?

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as bs

url = "https://www.datasport.com/live/ranking/?racenr=21110&kat=96"
driver = webdriver.Chrome()
driver.get(url)
driver.implicitly_wait(10)
# dismiss the popup that covers the page
driver.find_element(by=By.XPATH, value='/html/body/div[1]/div/a').click()
soup = bs(driver.page_source, "lxml")

A=[]

for i in soup.find_all('td'):
    A.append(i.get_text())

print(A)

driver.close()

The current output is the whole table, but I need the columns separately.

My goal is to have one list for each column of the table.

Xpath of first column, first row: //*[@id="tableResult1"]/tbody/tr[1]/td[1]


Xpath of fourth column, first row: //*[@id="tableResult1"]/tbody/tr[1]/td[4]


Xpath of first column, third row: //*[@id="tableResult1"]/tbody/tr[3]/td[1]

With Selenium I can load the first column using: driver.find_elements(By.XPATH, '//*[@id="tableResult1"]/tbody/tr/td[1]'). How can I do the same with bs? For example, soup.find_all('td'[1]) doesn't work.

CodePudding user response:

To get a list for each column of the table, you can iterate over the rows and apply the stripped_strings property, then pick the column index you need.

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as bs

from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
url = "https://www.datasport.com/live/ranking/?racenr=21110&kat=96"
driver.get(url)
driver.maximize_window()
driver.implicitly_wait(10)
# dismiss the popup that covers the page
driver.find_element(By.XPATH, '/html/body/div[1]/div/a').click()
soup = bs(driver.page_source, "lxml")

data = []
for tr in soup.select('#tableResult1 tbody tr'):
    cells = list(tr.stripped_strings)
    data.append(cells[1])  # index 1 = second column (the runner's name)
print(data)

driver.quit()

Output:

['Felfele Tesfaye', 'Matheka Bernard', 'Blackney Kevin', 'Wyndaele Romain', 'Oumoussa Zouhair', 'Fleury Julien', 'Allueva Miguel', 'Asfaw Tensai', 'Vieira Goncalves Paulo', 'Da Silva Miguel', 'Deletraz Arnaud', 'Boussague Lhoussein', 'Pinto Ricardo', 'Ané Jean-Pierre', 
'Mouzoun El Houssine', 'Planès Gaël', 'Marchal Dorian', 'Antoine Regis', 'Wyss Colin', 'Allan Harry', 'Thomet Vincent', 'Panozzo Piero', 'Gaudin Mikaël', 'Birchmeier Damien', 'Soleilhac Florent', 'Eicher Bernhard', 'Lachat Theo', 'Mackay Iain', 'Paccard Simon', 'Bernabeu Florian', 'Bosset Jérôme', 'Amarhoun Majid', 'Chaudoye Fabien', 'Romanens Jimmy', 'Bontaz Christophe', 'Marmillod Yves', 'Maffongelli Marco', 'Robas Gaspard', 'Matti Thibaut', 'Nuber Yoann', 'Afewerki Samuel', 'Zutter Jonathan', 'Rémy Cohann', 'Gebrezgabiher Aron', 'Tremblet Stephan', 'Ferrara Florent', 'Godbille Frédéric', 'Despréaux Julien', 'Casati Federico', 'Laurent Jerome', 'Gilliéron Christophe', 'Ferreira Paulo', 'Kopasz Arthur', 'Gabioud Romain', 'Chauvetet Paul', 'Wild Simon', 'Roubaty Quentin', 'Romelli Daniele', 'Charpigny Xavier', 'Hermand Dennis', 'Bouzon Daniel', 'Kowalski Adam', 'Secheyron Tom', 'Gottardo Alban', 'Faucheur Alexis', 'Pellegrino Vincenzo', 'Widmer Frédéric', 'Kellerhals Thomas', 'Duarte Micael', 'Reddy Akash Anant', 'Karlen Nathan', 'Marty Iwan', 'Ramer Roger', 'Simoncelli Michele', 'Albuquerque Bruno', 'Stübi Fabio', 'Niedergang Eric', 'Vaucher Gilles', 
'Mc Intyre Paul', 'Schwab Roman', 'Collaud Yohan', 'Kninech Bouchaib', 'Capt Léonard', 'Sauser Martin', 'Martin Frédéric', 'Gharbi Alex', 'Farrera-Soler Lluc', 'Bigler Nicolas', 'Simonet David', 'Socie Etienne', 'Garnier Noah', 'Laithier Dorian', 'Rothlisberger Bastien', 'Schindl David', 'Correvon Jari', 'Tschopp Benjamin', 'Duchet Guillaume', 'Baptiste Benoit', 'Soares Amilcar', 'Golay Simon']
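As a side note, BeautifulSoup's select() also understands the CSS pseudo-class :nth-of-type(), which is the direct equivalent of the td[1] step in the XPath from the question, and you can build one list per column in a single pass by transposing the rows with zip(). A minimal sketch on a small hand-written table standing in for driver.page_source (the cell values here are made up for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; on the real page this HTML comes from
# driver.page_source after the popup has been dismissed.
html = """
<table id="tableResult1">
  <tbody>
    <tr><td>1</td><td>Felfele Tesfaye</td><td>02:10:11</td></tr>
    <tr><td>2</td><td>Matheka Bernard</td><td>02:11:05</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# Single column, CSS equivalent of //*[@id="tableResult1"]/tbody/tr/td[1]
first_col = [td.get_text(strip=True)
             for td in soup.select("#tableResult1 tbody tr td:nth-of-type(1)")]

# All columns at once: collect cells row by row, then transpose with zip()
rows = [[td.get_text(strip=True) for td in tr.find_all("td")]
        for tr in soup.select("#tableResult1 tbody tr")]
columns = [list(col) for col in zip(*rows)]

print(first_col)    # ['1', '2']
print(columns[1])   # ['Felfele Tesfaye', 'Matheka Bernard']
```

The zip(*rows) approach is more robust than indexing into stripped_strings, because stripped_strings silently shifts the indices whenever a cell is empty or contains nested tags with extra text.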