Good morning.
I am quite new to scraping data with Selenium and am having difficulty gathering the data from this website:
https://www.puertodeveracruz.com.mx/datosBuques/principal.php?jmlp=1
What I would like to retrieve are all rows for these 3 columns:
Viaje
Nombre Buque
Fecha ETA
I tried to get them using driver.find_elements,
but I am not sure which element I should use. I tried id="gridSimpleFiltering_footer_container",
but it does not seem to work.
What would be the solution?
Thank you in advance!
CodePudding user response:
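# Requires bs4 (pip install beautifulsoup4); `driver` is the Selenium WebDriver with the page already loaded
from bs4 import BeautifulSoup

pageSource = driver.page_source
soup = BeautifulSoup(pageSource, 'html.parser')
# The first of the two tbody elements contains all the data you are looking for
tbody = soup.find('tbody')
tr = tbody.find_all('tr')
Viaje = []
NombreBuque = []
FechaETA = []
for t in tr:
    # For each row, find all the td cells
    td = t.find_all('td')
    if len(td) < 3:
        continue  # skip rows that lack the three expected cells
    # Viaje is the first column, so it is in td[0], and so on;
    # get_text() stores the cell text rather than the Tag object
    Viaje.append(td[0].get_text(strip=True))
    NombreBuque.append(td[1].get_text(strip=True))
    FechaETA.append(td[2].get_text(strip=True))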
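Once the three lists are filled, they can be combined row by row, for example to produce CSV. The following is only a minimal sketch using the standard library: the helper name to_csv is hypothetical, and the sample values are the first three rows of the output shown elsewhere in this thread.

```python
import csv
import io

# Sample values standing in for the lists built by the loop above.
viaje = ["221680", "221704", "221666"]
nombre_buque = ["VEGA VELA", "HALLEY", "NORDIC MASA"]
fecha_eta = ["30/09/2022", "29/09/2022", "26/09/2022"]


def to_csv(viajes, buques, fechas):
    """Combine three parallel lists into CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Viaje", "Nombre Buque", "Fecha ETA"])
    # zip() pairs the i-th entry of each list into one row
    writer.writerows(zip(viajes, buques, fechas))
    return buffer.getvalue()


csv_text = to_csv(viaje, nombre_buque, fecha_eta)
print(csv_text)
```

Note that zip() stops at the shortest list, so a row with a missing cell would shift the data silently; checking that the three lists have equal length first is a reasonable safeguard.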
CodePudding user response:
Alternatively, a solution that does not use Selenium:
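import requests
import pandas as pd

# Assumes the endpoint returns valid JSON; .json() also decodes the "\/" escapes,
# so there is no need to call eval() on text fetched from a remote server
response = requests.get('https://www.puertodeveracruz.com.mx/ws/BuquesProgramados')
df = pd.DataFrame(response.json())
print(df.to_string(columns=['VID', 'NOM_BUQUE', 'F_ETA']))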
OUTPUT:
VID NOM_BUQUE F_ETA
0 221680 VEGA VELA 30/09/2022
1 221704 HALLEY 29/09/2022
2 221666 NORDIC MASA 26/09/2022
3 221553 LUTETIA 25/09/2022
4 221709 MOUNT ATHOS 25/09/2022
5 221536 ORINOCO 24/09/2022
6 221703 MARGARETE SCHULTE 24/09/2022
7 221712 COLUMBIA HIGHWAY 23/09/2022
8 221622 MSC DON GIOVANNI 23/09/2022
9 221662 CONTSHIP LEO 23/09/2022
10 221676 MONTE PASCOAL 23/09/2022
11 221665 GINGA PUMA 22/09/2022
12 221674 AS PETRONIA 22/09/2022
13 221691 BBC SCANDINAVIA 22/09/2022
14 221715 BROOKLYN BRIDGE 22/09/2022
15 221630 MSC EMDEN III 22/09/2022
16 221711 ATLANTIC MONTERREY 21/09/2022
17 221694 VICTORIA HIGHWAY 21/09/2022
18 221708 CORONA J 21/09/2022
19 221636 PRESIDIO 21/09/2022
20 221629 MSC AQUARIUS 21/09/2022
21 221673 MONTE TAMARO 20/09/2022
22 221710 ATLANTIC DREAM 20/09/2022
23 221542 BRAVERY ACE 20/09/2022
24 221541 ADRIA ACE 20/09/2022
25 221702 SEAFRONTIER 20/09/2022
26 221701 VOKARIA 20/09/2022
27 221618 ATLANTIK PRIDE 20/09/2022
28 221627 MSC DARIEN 20/09/2022
29 221684 STOLT HALCON 20/09/2022
30 221714 CERRO AZUL 20/09/2022
31 221713 JMC 3080 20/09/2022
32 221628 GENOVA 19/09/2022
33 221705 STANLEY PARK 19/09/2022
34 221667 ORIENTAL MARGUERITE 19/09/2022
35 221707 MAIRA 19/09/2022
36 221716 DEE4 FIG 19/09/2022
37 221692 LONGVIEW LOGGER 18/09/2022
38 221698 ATLANTIC STAR 18/09/2022
39 221479 GRANDE TORINO 18/09/2022
40 221693 PIS PARAGON 18/09/2022
...
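In JSON, `\/` is a legal escape for `/`, so a JSON parser decodes dates such as `30\/09\/2022` directly. Below is a minimal sketch of that, plus converting F_ETA into real dates so the rows can be sorted chronologically. The payload string is a hypothetical stand-in: the field names and values come from the output above, but the exact shape and types of the real response are an assumption.

```python
import json
from datetime import datetime

# Hypothetical payload imitating the web service response (shape is assumed).
payload = r'''[
  {"VID": "221680", "NOM_BUQUE": "VEGA VELA", "F_ETA": "30\/09\/2022"},
  {"VID": "221704", "NOM_BUQUE": "HALLEY", "F_ETA": "29\/09\/2022"}
]'''

# json.loads turns each "\/" into "/" without any manual replace()
records = json.loads(payload)

for record in records:
    # Dates are day/month/year, as in the output shown above
    record["F_ETA"] = datetime.strptime(record["F_ETA"], "%d/%m/%Y").date()

# Sort by arrival date, earliest first
records.sort(key=lambda record: record["F_ETA"])

for record in records:
    print(record["VID"], record["NOM_BUQUE"], record["F_ETA"].isoformat())
```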
CodePudding user response:
You need to find the rows first and then the columns within each row to fetch the values. The table is loaded after the page itself, so there is also a synchronization issue, which you need to handle with WebDriverWait:
driver.get("https://www.puertodeveracruz.com.mx/datosBuques/principal.php?jmlp=1")
# Wait up to 10 seconds for the first table row to become visible
WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "table#gridSimpleFiltering>tbody>tr")))
tableRows = driver.find_elements(By.CSS_SELECTOR, "table#gridSimpleFiltering>tbody>tr")
for row in tableRows:
    # td[1], td[2], td[3] are the Viaje, Nombre Buque and Fecha ETA cells
    print(row.find_element(By.XPATH, ".//td[1]").text)
    print(row.find_element(By.XPATH, ".//td[2]").text)
    print(row.find_element(By.XPATH, ".//td[3]").text)
    print("====================================")
You need to import the following:
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
Output:
221680
VEGA VELA
30/09/2022
====================================
221704
HALLEY
29/09/2022
====================================
221666
NORDIC MASA
26/09/2022
====================================
221553
LUTETIA
25/09/2022
====================================
221709
MOUNT ATHOS
25/09/2022
====================================
221536
ORINOCO
24/09/2022
====================================
221703
MARGARETE SCHULTE
24/09/2022
====================================
221712
COLUMBIA HIGHWAY
23/09/2022
====================================
221622
MSC DON GIOVANNI
23/09/2022
====================================
221662
CONTSHIP LEO
23/09/2022
====================================
221676
MONTE PASCOAL
23/09/2022
====================================
221665
GINGA PUMA
22/09/2022
====================================
221674
AS PETRONIA
22/09/2022
====================================
221691
BBC SCANDINAVIA
22/09/2022
====================================
221715
BROOKLYN BRIDGE
22/09/2022
====================================
221630
MSC EMDEN III
22/09/2022
====================================
221711
ATLANTIC MONTERREY
21/09/2022
====================================
221694
VICTORIA HIGHWAY
21/09/2022
====================================
221708
CORONA J
21/09/2022
====================================
221636
PRESIDIO
21/09/2022
====================================
221629
MSC AQUARIUS
21/09/2022
====================================
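Conceptually, an explicit wait like the one above polls a condition until it returns a truthy value or a timeout expires. The sketch below illustrates that idea in plain Python; it is not Selenium's actual implementation, and wait_until, rows_loaded and their parameters are hypothetical names used only for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose table rows only appear on the third check.
attempts = {"count": 0}


def rows_loaded():
    attempts["count"] += 1
    return ["row"] if attempts["count"] >= 3 else []


rows = wait_until(rows_loaded, timeout=2.0, poll_interval=0.01)
print(rows, attempts["count"])
```

Returning the condition's result, rather than just True, lets the caller use whatever the condition found, much as the until() call above hands back the located element.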