How to use Python to web scrape the filtered results (with Selenium)?

Time:04-08

I'm trying to scrape the filtered results from this website https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX.

First I applied the filter "Código, descripción o referencia del Expediente". After this a new container appears, where I select the option "Contiene", and finally I search for a specific word (in this case "anestesia"). However, I don't know how to scrape the resulting table to get the links that appear in the "Descripción del Expediente" section for all the filtered results. I'm new to Selenium and I'd like to get the filtered links, or to know if there is another way to get the information I need.

This is my code:

import random
from time import sleep
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import requests
from lxml import html


s=Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)

driver.get('https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX')
sleep(5)
# Open the filter picker dropdown.
driver.find_element(By.XPATH, "//*[@id='widget_filterPickerSelect']/div[1]/input").click()
sleep(5)
# Pick the "Código, descripción o referencia del Expediente" filter.
driver.find_element(By.XPATH, "//*[@id='filterPickerSelect_popup1']").click()
sleep(5)
# Choose the "Contiene" operator in the container that appears.
driver.find_element(By.XPATH, "//*[@id='projectInfo_FILTER_OPERATOR_ID']/option[2]").click()
sleep(5)
# Type the search term and submit it.
busqueda = driver.find_element(By.XPATH, "//*[@id='projectInfo_FILTER']")
busqueda.send_keys("anestesia")
busqueda.send_keys(Keys.ENTER)

Specifically, this is what I want to scrape:

<a href="#fh"  onclick="javascript:goToDetail('2110224', '01000');stopEventPropagation(event);" title="Ver detalle: PC-050GYR017-E140-2022    SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO  DEL 1º">PC-050GYR017-E140-2022   SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO  DEL 1º</a>

I need to get the link.

CodePudding user response:

You need to use explicit waits.

To get the links on the results page, use either find_elements or visibility_of_all_elements_located, since more than one matching web element is present. If you only intend to scrape the link, keep just the line print(link.get_attribute('href')) and comment out the other two print lines.

Code:

s=Service('./chromedriver.exe')
driver = webdriver.Chrome(service=s)

driver.maximize_window()
wait = WebDriverWait(driver, 20)

driver.get('https://compranet.hacienda.gob.mx/esop/guest/go/public/opportunity/current?locale=es_MX')

wait.until(EC.element_to_be_clickable((By.XPATH, "//input[@value='▼ ']"))).click()
wait.until(EC.element_to_be_clickable((By.XPATH, "//div[@id='filterPickerSelect_popup1']"))).click()

select = Select(wait.until(EC.presence_of_element_located((By.ID, "projectInfo_FILTER_OPERATOR_ID"))))
select.select_by_value('CONTAINS')

busqueda = wait.until(EC.visibility_of_element_located((By.ID, "projectInfo_FILTER")))
busqueda.send_keys("anestesia")
time.sleep(2)
busqueda.send_keys(Keys.ENTER)

links = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//a[@class='detailLink'][@href]")))
for link in links:
    print(link.get_attribute('innerText'))
    print(link.get_attribute('href'))
    print(link.get_attribute('title'))

Imports:

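import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait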

Output:

PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO DEL 1º
https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/current/list.si?reset=true&resetstored=true&userAct=changeLangIndex&language=es_MX&_ncp=1649225706261.4394-1#fh
Ver detalle: PC-050GYR017-E140-2022 SERVICIO INTEGRAL DE ANESTESIA, PARA EL EJERCICIO  DEL 1º
SERVICIO DE MANTENIMIENTO PREVENTIVO Y CORRECTIVO DE EQUIPO MÉDICO
https://compranet.hacienda.gob.mx/esop/toolkit/opportunity/current/list.si?reset=true&resetstored=true&userAct=changeLangIndex&language=es_MX&_ncp=1649225706261.4394-1#fh
Ver detalle: SERVICIO DE MANTENIMIENTO PREVENTIVO Y CORRECTIVO DE EQUIPO MÉDICO
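
Note that the href printed above is the same for every row (it ends in #fh), so it does not identify an individual record. In the anchor shown in the question, the record-specific values sit in the onclick attribute instead, as arguments to goToDetail. Below is a minimal sketch of pulling those arguments out with a regular expression. It assumes every onclick has the form goToDetail('...', '...') as in that one sample; parse_go_to_detail is a hypothetical helper name, and what the two values mean to the site is not confirmed here.

```python
import re

# Matches goToDetail('<first>', '<second>') inside an onclick attribute.
# Assumption: both arguments are single-quoted, as in the sample anchor.
GO_TO_DETAIL = re.compile(r"goToDetail\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)")


def parse_go_to_detail(onclick):
    """Return the two goToDetail arguments as a tuple, or None if absent."""
    if not onclick:
        return None
    match = GO_TO_DETAIL.search(onclick)
    return match.groups() if match else None


# The onclick value taken from the anchor in the question.
sample = "javascript:goToDetail('2110224', '01000');stopEventPropagation(event);"
print(parse_go_to_detail(sample))  # ('2110224', '01000')
```

Inside the loop over links, the same helper could be called as parse_go_to_detail(link.get_attribute('onclick')) to collect one pair of values per row.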