Insert a code, click on a button and extract the result with Scrapy

Time:02-28

I should say up front that I have never used Scrapy, so I do not even know whether it is the right tool.

On the website https://www.ufficiocamerale.it/, I want to enter an 11-digit numeric code (for example 06655971007) in the search bar labelled "INSERISCI LA PARTITA IVA/RAGIONE SOCIALE" and then click "CERCA". I would then like to save the resulting HTML in a variable to analyze later with BeautifulSoup (I shouldn't have any problems with that part). How can I do the first part?

I imagine something like:

import scrapy

class Extraction(scrapy.Spider):

    def start_requests(self):
        url = "https://www.ufficiocamerale.it/"
        # To enter data
        yield scrapy.FormRequest(url=url, formdata={...}, callback=self.parse)
        # To click the button
        # some code

    def parse(self, response):
        print(response.body)

This is the HTML of the search bar and the button:

<input type="search" name="search_input"  onchange="if (!window.__cfRLUnblockHandlers) return false; checkPartitaIva()" onkeyup="if (!window.__cfRLUnblockHandlers) return false; checkPartitaIva()" id="search_input" placeholder=" " value="">

<button onclick="if (!window.__cfRLUnblockHandlers) return false; dataLayer.push({'event': 'trova azienda'});" type="submit" >Cerca</button>

Answer:

The site uses JavaScript to generate some of its elements, so it would be simpler to use Selenium:

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = 'https://www.ufficiocamerale.it/'

driver = webdriver.Firefox()
driver.get(url)

time.sleep(5)  # give the page's JavaScript time to build the form

# The find_element_by_* helpers were removed in Selenium 4.3;
# use find_element(By.<strategy>, value) instead.
item = driver.find_element(By.XPATH, '//form[@id="formRicercaAzienda"]//input[@id="search_input"]')
#item = driver.find_element(By.ID, 'search_input')
item.send_keys('06655971007')

time.sleep(1)

button = driver.find_element(By.XPATH, '//form[@id="formRicercaAzienda"]//p//button[@type="submit"]')
button.click()

time.sleep(5)  # give the results page time to load

# The page heading holds the company name
item = driver.find_element(By.TAG_NAME, 'h1')
print(item.text)
print('---')

# Each <li> in the first group is one line of company data
all_items = driver.find_elements(By.XPATH, '//ul[@id="first-group"]/li')
for item in all_items:
    if '@' in item.text:
        print(item.text, '<<< found email:', item.text.split(' ')[1])
    else:
        print(item.text)
print('---')

driver.quit()  # close the browser when done

Result:

DATI DELLA SOCIETÀ - ENEL ENERGIA S.P.A.
---
Partita IVA: 06655971007 - Codice Fiscale: 06655971007
Rag. Sociale: ENEL ENERGIA S.P.A.
Indirizzo: VIALE REGINA MARGHERITA 125 - 00198 - ROMA
Rea: 1150724
PEC: [email protected] <<< found email: [email protected]
Fatturato: € 13.032.695.000,00 (2020)
ACQUISTA BILANCIO
Dipendenti : 1666 (2021)
---
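
If you want the values as data rather than printed text, the "Label: value" lines shown in the result can be split into a dictionary. This is only a rough sketch based on the output above: the function name `parse_company_lines` is made up for illustration, and it assumes the site keeps the same "Label: value" layout.

```python
def parse_company_lines(lines):
    """Turn 'Label: value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in lines:
        # A line such as "Partita IVA: ... - Codice Fiscale: ..." packs two
        # pairs; split it only when every " - " chunk has its own label.
        chunks = line.split(" - ")
        if not all(":" in chunk for chunk in chunks):
            chunks = [line]
        for chunk in chunks:
            label, sep, value = chunk.partition(":")
            if sep:  # ignore lines such as "ACQUISTA BILANCIO"
                fields[label.strip()] = value.strip()
    return fields


# Sample lines copied from the result above
sample = [
    "Partita IVA: 06655971007 - Codice Fiscale: 06655971007",
    "Rag. Sociale: ENEL ENERGIA S.P.A.",
    "Indirizzo: VIALE REGINA MARGHERITA 125 - 00198 - ROMA",
    "Rea: 1150724",
    "ACQUISTA BILANCIO",
    "Dipendenti : 1666 (2021)",
]
fields = parse_company_lines(sample)
print(fields["Codice Fiscale"])  # 06655971007
print(fields["Indirizzo"])       # VIALE REGINA MARGHERITA 125 - 00198 - ROMA
```

With the Selenium code, the input would be something like `[li.text for li in all_items]`.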
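
The question also asked for the resulting HTML in a variable. Selenium exposes the rendered page as `driver.page_source`, so one possible way to wrap the whole search in a function is sketched below. The function name `fetch_result_html` and the `timeout` parameter are my own choices, and the code assumes that the result page still contains an element with the id `first-group`. It also swaps the fixed `time.sleep()` calls for explicit waits, which usually return as soon as the element appears.

```python
import re


def fetch_result_html(code, timeout=10):
    """Search ufficiocamerale.it for `code` and return the result page's HTML."""
    # Fail fast on malformed input, before a browser is started.
    if not re.fullmatch(r"\d{11}", code):
        raise ValueError("expected an 11-digit numeric code")

    # Imported here so the check above works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get("https://www.ufficiocamerale.it/")
        wait = WebDriverWait(driver, timeout)

        # Wait for the search box instead of sleeping for a fixed time.
        box = wait.until(EC.element_to_be_clickable((By.ID, "search_input")))
        box.send_keys(code)

        driver.find_element(
            By.XPATH,
            '//form[@id="formRicercaAzienda"]//p//button[@type="submit"]',
        ).click()

        # Assumption: the result list keeps the id "first-group".
        wait.until(EC.presence_of_element_located((By.ID, "first-group")))
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling `html = fetch_result_html('06655971007')` should then give a string that can be passed to `BeautifulSoup(html, 'html.parser')`.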