I should say up front that I have never used Scrapy (so I don't even know whether it is the right tool).
On the website https://www.ufficiocamerale.it/, I want to enter an 11-digit numeric code (for example 06655971007) into the "INSERISCI LA PARTITA IVA/RAGIONE SOCIALE" bar and then click "CERCA". I would then like to save the resulting HTML in a variable that I will later analyze with BeautifulSoup (I shouldn't have any problems with that part). So, how can I do the first part?
I imagine something like:
import scrapy

class Extraction(scrapy.Spider):
    def start_requests(self):
        url = "https://www.ufficiocamerale.it/"
        # To enter data
        yield scrapy.FormRequest(url=url, formdata={...}, callback=self.parse)
        # To click the button
        # some code

    def parse(self, response):
        print(response.body)
This is the HTML of the search bar and the button:
<input type="search" name="search_input" onchange="if (!window.__cfRLUnblockHandlers) return false; checkPartitaIva()" onkeyup="if (!window.__cfRLUnblockHandlers) return false; checkPartitaIva()" id="search_input" placeholder=" " value="">
<button onclick="if (!window.__cfRLUnblockHandlers) return false; dataLayer.push({'event': 'trova azienda'});" type="submit" >Cerca</button>
Answer:
The page uses JavaScript to generate some elements, so it would be simpler to use Selenium:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time

url = 'https://www.ufficiocamerale.it/'

driver = webdriver.Firefox()
driver.get(url)
time.sleep(5)  # JavaScript needs time to load code

# The find_element_by_* helpers were removed in Selenium 4.3; use find_element(By.<strategy>, ...) instead
item = driver.find_element(By.XPATH, '//form[@id="formRicercaAzienda"]//input[@id="search_input"]')
#item = driver.find_element(By.ID, 'search_input')
item.send_keys('06655971007')
time.sleep(1)

button = driver.find_element(By.XPATH, '//form[@id="formRicercaAzienda"]//p//button[@type="submit"]')
button.click()
time.sleep(5)  # JavaScript needs time to load code

item = driver.find_element(By.TAG_NAME, 'h1')
print(item.text)
print('---')

# Print every entry of the result list and flag the one that contains an email address
all_items = driver.find_elements(By.XPATH, '//ul[@id="first-group"]/li')
for item in all_items:
    if '@' in item.text:
        print(item.text, '<<< found email:', item.text.split(' ')[1])
    else:
        print(item.text)
print('---')
Result:
DATI DELLA SOCIETÀ - ENEL ENERGIA S.P.A.
---
Partita IVA: 06655971007 - Codice Fiscale: 06655971007
Rag. Sociale: ENEL ENERGIA S.P.A.
Indirizzo: VIALE REGINA MARGHERITA 125 - 00198 - ROMA
Rea: 1150724
PEC: [email protected] <<< found email: [email protected]
Fatturato: € 13.032.695.000,00 (2020)
ACQUISTA BILANCIO
Dipendenti : 1666 (2021)
---
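Since the question asks for the resulting HTML in a variable, the same steps could be wrapped in a function that returns `driver.page_source`. The following is only a minimal sketch and has not been run against the live site. The function name `fetch_result_html` and its `timeout` parameter are made up for illustration, and the element IDs and XPaths are taken from the code above. It assumes Selenium 4 with Firefox and geckodriver installed, and it assumes that the appearance of the `first-group` list means the result page has finished loading. It also swaps the fixed `time.sleep` calls for explicit waits.

```python
import re


def fetch_result_html(partita_iva, timeout=10):
    """Search the site for a VAT number (given as a string) and return the result page HTML."""
    # The question describes the code as 11 digits, so reject anything else early.
    if not re.fullmatch(r"[0-9]{11}", partita_iva):
        raise ValueError("expected an 11-digit numeric code")

    # Imported here so the check above works even where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get("https://www.ufficiocamerale.it/")
        wait = WebDriverWait(driver, timeout)

        # Wait until the search box is usable instead of sleeping a fixed time.
        box = wait.until(EC.element_to_be_clickable((By.ID, "search_input")))
        box.send_keys(partita_iva)

        # Same form as in the answer's XPath; click its submit button.
        button_xpath = '//form[@id="formRicercaAzienda"]//button[@type="submit"]'
        driver.find_element(By.XPATH, button_xpath).click()

        # Assumption: the result list appearing means the page is ready.
        result_xpath = '//ul[@id="first-group"]/li'
        wait.until(EC.presence_of_element_located((By.XPATH, result_xpath)))

        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if a wait times out
```

With this in place, `html = fetch_result_html("06655971007")` would hold the page source, which can then be passed to `BeautifulSoup(html, "html.parser")`.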