I want to enter an address in the search field and get the Eircode/zip code/postal code from the results page, e.g. https://eircode-finder.com/search/ , searching a list of addresses like: 8 old bawn court tallaght dublin. From the results I want to fetch the Eircode/zip code/postal code and save it to a .txt file. I have used BeautifulSoup to fetch the data, but it's not even fetching the HTML of the page. I don't know the details, but something on the website, like JavaScript, is preventing me from getting data from it.
CodePudding user response:
You can use the following example to make a request to the API this page uses:
import requests
import pandas as pd

url = "https://geocode.search.hereapi.com/v1/geocode"

to_search = "8 old bawn court tallaght dublin"

# the site expects this Referer header on its API requests
headers = {"Referer": "https://eircode-finder.com/"}
params = {
    "q": to_search,
    "lang": "en",
    "in": "countryCode:IRL",
    "apiKey": "BegLfP-EDdyWflI0fRrP3HJ7IDSK_0878_n2fbct1wE",
}

data = requests.get(url, params=params, headers=headers).json()

# collect the title and postal code of every returned item
all_data = []
for i in data["items"]:
    all_data.append([i["title"], i["address"]["postalCode"]])

df = pd.DataFrame(all_data, columns=["title", "postal_code"])
print(df.to_markdown(index=False))
Prints:
| title                                                      | postal_code |
|:-----------------------------------------------------------|:------------|
| 8 Old Bawn Court, Dublin, County Dublin, D24 N1YH, Ireland | D24 N1YH    |
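Since you also want to save the results to a .txt file, here is a minimal sketch that reuses the url, params and headers from above, loops over a list of addresses and writes each address with the first returned postal code. The address list and the eircodes.txt filename are just placeholders for your own data:

addresses = [
    "8 old bawn court tallaght dublin",
    # add the rest of your addresses here
]

with open("eircodes.txt", "w", encoding="utf-8") as f:
    for address in addresses:
        params["q"] = address
        data = requests.get(url, params=params, headers=headers).json()
        items = data.get("items", [])
        # take the postal code of the first match, if any
        postal_code = items[0]["address"].get("postalCode", "NOT FOUND") if items else "NOT FOUND"
        f.write(f"{address}\t{postal_code}\n")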
CodePudding user response:
The mentioned website is built with React, which needs a JavaScript engine to render its HTML pages.
BeautifulSoup only parses whatever the plain HTTP response contains: for a normal website that is the finished HTML, but for a React app the initial response is mostly an empty shell plus JSON data that still has to be rendered by JavaScript.
Websites that require a JavaScript engine can be scraped with Selenium, as it drives an actual browser to request and load the page.
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time

path = r"./chromedriver.exe"
driver = webdriver.Chrome(service=Service(path))

url = "https://eircode-finder.com/search/"
driver.get(url)

# type the address into the search box and submit it
search_input = driver.find_element(By.ID, "outlined")
search_input.send_keys("8 old bawn court tallaght dublin")  # add the text which you want to search
search_input.send_keys(Keys.ENTER)

time.sleep(10)  # wait for the page to load

# grab the element that holds the Eircode
eircode = driver.find_element(
    By.CSS_SELECTOR,
    "#root > div:nth-child(2) > div > div.MuiBox-root.jss12 > div > div > div > div:nth-child(1) > div.MuiBox-root.jss13 > div > div > h3 > div",
)
print(eircode.text)

time.sleep(10)  # buffer

# you can pass this page source to BeautifulSoup and scrape it,
# or you can continue scraping with Selenium
soup = BeautifulSoup(driver.page_source, "html.parser")
print(driver.page_source)
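To cover the .txt part of the question, a minimal sketch that writes the scraped value out before closing the browser could look like this (the filename eircode.txt is just an example):

# write the scraped Eircode to a text file
with open("eircode.txt", "w", encoding="utf-8") as f:
    f.write(eircode.text + "\n")

driver.quit()  # close the browser when done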
You can check out this video, which can help you understand it better.