How do you scrape a table from a website which is hosting the table data outside of the HTML?

Time:04-03

I am trying to scrape the table data from this URL: https://covid19criticalcare.com/pharmacies/

On my previous scrape I used the following Python packages:

from bs4 import BeautifulSoup
import requests
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine

But this URL's HTML doesn't contain the table data; instead, it seems to be drawing the data from an external database.

Could someone please point me in the right direction for scraping table data from a page set up like this, using a Python script?

I tried a blind scrape, using the same method as on my previous scrape:

from bs4 import BeautifulSoup
import requests
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine

url = "https://covid19criticalcare.com/pharmacies/"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(url, headers=headers)
doc = BeautifulSoup(result.text, "html.parser")

# Look for the first-column cells of the table in the downloaded HTML
name = doc.find_all("td", class_="column-1")

td_names = []

for td in name:
    names = td.text
    td_names.append(names)
print(td_names)

CodePudding user response:

The content you are trying to scrape only becomes available once the JavaScript on the page has run. The simplest approach is either to mock the request that the page itself sends to its REST API, or to use a tool that renders the content for you, for instance Selenium, or Scrapy paired with a JavaScript-rendering backend.

For more details on how to scrape JS-rendered content, you can check out this thread: Web-scraping JavaScript page with Python

For basic troubleshooting, you can view the requests and responses in Chrome Developer Tools: right-click on the page > click "Inspect" > open the "Network" tab > click "Fetch/XHR" > press Command+Shift+R to reload the page once.

If you are unsure which request contains the data you are looking for, press Command+F and type in a keyword; Chrome will list the requests that match your search.
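As a rough illustration of the "mock the request" option, here is a minimal sketch. It assumes the data request has already been identified in the browser and that it returns JSON with the rows under a "data" key, which is a common DataTables layout but is not confirmed for this site. The endpoint URL, the column names, and both helper names are hypothetical placeholders.

```python
def rows_to_records(payload, columns):
    """Pair each row of a {"data": [[...], ...]} payload with column names.

    The payload shape is an assumption; adjust it to whatever the
    real response looks like in the Network tab.
    """
    return [dict(zip(columns, row)) for row in payload.get("data", [])]


def fetch_payload(endpoint, params=None, headers=None):
    """Request the JSON endpoint directly instead of the HTML page."""
    # Imported here so rows_to_records can be tried without requests installed.
    import requests

    response = requests.get(endpoint, params=params, headers=headers, timeout=30)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.json()


# Offline demonstration with a made-up payload of the assumed shape.
sample_payload = {"data": [["Pharmacy A", "555-0100"], ["Pharmacy B", "555-0101"]]}
records = rows_to_records(sample_payload, ["name", "phone"])
print(records)
```

With a real endpoint copied from the Network tab, the call would look like rows_to_records(fetch_payload(endpoint, headers=headers), columns), and the resulting list of dicts can be passed straight to pd.DataFrame.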

This image shows that the data is sent using AJAX and it also depicts the result of the steps above
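The keyword search described above can also be done offline. The sketch below assumes the Network tab contents were exported as a HAR file with response bodies included; the file name pharmacies.har and the helper names are hypothetical. It walks the standard HAR layout (log > entries > request/response) and lists the URLs whose response body contains the keyword.

```python
import base64
import json


def load_har(path):
    """Read a HAR export (plain JSON) from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_requests(har, keyword):
    """Return the URLs of entries whose response body contains keyword."""
    matches = []
    for entry in har.get("log", {}).get("entries", []):
        content = entry.get("response", {}).get("content", {})
        text = content.get("text") or ""
        # HAR marks binary or re-encoded bodies with encoding == "base64".
        if content.get("encoding") == "base64":
            text = base64.b64decode(text).decode("utf-8", errors="replace")
        if keyword.lower() in text.lower():
            matches.append(entry["request"]["url"])
    return matches


# Offline demonstration with a tiny hand-written HAR structure.
sample_har = {
    "log": {
        "entries": [
            {
                "request": {"url": "https://example.com/page.css"},
                "response": {"content": {"text": "body { margin: 0 }"}},
            },
            {
                "request": {"url": "https://example.com/data"},
                "response": {"content": {"text": '{"data": [["Life Pharmacy"]]}'}},
            },
        ]
    }
}
print(find_requests(sample_har, "life pharmacy"))
```

For a real export, find_requests(load_har("pharmacies.har"), "Life Pharmacy") would narrow the list down to the request worth mocking.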

CodePudding user response:

Just as an alternative to @Naphat Theerawat's answer: since I noticed that you started with a Selenium-based solution, you could reach your goal much more easily by combining it with pandas.

Load the website and extract the table from driver.page_source with pd.read_html(). To avoid iterating over each page, just select "Show All entries".

Example

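from io import StringIO

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select
import pandas as pd

url = 'https://covid19criticalcare.com/pharmacies/'

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 5)

# Switch the "Show ... entries" dropdown to "All" (value -1) so every row is rendered
select = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[name="DataTables_Table_0_length"]'))))
select.select_by_value('-1')
# The "next" button is disabled once all rows are on a single page
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a.paginate_button.next.disabled')))

# Wrap the HTML in StringIO; newer pandas versions deprecate passing a literal string
df = pd.read_html(StringIO(driver.page_source), displayed_only=False)[1]
driver.quit()

df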

Output

Pharmacy Name Email Phone Website Requires prescription? Pharmacy Address Based in the United States? Overnight shipping to the United States? Overnight International shipping? Ships to the following States/Provinces
0 Covid Pharmacy [email protected] (785) 672 9222 0covidpharmacy.com NO 245 Krishna Market Channi RoadNagpur, Maharashtra 440001India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific
1 Ivermectin Service [email protected] (888) 290 0964 (US), 91 22509 72606 (IN) 1ivermectin.com NO 1/16, First Floor, Tardeo Air Conditioned Market Building, TardeoMumbai, Tardeo 400034India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingPuerto RicoVirgin Islands
1 Life Pharmacy [email protected] (888) 560-0430 (US); 91 (807 ) 127-9990 (India) 1lifepharmacy.net NO 302, Pride Plaza, Rajkot, 360002Rajkot, Gujarat 360002; 84118India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming
1-2-3 RX Global Pharmacy [email protected] (516) 758-2630 123rx.net NO 2967 Dundas St. W.Toronto, Ontario M6P 1Z2Canada NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming
12 Angel Pharmacy Store [email protected] (908) 866-4260 12angel.store NO 1050 Bharat Diamond BourseBandra Kurla ComplexMumbai, Maharashtra 400051India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific
24 x 7 Pharma [email protected] (851) 127-5721 24x7pharma.com NO Mahek IconSumul Diary Road, KatargamSurat, Gujarat 395003India NO YES YES AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific

...
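To see what the [1] index and displayed_only=False do in the example, here is a small sketch on a static HTML string, so no browser is needed. The markup and the extract_table helper are invented for illustration; it assumes pandas is installed together with one of its HTML parsers (lxml, or bs4 with html5lib).

```python
from io import StringIO

import pandas as pd

# Made-up page: a layout table first, then the data table with one hidden row.
HTML = """
<table>
  <tr><td>layout</td><td>cell</td></tr>
  <tr><td>more</td><td>layout</td></tr>
</table>
<table>
  <tr><th>Name</th><th>Phone</th></tr>
  <tr><td>Pharmacy A</td><td>555-0100</td></tr>
  <tr style="display: none"><td>Pharmacy B</td><td>555-0101</td></tr>
</table>
"""


def extract_table(html, index, displayed_only=True):
    """Parse every <table> in html and return the one at position index."""
    tables = pd.read_html(StringIO(html), displayed_only=displayed_only)
    return tables[index]


# Default behaviour skips elements styled with display: none.
visible_rows = extract_table(HTML, 1)
# displayed_only=False keeps them, as in the Selenium example.
all_rows = extract_table(HTML, 1, displayed_only=False)
print(len(visible_rows), len(all_rows))
```

pd.read_html returns one DataFrame per table element in document order, which is why the position of the wanted table has to be checked for each page rather than assumed.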
