I am trying to scrape the table data from this URL: https://covid19criticalcare.com/pharmacies/
On my previous scrape I used the following Python packages: bs4 (BeautifulSoup), requests, mysql.connector, pandas and sqlalchemy (create_engine).
But this URL's HTML doesn't contain the table data; instead, it seems to be drawing the data from an external database.
Could someone please point me in the right direction for scraping table data from this sort of HTML setup using a Python script?
I tried a blind scrape, using the same method as on my previous scrape:
from bs4 import BeautifulSoup
import requests
import mysql.connector
import pandas as pd
from sqlalchemy import create_engine
url = "https://covid19criticalcare.com/pharmacies/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(url, headers = headers)
doc = BeautifulSoup(result.text, "html.parser")
name = doc.find_all("td", class_="column-1")
td_names = []
for td in name:
    names = td.text
    td_names.append(names)
print(td_names)
CodePudding user response:
The content you are trying to scrape only becomes available once the JavaScript on the website has been rendered. The simplest way to get it is either to mock the request the page makes, using the same REST API method, or to use a tool that renders the content for you, for instance Selenium (Scrapy on its own does not execute JavaScript, so it needs a rendering add-on for this).
For more details on how to scrape JS-rendered content, you can check out this thread: "Web-scraping JavaScript page with Python".
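A quick way to confirm that the table is rendered by JavaScript is to count the td cells in the HTML that requests returns (result.text in the question). The sketch below uses only the standard library; the class name, the function name and the two sample snippets are made up for illustration and are not taken from the real page.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <td> start tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.cells = 0

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.cells += 1


def count_table_cells(html: str) -> int:
    """Return the number of <td> cells present in the raw HTML."""
    parser = CellCounter()
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical snippets standing in for the two situations.
static_page = "<table><tr><td class='column-1'>Some Pharmacy</td></tr></table>"
js_page = "<div id='table-root'></div><script>loadTable()</script>"

print(count_table_cells(static_page))  # 1
print(count_table_cells(js_page))      # 0
```

If the count for result.text is 0 while the browser clearly shows rows, the data arrives through a later request and BeautifulSoup alone will not see it.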
For basic troubleshooting, you can view the requests and responses in Chrome Developer Tools: right-click on the HTML page > click "Inspect" > click the "Network" tab > click "Fetch/XHR" > press Command+Shift+R to reload the page once.
If you are unsure which request contains the data you are looking for, press Command+F, type in a keyword, and Chrome will list the requests that match your search.
This image shows that the data is sent using AJAX and it also depicts the result of the steps above
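Once the Network tab has revealed the request that carries the data, replaying it from Python could look roughly like the sketch below. Everything specific here is an assumption: the endpoint URL has to be copied from DevTools, the response is assumed to be JSON shaped like {"data": [[...], ...]}, and the function names and column names are invented for the example. Some sites send this request as a POST with form data, in which case requests.post would be needed instead.

```python
import json


def rows_to_records(payload: dict, columns: list) -> list:
    """Turn a payload shaped like {"data": [[...], ...]} into a list of dicts."""
    return [dict(zip(columns, row)) for row in payload.get("data", [])]


def fetch_payload(endpoint: str, headers: dict) -> dict:
    """Replay the XHR request found in the Network tab and decode its JSON body."""
    import requests  # third-party; imported here so the parsing helper works offline

    response = requests.get(endpoint, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


# Made-up sample standing in for a real response body (assumed shape).
sample = json.loads('{"data": [["Example Pharmacy", "NO"], ["Other Pharmacy", "YES"]]}')
records = rows_to_records(sample, ["name", "requires_prescription"])
print(records[0]["name"])  # Example Pharmacy
```

With a real endpoint, fetch_payload(endpoint, headers) would take the place of the sample, and the resulting records can be passed straight to pd.DataFrame.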
CodePudding user response:
Just as an alternative to @Naphat Theerawat's answer: since I noticed that you started with a selenium-based solution, you could reach your goal much more easily by combining it with pandas.
- Load the website and extract the table from driver.page_source with pd.read_html()
- To avoid iterating over each page, just select "Show All entries"
Example
from io import StringIO
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
import pandas as pd

url = 'https://covid19criticalcare.com/pharmacies/'
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
wait = WebDriverWait(driver, 5)
# Switch the page-length dropdown to "All" so every row is rendered at once
select = Select(wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[name="DataTables_Table_0_length"]'))))
select.select_by_value('-1')
# With all rows shown, the "next" pagination button becomes disabled
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'a.paginate_button.next.disabled')))
# Wrap the HTML in StringIO; newer pandas versions deprecate passing a raw HTML string
df = pd.read_html(StringIO(driver.page_source), displayed_only=False)[1]
driver.quit()
df
Output
Pharmacy Name | Phone | Website | Requires prescription? | Pharmacy Address | Based in the United States? | Overnight shipping to the United States? | Overnight International shipping? | Ships to the following States/Provinces | |
---|---|---|---|---|---|---|---|---|---|
0 Covid Pharmacy | [email protected] | (785) 672 9222 | 0covidpharmacy.com | NO | 245 Krishna Market Channi RoadNagpur, Maharashtra 440001India | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific |
1 Ivermectin Service | [email protected] | (888) 290 0964 (US), 91 22509 72606 (IN) | 1ivermectin.com | NO | 1/16, First Floor, Tardeo Air Conditioned Market Building, TardeoMumbai, Tardeo 400034India | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingPuerto RicoVirgin Islands |
1 Life Pharmacy | [email protected] | (888) 560-0430 (US); 91 (807 ) 127-9990 (India) | 1lifepharmacy.net | NO | 302, Pride Plaza, Rajkot, 360002Rajkot, Gujarat 360002; 84118India | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming |
1-2-3 RX Global Pharmacy | [email protected] | (516) 758-2630 | 123rx.net | NO | 2967 Dundas St. W.Toronto, Ontario M6P 1Z2Canada | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyoming |
12 Angel Pharmacy Store | [email protected] | (908) 866-4260 | 12angel.store | NO | 1050 Bharat Diamond BourseBandra Kurla ComplexMumbai, Maharashtra 400051India | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific |
24 x 7 Pharma | [email protected] | (851) 127-5721 | 24x7pharma.com | NO | Mahek IconSumul Diary Road, KatargamSurat, Gujarat 395003India | NO | YES | YES | AlabamaAlaskaArizonaArkansasCaliforniaColoradoConnecticutDelawareDistrict of ColumbiaFloridaGeorgiaHawaiiIdahoIllinoisIndianaIowaKansasKentuckyLouisianaMaineMarylandMassachusettsMichiganMinnesotaMississippiMissouriMontanaNebraskaNevadaNew HampshireNew JerseyNew MexicoNew YorkNorth CarolinaNorth DakotaOhioOklahomaOregonPennsylvaniaRhode IslandSouth CarolinaSouth DakotaTennesseeTexasUtahVermontVirginiaWashingtonWest VirginiaWisconsinWyomingGuamPuerto RicoVirgin IslandsArmed Forces AmericasArmed Forces EuropeArmed Forces Pacific |
...
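The last column comes out as one unbroken run of names (AlabamaAlaskaArizona...), presumably because whatever separates them in the HTML is dropped when the cell text is extracted. If separate values are needed, one option is to split the run against a list of known names. This is only a greedy sketch: the function name is made up, and the short list below is for illustration, so a real run would need the full list of states and territories.

```python
def split_concatenated(text: str, known_names: list) -> list:
    """Split text like 'AlabamaAlaska' into ['Alabama', 'Alaska'].

    At each position the longest known name is tried first, so a shorter
    name cannot shadow a longer one that starts with the same letters.
    """
    ordered = sorted(known_names, key=len, reverse=True)
    parts, pos = [], 0
    while pos < len(text):
        for name in ordered:
            if text.startswith(name, pos):
                parts.append(name)
                pos += len(name)
                break
        else:
            # No known name matches here: fail loudly instead of guessing.
            raise ValueError(f"unrecognised text at position {pos}: {text[pos:pos + 20]!r}")
    return parts


# Only a handful of names, for illustration.
names = ["Vermont", "Virginia", "Washington", "West Virginia", "Virgin Islands", "Puerto Rico"]
cell = "VermontVirginiaWashingtonWest VirginiaPuerto RicoVirgin Islands"
print(split_concatenated(cell, names))
```

Applied to the DataFrame, this could be mapped over the "Ships to the following States/Provinces" column with df[...].map(...), assuming every name in the column appears in the list.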