Scrape WebPage using Selenium

Time:07-25

I want to extract data from a web page (the screenshot from the original question is not preserved).

Answer:

The website is behind Cloudflare protection, so a normal chromedriver (or geckodriver with Firefox) will not work here. You need something like undetected-chromedriver (install it with pip install undetected-chromedriver). The following code returns the results you're looking for:

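import time as t
from io import StringIO

import pandas as pd
import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait as wait

# undetected-chromedriver starts a patched Chrome that Cloudflare is less likely to block
browser = uc.Chrome()

url = 'https://www.nccpl.com.pk/en/portfolio-investments/lipi-sector-wise-daily'

# Date range to query, written into the two date pickers as plain strings
start_date = '4/02/2021'
end_date = '5/08/2021'

browser.get(url)

# Start date: wait for the first picker, scroll to it, open it, then set its value with JavaScript
picker = wait(browser, 10).until(EC.presence_of_element_located((By.ID, 'popupDatepicker')))
t.sleep(2)
browser.execute_script('arguments[0].scrollIntoView();', picker)
picker.click()
t.sleep(3)
browser.execute_script(f'arguments[0].value = "{start_date}";', picker)
print('clicked the day!')
t.sleep(5)

# End date: same steps for the second picker
picker = wait(browser, 10).until(EC.presence_of_element_located((By.ID, 'popupDatepicker1')))
t.sleep(2)
browser.execute_script('arguments[0].scrollIntoView();', picker)
picker.click()
t.sleep(4)
browser.execute_script(f'arguments[0].value = "{end_date}";', picker)
print('clicked the second day!')
t.sleep(1)

# NOTE: the attribute test inside [@...] appears to have been lost when this answer
# was copied, so this XPath is invalid as written; fill in the search button's attribute first.
search_button = browser.find_element(By.XPATH, '//button[@]/parent::div')
t.sleep(2)
search_button.click()
print('clicked search!')

# Give the results time to load, then parse every <table> on the page
t.sleep(10)
# read_html needs lxml (or beautifulsoup4 + html5lib); newer pandas deprecates passing
# raw HTML strings, hence the StringIO wrapper
dfs = pd.read_html(StringIO(browser.page_source))
browser.quit()
dfs[0]  # displays in a notebook; use print(dfs[0]) in a plain script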
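The fixed t.sleep(10) before pd.read_html either wastes time or fails when the site responds slowly. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value, so a small custom condition can wait until the results table actually has rows. This is a minimal sketch: the class name TableHasRows and the CSS selector 'table tbody tr' are hypothetical assumptions about the page markup, not taken from the answer.

```python
class TableHasRows:
    """Hypothetical custom wait condition for WebDriverWait.until.

    WebDriverWait calls the condition repeatedly with the driver until it
    returns a truthy value or the timeout expires.
    """

    def __init__(self, css_selector="table tbody tr", min_rows=1):
        # ASSUMPTION: the results are rendered as <tr> rows inside a <tbody>;
        # adjust the selector to the page's real markup.
        self.css_selector = css_selector
        self.min_rows = min_rows

    def __call__(self, driver):
        # "css selector" is the value of selenium's By.CSS_SELECTOR constant
        rows = driver.find_elements("css selector", self.css_selector)
        return len(rows) >= self.min_rows
```

With it, the t.sleep(10) could become wait(browser, 30).until(TableHasRows()), where wait is the WebDriverWait alias imported in the answer. If a table is already on the page before the search is clicked, the condition is satisfied immediately, so in that case wait on something that only appears after the search instead.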

For the dates above, dfs[0] is a DataFrame with 310 rows × 11 columns; the first rows look like this:

   CLIENT TYPE  SEC CODE  SECTOR NAME                        MARKET TYPE  BUY VOLUME   BUY VALUE   SELL VOLUME       SELL VALUE   NET VOLUME      NET VALUE          USD
0  BANKS / DFI  S0036     Debt Market                        BNB              704000  3642335564     (808,511)  (4,399,617,995)    (104,511)  (757,282,431)  (4,845,355)
1  BANKS / DFI  S0005     Cement                             FUT             3683000   323386465   (5,663,500)    (573,804,520)  (1,980,500)  (250,418,055)  (1,591,134)
2  BANKS / DFI  S0007     Fertilizer                         FUT              146000    43869760     (609,000)     (57,979,250)    (463,000)   (14,109,490)     (89,603)
3  BANKS / DFI  S0008     Food and Personal Care Products    FUT            22831500   890104750  (31,125,000)  (1,246,403,820)  (8,293,500)  (356,299,070)  (2,281,920)
4  BANKS / DFI  S0019     Oil and Gas Exploration Companies  FUT              490000    51735820   (1,036,500)    (102,697,770)    (546,500)   (50,961,950)    (323,286)
...
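In the output above, the sell, net and USD columns still show commas and parentheses, such as (808,511). That is accounting notation, where parentheses mark a negative value, and it suggests pandas kept those cells as text rather than numbers. A minimal sketch of a converter (the name parse_accounting is hypothetical), assuming every such cell holds an integer amount, optionally comma-grouped and optionally wrapped in parentheses:

```python
def parse_accounting(value):
    """Convert an accounting-style cell such as '(808,511)' to an int.

    Parentheses mean a negative number and commas are thousands
    separators. ASSUMPTION: amounts are whole numbers; values that are
    already Python ints or floats are returned unchanged.
    """
    if isinstance(value, (int, float)):
        return value
    text = str(value).strip().replace(",", "")
    if text.startswith("(") and text.endswith(")"):
        # "(808511)" -> -808511
        return -int(text[1:-1])
    return int(text)
```

Applied column by column, for example df[col] = df[col].map(parse_accounting) for each volume, value and USD column, this yields numeric data that can be summed or compared. In the first row, 704000 plus (808,511) gives (104,511), matching the NET VOLUME column.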