Need help in web scraping using Selenium

Time:04-28

Code trials:

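from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
# Raw string, so the backslashes in the Windows path are not read as escapes
path=r"C:\Program Files (x86)\chromedriver.exe"
# Selenium 4 takes the driver path through a Service object
driver=webdriver.Chrome(service=Service(path))
driver.get("https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html")
# find_element_by_class_name() was removed in Selenium 4; use find_element(By.CLASS_NAME, ...)
section=driver.find_element(By.CLASS_NAME, "section")
print(section.text)
driver.quit()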

I was able to write this, but after that I wasn't able to extract the desired texts. I want the texts highlighted in this picture:

[Image: the highlighted texts I want to extract]

I want them in this Excel format:

[Image: my desired Excel format]

There are 25 entries on one page, so I also have to click the next button with the help of Selenium.

CodePudding user response:

The page isn't dynamic, so there is no need to use Selenium. You can pull all the data with requests and BeautifulSoup. An example is given below; the rest of the task is up to you.

import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'mozilla/5.0'}
url = 'https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html'
req = requests.get(url, headers=headers)
print(req)  # shows the HTTP status of the response
soup = BeautifulSoup(req.content, 'lxml')
cards = soup.select('div.card-body')

for card in cards:
    name_tag = card.select_one('.media-body a h4')
    info_tag = card.select_one('p.card-subtitle.mt-0')
    # Skip cards that are not school entries, so that stale values from
    # the previous card are not printed again
    if name_tag is None or info_tag is None:
        continue
    school_name = name_tag.text
    # Address and phone numbers share one <p>; its text runs them together
    # as '...Indiaphone...', so split on that marker
    address, _, phone = info_tag.get_text().partition('Indiaphone')
    address = address.replace('\n', '').replace('\xa0', '')
    phone = phone.replace('_iphone', '').strip()

    print([school_name, address, phone])

Output:

['Aadhar Public School', 'location_on Bir (Bagera)  Hamirpur - 176110, Himachal Pradesh, ', '9418125341']
['Aakash Model School', 'location_on Lahra Galore  Hamirpur - 177026, Himachal Pradesh, ', '(01972)-243201phone  9418118090']
['Aastha Public School - Banuti', 'location_on Banuti  Shimla - 171011, Himachal Pradesh, ', '(0177)-2802404phone  9418022024']
['Aastha Public School - Hatpang', 'location_on Hatpang  Kangra - 176022, Himachal Pradesh, ', '9459068853']
['Abhi Public School', 'location_on Goli (Dalhousie)  Chamba - 176305, Himachal Pradesh, ', '9418410844, 9418093564']
['Abhinav Vidya Mandir High School', 'location_on Haroli  Una - 177220, Himachal Pradesh, ', '(01975)-284209phone  9418537523']
['Abhishek Public High School', 'location_on Rait  Kangra - 176208, Himachal Pradesh, ', '(01892)-238524phone  9816297368']
['ACE Public School', 'location_on Sarah  Kangra - 176215, Himachal Pradesh, ', '9418427824'] 
['Adarsh Bal Jyoti Public School', 'location_on Beru Thona  Mandi - 175049, Himachal Pradesh, ', '9857560105, 9817163967']
['Adarsh Bal Mandir High School', 'location_on Bhawarna  Kangra - 176083, Himachal Pradesh, ', '(01894)-247115']
['Adarsh Bal Vidya Mandir School', 'location_on Rajpur  Sirmaur - 173025, Himachal Pradesh, ', '9816235897, 9816208863']
['Adarsh Bharti Public School', 'location_on NH-154, Jassur  Kangra - 177201, Himachal Pradesh, ', '(01893)-226945phone  9418476945']
['Adarsh Bharti Public School', 'location_on Nagrota Surian  Kangra - 176027, Himachal Pradesh, ', '9816461493']
['Adarsh Bharti Public School', 'location_on Samkehar  Kangra - 176023, Himachal Pradesh, ', '9418013823']
['Adarsh Bharti Public School', 'location_on Village Dak & PO Dahab, Tehsil Nurpur  Kangra - 176051, Himachal Pradesh, ', '9418356123, 9816923352']
['Adarsh Bhartiya Model School', 'location_on Gharan (Bhojpur)  Kangra - 176402, Himachal Pradesh, ', '(01893)-245117phone  9816636653']
['Adarsh Gyan Public School', 'location_on Tatwali  Kangra - 176058, Himachal Pradesh, ', '9816789311, 9872242147']
['Adarsh High School', 'location_on Ghumarwin  Bilaspur - 174021, Himachal Pradesh, ', '(01978)-255605phone  9418042523']
['Adarsh High School', 'location_on Kuthera  Bilaspur - 174026, Himachal Pradesh, ', '(01978)-275690phone  9817050966']
['Adarsh Jyoti Primary School', ' Mandi - 175001, Himachal Pradesh, ', '9817567850']
['Adarsh Jyoti Public School', 'location_on Beru Thona  Mandi - 175049, Himachal Pradesh, ', '9857560105, 9817163967']
['Adarsh Model High School', 'location_on Chatta Khad  Kangra - 176025, Himachal Pradesh, ', '9816123223']
['Adarsh Model School', 'location_on Kariyara  Kangra - 177017, Himachal Pradesh, ', '9218949303, 9736590988']
['Adarsh Primary School', 'location_on Umari-Gander  Kangra - 176097, Himachal Pradesh, ', '9816123581']
['Adarsh Public School', 'location_on Mandrighat  Bilaspur - 174013, Himachal Pradesh, ', '(01907)-283019phone  9817304028, 9418059027']
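
The rows printed above still carry icon labels (location_on, phone) and a trailing comma. Below is a minimal sketch of one way to tidy them and write a CSV file that Excel can open. The column headers, the file name schools.csv and the helper names clean_row and write_rows are assumptions, since the desired sheet layout is only shown in a picture:

```python
import csv
import io
import re


def clean_row(name, address, phone):
    """Tidy one [name, address, phone] row as printed above."""
    # 'location_on' is the text of an icon, not part of the address
    address = address.replace('location_on', '')
    # Collapse runs of whitespace and drop the dangling comma left by the split
    address = re.sub(r'\s+', ' ', address).strip().rstrip(',')
    # A leftover 'phone' icon label separates landline and mobile numbers
    numbers = [part.strip() for part in phone.split('phone') if part.strip()]
    return [name.strip(), address, ', '.join(numbers)]


def write_rows(rows, handle):
    """Write a header line plus the cleaned rows to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(['School Name', 'Address', 'Phone'])  # assumed headers
    for row in rows:
        writer.writerow(clean_row(*row))


sample = [
    ['Aadhar Public School',
     'location_on Bir (Bagera)  Hamirpur - 176110, Himachal Pradesh, ',
     '9418125341'],
    ['Aakash Model School',
     'location_on Lahra Galore  Hamirpur - 177026, Himachal Pradesh, ',
     '(01972)-243201phone  9418118090'],
]

# For a real file, use: open('schools.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
write_rows(sample, buffer)
print(buffer.getvalue())
```

In the scraping loop, the rows can be collected in a list instead of being printed and then passed to write_rows once at the end.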
     

CodePudding user response:

The desired texts of each item are within the following element:

<div class="media-body">

Solution

To extract the desired texts, wait for the elements with WebDriverWait and visibility_of_all_elements_located(), using the following locator strategy:

  • Using XPATH:

    driver.get("https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html")
    print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='media-body'][.//a/h4]")))])
    
  • Console Output:

    ['Aadhar Public School\nlocation_on Bir (Bagera)\n  Hamirpur - 176110, Himachal Pradesh, India\nphone_iphone 9418125341', 'Aakash Model School\nlocation_on Lahra Galore\n  Hamirpur - 177026, Himachal Pradesh, India\nphone (01972)-243201\nphone_iphone 9418118090', 'Aastha Public School - Banuti\nlocation_on Banuti\n  Shimla - 171011, Himachal Pradesh, India\nphone (0177)-2802404\nphone_iphone 9418022024', 'Aastha Public School - Hatpang\nlocation_on Hatpang\n  Kangra - 176022, Himachal Pradesh, India\nphone_iphone 9459068853', 'Abhi Public School\nlocation_on Goli (Dalhousie)\n  Chamba - 176305, Himachal Pradesh, India\nphone_iphone 9418410844, 9418093564', 'Abhinav Vidya Mandir High School\nlocation_on Haroli\n  Una - 177220, Himachal Pradesh, India\nphone (01975)-284209\nphone_iphone 9418537523', 'Abhishek Public High School\nlocation_on Rait\n  Kangra - 176208, Himachal Pradesh, India\nphone (01892)-238524\nphone_iphone 9816297368', 'ACE Public School\nlocation_on Sarah\n  Kangra - 176215, Himachal Pradesh, India\nphone_iphone 9418427824', 'Adarsh Bal Jyoti Public School\nlocation_on Beru Thona\n  Mandi - 175049, Himachal Pradesh, India\nphone_iphone 9857560105, 9817163967', 'Adarsh Bal Mandir High School\nlocation_on Bhawarna\n  Kangra - 176083, Himachal Pradesh, India\nphone (01894)-247115', 'Adarsh Bal Vidya Mandir School\nlocation_on Rajpur\n  Sirmaur - 173025, Himachal Pradesh, India\nphone_iphone 9816235897, 9816208863', 'Adarsh Bharti Public School\nlocation_on NH-154, Jassur\n  Kangra - 177201, Himachal Pradesh, India\nphone (01893)-226945\nphone_iphone 9418476945', 'Adarsh Bharti Public School\nlocation_on Nagrota Surian\n  Kangra - 176027, Himachal Pradesh, India\nphone_iphone 9816461493', 'Adarsh Bharti Public School\nlocation_on Samkehar\n  Kangra - 176023, Himachal Pradesh, India\nphone_iphone 9418013823', 'Adarsh Bharti Public School\nlocation_on Village Dak & PO Dahab, Tehsil Nurpur\n  Kangra - 176051, Himachal Pradesh, India\nphone_iphone 9418356123, 9816923352', 'Adarsh Bhartiya Model School\nlocation_on Gharan (Bhojpur)\n  Kangra - 176402, Himachal Pradesh, India\nphone (01893)-245117\nphone_iphone 9816636653', 'Adarsh Gyan Public School\nlocation_on Tatwali\n  Kangra - 176058, Himachal Pradesh, India\nphone_iphone 9816789311, 9872242147', 'Adarsh High School\nlocation_on Ghumarwin\n  Bilaspur - 174021, Himachal Pradesh, India\nphone (01978)-255605\nphone_iphone 9418042523', 'Adarsh High School\nlocation_on Kuthera\n  Bilaspur - 174026, Himachal Pradesh, India\nphone (01978)-275690\nphone_iphone 9817050966', 'Adarsh Jyoti Primary School\n\n  Mandi - 175001, Himachal Pradesh, India\nphone_iphone 9817567850', 'Adarsh Jyoti Public School\nlocation_on Beru Thona\n  Mandi - 175049, Himachal Pradesh, India\nphone_iphone 9857560105, 9817163967', 'Adarsh Model High School\nlocation_on Chatta Khad\n  Kangra - 176025, Himachal Pradesh, India\nphone_iphone 9816123223', 'Adarsh Model School\nlocation_on Kariyara\n  Kangra - 177017, Himachal Pradesh, India\nphone_iphone 9218949303, 9736590988', 'Adarsh Primary School\nlocation_on Umari-Gander\n  Kangra - 176097, Himachal Pradesh, India\nphone_iphone 9816123581', 'Adarsh Public School\nlocation_on Mandrighat\n  Bilaspur - 174013, Himachal Pradesh, India\nphone (01907)-283019\nphone_iphone 9817304028, 9418059027']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
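
Each string in the console output packs the name, address and phone numbers of one school, separated by newlines. Below is a minimal sketch of splitting such a string into fields. It assumes that the first line is always the school name and that phone lines start with the icon labels phone or phone_iphone seen in the output; parse_card_text is a hypothetical helper name:

```python
def parse_card_text(text):
    """Split the .text of one media-body element into name, address, phone."""
    # Drop empty lines (some cards have a blank first address line)
    lines = [line.strip() for line in text.split('\n') if line.strip()]
    name = lines[0]
    address_parts = []
    phones = []
    for line in lines[1:]:
        # Check the longer label first, because it also starts with 'phone'
        if line.startswith('phone_iphone'):
            phones.append(line[len('phone_iphone'):].strip())
        elif line.startswith('phone'):
            phones.append(line[len('phone'):].strip())
        else:
            # 'location_on' is icon text in front of the first address line
            address_parts.append(line.replace('location_on', '', 1).strip())
    return {
        'name': name,
        'address': ', '.join(part for part in address_parts if part),
        'phone': ', '.join(phones),
    }


sample = ('Aakash Model School\nlocation_on Lahra Galore\n'
          '  Hamirpur - 177026, Himachal Pradesh, India\n'
          'phone (01972)-243201\nphone_iphone 9418118090')
record = parse_card_text(sample)
print(record)
```

Applied to every my_elem.text from the XPATH example, this would give one dictionary per school, ready to be written out row by row.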
    

CodePudding user response:

Find the element on the current page that leads to the next page, and call .click() on that element.
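
A rough sketch of how such a click could drive a paging loop. The loop itself is plain Python; the Selenium part is only wired in through two callbacks. The locator for the next button (a link whose text contains "Next") is an assumption and has to be checked against the page, and scrape_all_pages and make_selenium_callbacks are hypothetical helper names:

```python
def scrape_all_pages(read_page, click_next, max_pages=100):
    """Collect rows page by page until click_next() reports no further page."""
    rows = []
    for _ in range(max_pages):  # upper bound guards against an endless loop
        rows.extend(read_page())
        if not click_next():
            break
    return rows


def make_selenium_callbacks(driver):
    """Build the two callbacks for a Selenium 4 driver (imports are local)."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def read_page():
        cards = driver.find_elements(By.XPATH, "//div[@class='media-body'][.//a/h4]")
        return [card.text for card in cards]

    def click_next():
        try:
            # ASSUMPTION: the pager link's visible text contains "Next"
            driver.find_element(By.PARTIAL_LINK_TEXT, "Next").click()
        except NoSuchElementException:
            return False  # no next link found: treat this as the last page
        return True

    return read_page, click_next


# Dry run with fake pages, so the loop can be tried without a browser
pages = [['row 1', 'row 2'], ['row 3']]
position = {'index': 0}


def fake_read():
    return pages[position['index']]


def fake_next():
    if position['index'] + 1 >= len(pages):
        return False
    position['index'] += 1
    return True


all_rows = scrape_all_pages(fake_read, fake_next)
print(all_rows)
```

With a real driver the call would be scrape_all_pages(*make_selenium_callbacks(driver)); a wait after each click may be needed before the new page's elements can be read.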
