Code trials:
from selenium import webdriver
from selenium.webdriver.common.by import By

# raw string keeps the backslashes in the Windows path literal
path = r"C:\Program Files (x86)\chromedriver.exe"
driver = webdriver.Chrome(path)
driver.get("https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html")
# the find_element_by_* helpers were removed in Selenium 4.3; use find_element(By...) instead
section = driver.find_element(By.CLASS_NAME, "section")
print(section.text)
driver.quit()
I was able to write this, but after that I wasn't able to extract the desired text. I want the highlighted text shown in the picture:
(image: the highlighted text I want to extract)
The result should be in this Excel format:
(image: the desired Excel format)
There are 25 entries on each page, so I also have to click the next button with the help of Selenium.
CodePudding user response:
The page isn't dynamic, so there is no need to use Selenium. You can pull all the data using requests and BeautifulSoup. An example is given below; the rest of the task is left to you.
import requests
from bs4 import BeautifulSoup

headers = {'User-Agent': 'mozilla/5.0'}
url = 'https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html'
req = requests.get(url, headers=headers)
print(req)  # shows the HTTP status of the response
soup = BeautifulSoup(req.content, 'lxml')
cards = soup.select('div.card-body')
for card in cards:
    # If a lookup fails, the variable keeps the value from the previous card.
    try:
        school_name = card.select_one('.media-body a h4').text
    except (AttributeError, IndexError):
        pass
    try:
        # everything before 'Indiaphone' is the address
        address = card.select_one('p.card-subtitle.mt-0').get_text().split('Indiaphone')[0].replace('\n', '').replace('\xa0', '')
    except (AttributeError, IndexError):
        pass
    try:
        # everything after 'Indiaphone' is the phone number(s)
        phone = card.select_one('p.card-subtitle.mt-0').get_text().split('Indiaphone')[1].replace('_iphone', '').strip()
    except (AttributeError, IndexError):
        pass
    print([school_name, address, phone])
Output:
['Aadhar Public School', 'location_on Bir (Bagera) Hamirpur - 176110, Himachal Pradesh, ', '9418125341']
['Aakash Model School', 'location_on Lahra Galore Hamirpur - 177026, Himachal Pradesh, ', '(01972)-243201phone 9418118090']
['Aastha Public School - Banuti', 'location_on Banuti Shimla - 171011, Himachal Pradesh, ', '(0177)-2802404phone 9418022024']
['Aastha Public School - Hatpang', 'location_on Hatpang Kangra - 176022, Himachal Pradesh, ', '9459068853']
['Abhi Public School', 'location_on Goli (Dalhousie) Chamba - 176305, Himachal Pradesh, ', '9418410844, 9418093564']
['Abhinav Vidya Mandir High School', 'location_on Haroli Una - 177220, Himachal Pradesh, ', '(01975)-284209phone 9418537523']
['Abhishek Public High School', 'location_on Rait Kangra - 176208, Himachal Pradesh, ', '(01892)-238524phone 9816297368']
['ACE Public School', 'location_on Sarah Kangra - 176215, Himachal Pradesh, ', '9418427824']
['Adarsh Bal Jyoti Public School', 'location_on Beru Thona Mandi - 175049, Himachal Pradesh, ', '9857560105, 9817163967']
['Adarsh Bal Mandir High School', 'location_on Bhawarna Kangra - 176083, Himachal Pradesh, ', '(01894)-247115']
['Adarsh Bal Vidya Mandir School', 'location_on Rajpur Sirmaur - 173025, Himachal Pradesh, ', '9816235897, 9816208863']
['Adarsh Bharti Public School', 'location_on NH-154, Jassur Kangra - 177201, Himachal Pradesh, ', '(01893)-226945phone 9418476945']
['Adarsh Bharti Public School', 'location_on Nagrota Surian Kangra - 176027, Himachal Pradesh, ', '9816461493']
['Adarsh Bharti Public School', 'location_on Samkehar Kangra - 176023, Himachal Pradesh, ', '9418013823']
['Adarsh Bharti Public School', 'location_on Village Dak & PO Dahab, Tehsil Nurpur Kangra - 176051, Himachal Pradesh, ', '9418356123, 9816923352']
['Adarsh Bhartiya Model School', 'location_on Gharan (Bhojpur) Kangra - 176402, Himachal Pradesh, ', '(01893)-245117phone 9816636653']
['Adarsh Gyan Public School', 'location_on Tatwali Kangra - 176058, Himachal Pradesh, ', '9816789311, 9872242147']
['Adarsh High School', 'location_on Ghumarwin Bilaspur - 174021, Himachal Pradesh, ', '(01978)-255605phone 9418042523']
['Adarsh High School', 'location_on Kuthera Bilaspur - 174026, Himachal Pradesh, ', '(01978)-275690phone 9817050966']
['Adarsh Jyoti Primary School', ' Mandi - 175001, Himachal Pradesh, ', '9817567850']
['Adarsh Jyoti Public School', 'location_on Beru Thona Mandi - 175049, Himachal Pradesh, ', '9857560105, 9817163967']
['Adarsh Model High School', 'location_on Chatta Khad Kangra - 176025, Himachal Pradesh, ', '9816123223']
['Adarsh Model School', 'location_on Kariyara Kangra - 177017, Himachal Pradesh, ', '9218949303, 9736590988']
['Adarsh Primary School', 'location_on Umari-Gander Kangra - 176097, Himachal Pradesh, ', '9816123581']
['Adarsh Public School', 'location_on Mandrighat Bilaspur - 174013, Himachal Pradesh, ', '(01907)-283019phone 9817304028, 9418059027']
['Adarsh Public School', 'location_on Mandrighat Bilaspur - 174013, Himachal Pradesh, ', '(01907)-283019phone 9817304028, 9418059027']
['Adarsh Public School', 'location_on Mandrighat Bilaspur - 174013, Himachal Pradesh, ', '(01907)-283019phone 9817304028, 9418059027']
['Adarsh Public School', 'location_on Mandrighat Bilaspur - 174013, Himachal Pradesh, ', '(01907)-283019phone 9817304028, 9418059027']
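The rows above still carry some residue: "location_on" and "phone" look like icon names that leak into the text, and the last school is printed several times. A minimal clean-up sketch, assuming the rows have been collected into a list instead of printed (clean_rows is a hypothetical helper, not part of the answer above):

```python
import re


def clean_rows(rows):
    """Tidy [name, address, phone] rows and drop exact repeats (hypothetical helper)."""
    cleaned, seen = [], set()
    for name, address, phone in rows:
        # remove the leaked "location_on" label and the trailing comma
        address = address.replace("location_on", "").strip().rstrip(",")
        address = " ".join(address.split())  # collapse runs of whitespace
        # a leaked "phone" label sits between a landline and a mobile number
        phone = re.sub(r"\s*phone\s*", ", ", phone).strip()
        row = (name.strip(), address, phone)
        if row not in seen:  # skip rows that are identical to an earlier one
            seen.add(row)
            cleaned.append(list(row))
    return cleaned
```

Only rows that match in all three fields are dropped, so different schools that share a name but not an address are kept.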
CodePudding user response:
The desired text of each item is within the following element:
<div class="media-body">
Solution
To extract the desired text, induce WebDriverWait for visibility_of_all_elements_located() and use the following locator strategy:
Using XPATH:
driver.get("https://targetstudy.com/school/state-board-schools-in-himachal-pradesh.html")
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='media-body'][.//a/h4]")))])
Console Output:
['Aadhar Public School\nlocation_on Bir (Bagera)\n Hamirpur - 176110, Himachal Pradesh, India\nphone_iphone 9418125341', 'Aakash Model School\nlocation_on Lahra Galore\n Hamirpur - 177026, Himachal Pradesh, India\nphone (01972)-243201\nphone_iphone 9418118090', 'Aastha Public School - Banuti\nlocation_on Banuti\n Shimla - 171011, Himachal Pradesh, India\nphone (0177)-2802404\nphone_iphone 9418022024', 'Aastha Public School - Hatpang\nlocation_on Hatpang\n Kangra - 176022, Himachal Pradesh, India\nphone_iphone 9459068853', 'Abhi Public School\nlocation_on Goli (Dalhousie)\n Chamba - 176305, Himachal Pradesh, India\nphone_iphone 9418410844, 9418093564', 'Abhinav Vidya Mandir High School\nlocation_on Haroli\n Una - 177220, Himachal Pradesh, India\nphone (01975)-284209\nphone_iphone 9418537523', 'Abhishek Public High School\nlocation_on Rait\n Kangra - 176208, Himachal Pradesh, India\nphone (01892)-238524\nphone_iphone 9816297368', 'ACE Public School\nlocation_on Sarah\n Kangra - 176215, Himachal Pradesh, India\nphone_iphone 9418427824', 'Adarsh Bal Jyoti Public School\nlocation_on Beru Thona\n Mandi - 175049, Himachal Pradesh, India\nphone_iphone 9857560105, 9817163967', 'Adarsh Bal Mandir High School\nlocation_on Bhawarna\n Kangra - 176083, Himachal Pradesh, India\nphone (01894)-247115', 'Adarsh Bal Vidya Mandir School\nlocation_on Rajpur\n Sirmaur - 173025, Himachal Pradesh, India\nphone_iphone 9816235897, 9816208863', 'Adarsh Bharti Public School\nlocation_on NH-154, Jassur\n Kangra - 177201, Himachal Pradesh, India\nphone (01893)-226945\nphone_iphone 9418476945', 'Adarsh Bharti Public School\nlocation_on Nagrota Surian\n Kangra - 176027, Himachal Pradesh, India\nphone_iphone 9816461493', 'Adarsh Bharti Public School\nlocation_on Samkehar\n Kangra - 176023, Himachal Pradesh, India\nphone_iphone 9418013823', 'Adarsh Bharti Public School\nlocation_on Village Dak & PO Dahab, Tehsil Nurpur\n Kangra - 176051, Himachal Pradesh, India\nphone_iphone 9418356123, 9816923352', 'Adarsh Bhartiya Model School\nlocation_on Gharan (Bhojpur)\n Kangra - 176402, Himachal Pradesh, India\nphone (01893)-245117\nphone_iphone 9816636653', 'Adarsh Gyan Public School\nlocation_on Tatwali\n Kangra - 176058, Himachal Pradesh, India\nphone_iphone 9816789311, 9872242147', 'Adarsh High School\nlocation_on Ghumarwin\n Bilaspur - 174021, Himachal Pradesh, India\nphone (01978)-255605\nphone_iphone 9418042523', 'Adarsh High School\nlocation_on Kuthera\n Bilaspur - 174026, Himachal Pradesh, India\nphone (01978)-275690\nphone_iphone 9817050966', 'Adarsh Jyoti Primary School\n\n Mandi - 175001, Himachal Pradesh, India\nphone_iphone 9817567850', 'Adarsh Jyoti Public School\nlocation_on Beru Thona\n Mandi - 175049, Himachal Pradesh, India\nphone_iphone 9857560105, 9817163967', 'Adarsh Model High School\nlocation_on Chatta Khad\n Kangra - 176025, Himachal Pradesh, India\nphone_iphone 9816123223', 'Adarsh Model School\nlocation_on Kariyara\n Kangra - 177017, Himachal Pradesh, India\nphone_iphone 9218949303, 9736590988', 'Adarsh Primary School\nlocation_on Umari-Gander\n Kangra - 176097, Himachal Pradesh, India\nphone_iphone 9816123581', 'Adarsh Public School\nlocation_on Mandrighat\n Bilaspur - 174013, Himachal Pradesh, India\nphone (01907)-283019\nphone_iphone 9817304028, 9418059027']
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
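Each string in that list still mixes the name, address and phone numbers. As a rough sketch of how they could be split and turned into something Excel can open, the following uses only the standard library. parse_card_text and cards_to_csv are hypothetical helpers, and the column names are an assumption, since the desired Excel layout is only shown in a picture:

```python
import csv
import io


def parse_card_text(text):
    """Split one element's .text into (name, address, phones)."""
    lines = [line.strip() for line in text.split("\n")]
    name = lines[0]
    address_parts, phones = [], []
    for line in lines[1:]:
        if line.startswith("phone"):
            # drop the leading "phone" / "phone_iphone" label, keep the numbers
            number = line.partition(" ")[2].strip()
            if number:
                phones.append(number)
        else:
            if line.startswith("location_on"):
                line = line[len("location_on"):].strip()
            if line:
                address_parts.append(line)
    return name, ", ".join(address_parts), ", ".join(phones)


def cards_to_csv(texts):
    """Return CSV text with one row per card (header names are assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["School Name", "Address", "Phone"])
    for text in texts:
        writer.writerow(parse_card_text(text))
    return buffer.getvalue()
```

The returned string can be written to a .csv file, which Excel opens directly. The parsing relies on the line layout seen in the console output above, so it may need adjusting if the page markup changes.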
CodePudding user response:
Find the element on the current page that leads to the next page and call .click() on it.
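A minimal sketch of that loop, assuming the pager is a link whose visible text is "Next" (not verified against the site). scrape_all_pages, extract_page and max_pages are hypothetical names; extract_page is any function that takes the driver and returns the rows of the current page:

```python
def scrape_all_pages(driver, extract_page, next_link_text="Next", max_pages=50):
    """Collect rows page by page, clicking the pager link until it is gone."""
    rows = []
    for _ in range(max_pages):  # hard cap so a broken pager cannot loop forever
        rows.extend(extract_page(driver))
        # "link text" is the value of By.LINK_TEXT; find_elements returns
        # an empty list instead of raising when nothing matches
        links = driver.find_elements("link text", next_link_text)
        if not links:
            break
        links[0].click()
    return rows
```

With a real driver, extract_page could be the WebDriverWait expression from the answer above, which also waits for the new page's elements after each click.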