How to follow links in Scrapy when the href attribute is '#'?

Time:01-28

I am trying to scrape the Niche.com website to extract all schools, plus the details of each school from its own page. However, the school links have href="#" in their href attribute, so Scrapy cannot follow them into each school's page to collect the data.

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class NicheschoolsSpider(scrapy.Spider):
    name = 'nicheschools'
    allowed_domains = ['www.niche.com']
    start_urls = ['https://www.niche.com/k12/search/best-schools/s/wisconsin/']

    def parse(self, response):
        schoollink = response.xpath("//div[@class='search-result__title-wrapper']/h2")
        for school in schoollink:
            name = school.xpath(".//text()").get()
            link = school.xpath(".//@href").get()  # on this page the value is '#'
            yield {
                'name': name,
                'link': link
            }
            yield response.follow(url=link, callback=self.parse_schools)

    def parse_schools(self, response):
        name = response.xpath("//h1[@class='postcard__title postcard__title--claimed']/text()").get()
        website = response.xpath("(//a[@class='profile__website__link']/@href)[1]").get()
        address = response.xpath("(//address[@class='profile__address--compact']/text())[1]").get()

        yield {
            'name': name,
            'website': website,
            'address': address
        }

Output for one entry:

2023-01-25 16:33:10 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.niche.com/k12/search/best-schools/s/wisconsin/>
{'name': 'Brookfield Central High School', 'link': '#'}

When it tries to follow the link, it gets:

2023-01-25 16:33:12 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.niche.com/k12/search/best-schools/s/wisconsin/>
{'name': None, 'website': None, 'address': None}

I am trying to get into each school's page and collect the school name, address, telephone, tuition fees, and enrollment.
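Why the second item is all None: a minimal standard-library sketch, assuming that Scrapy's response.follow resolves a relative href against the page URL essentially the way urllib.parse.urljoin does (Scrapy also honours a <base> tag, which this sketch ignores). Resolving '#' yields the search page itself, because a fragment is never sent to the server, so parse_schools most likely ran on the search-results page, where the profile XPaths match nothing. resolve() is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin

# URL of the search page the spider starts from (from the question)
page_url = 'https://www.niche.com/k12/search/best-schools/s/wisconsin/'


def resolve(base: str, href: str) -> str:
    """Resolve an href against the page URL and drop any #fragment,
    since fragments are never sent to the server."""
    return urldefrag(urljoin(base, href)).url


# '#' points back at the page we are already on
print(resolve(page_url, '#'))

# A real profile path (taken from the second answer's output) is a different page
print(resolve(page_url, '/k12/brookfield-academy-brookfield-wi/'))
```

Because Scrapy's duplicate filter also ignores URL fragments by default, the remaining '#' requests are most likely dropped as duplicates of that first one.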

Answer:

This is not really a job for Scrapy, although it can certainly be done with Scrapy. The website is dynamic: it pulls its data from an API endpoint. I won't set up a Scrapy project just to answer your question, but I will demonstrate how you can get the data using Requests and pandas (the code was run in a Jupyter notebook):

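import requests
import pandas as pd
from IPython.display import display  # display() is also built into Jupyter
from tqdm.notebook import tqdm

# Show every column and the full cell contents when displaying the DataFrame
pd.set_option('display.max_columns', None, 'display.max_colwidth', None)

headers = {
    'accept-language': 'en-US,en;q=0.9',
    'accept': 'application/json',
    'referer': 'https://www.niche.com/k12/search/best-schools/s/wisconsin/',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}

big_df = pd.DataFrame()
s = requests.Session()
s.headers.update(headers)

# The search page loads its results from this JSON endpoint, one page per request
for x in tqdm(range(1, 5)):
    r = s.get(f'https://www.niche.com/api/renaissance/results/?state=wisconsin&listURL=best-schools&page={x}&searchType=school')
    r.raise_for_status()  # fail loudly on 4xx/5xx responses (e.g. when blocked)
    # Each page lists its schools under 'entities'; nested keys become dotted column names
    df = pd.json_normalize(r.json(), record_path=['entities'])
    big_df = pd.concat([big_df, df], axis=0, ignore_index=True)
display(big_df)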

Result in the notebook:

100%
4/4 [00:02<00:00, 2.12it/s]
guid    ctas    badge.display   badge.ordinal   badge.total badge.vanityURL badge.photoURLs.desktop badge.photoURLs.mobile  content.centroid.lat    content.centroid.lon    content.entity.abbreviation content.entity.alternates.nces  content.entity.character    content.entity.claimed  content.entity.displayable  content.entity.genus    content.entity.guid content.entity.isClaimed    content.entity.isPremium    content.entity.location content.entity.name content.entity.parentGUIDs.county   content.entity.parentGUIDs.metroArea    content.entity.parentGUIDs.state    content.entity.parentGUIDs.town content.entity.parentGUIDs.zipCode  content.entity.premium  content.entity.published    content.entity.shortName    content.entity.tagline  content.entity.type content.entity.url  content.entity.variation    content.facts   content.featuredReview.author   content.featuredReview.body content.featuredReview.categories   content.featuredReview.created  content.featuredReview.guid content.featuredReview.rating   content.grades  content.photos.default.crops.DesktopHeader  content.photos.default.crops.MobileHeader   content.photos.default.crops.Original   content.photos.default.guid content.photos.default.licenseName  content.photos.editorial.crops.Original content.photos.editorial.guid   content.photos.editorial.licenseName    content.photos.editorial.uploadTimestamp    content.photos.mapbox_header.author content.photos.mapbox_header.crops.DesktopHeader    content.photos.mapbox_header.crops.MobileHeader content.photos.mapbox_header.guid   content.photos.mapbox_header.licenseName    content.photos.mapbox_header.licenseUrl content.photos.mapbox_header.sourceUrl  content.photos.spotlight.crops.Original content.photos.spotlight.crops.Spotlight    content.photos.spotlight.guid   content.photos.spotlight.licenseName    content.photos.spotlight.uploadTimestamp    content.reviewAverage.average   content.reviewAverage.count content.virtualTour content.entity.alternates.ceeb  
content.photos.default.crops.Thumbnail  content.photos.default.uploadTimestamp  content.entity.parentGUIDs.parent   content.entity.parentGUIDs.schoolDistrict   content.entity.parentGUIDs.schoolNetwork    content.entity.parentGUIDs.neighborhood content.photos.default.crops.Spotlight
0   d6574ad4-6add-45c3-a90a-9d24f58b040e    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Private High Schools in Wisconsin  1   82  best-private-high-schools/s/wisconsin   https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.081737   -88.145195  Brookfield Academy  0   Private True    True    Private School  d6574ad4-6add-45c3-a90a-9d24f58b040e    True    True    BROOKFIELD, WI  Brookfield Academy  ba8709ae-856d-4583-83b7-4484b51ed4c2    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    cc01665b-5240-4885-b13d-a4ae0dd271fc    b802227a-e061-45e5-9dfd-6c3ddaf8bebb    True    True    Brookfield Academy  [Private School, BROOKFIELD, WI, PK, K-12]  School  brookfield-academy-brookfield-wi    1041    [{'config': {'format': ['comma'], 'rounding': ...   Parent  When my kids started school something just did...   [Overall Experience]    2022-07-28T18:37:49.017538Z e3bfc3ad-86eb-4bba-8c66-5eb95e4111f7    5.0 [{'description': 'Based on quality of academic...   https://d13b2ieg84qqce.cloudfront.net/d1d42e87...   https://d13b2ieg84qqce.cloudfront.net/c046f1e3...   https://d13b2ieg84qqce.cloudfront.net/d1d42e87...   a4125add-a984-4609-a879-ce1afa699db8    UNLICENSED  https://d13b2ieg84qqce.cloudfront.net/352e79e1...   53acd2d2-1185-49bd-9928-6a1f1054fba0    UNLICENSED  2022-02-10T21:15:52.569792Z © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   f696705b-0766-48e5-97b5-72370788f0c6    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  https://d13b2ieg84qqce.cloudfront.net/2273b7a3...   https://d13b2ieg84qqce.cloudfront.net/d512adbc...   2a0ddcf9-ae58-404f-9d91-13a5196c2217    UNLICENSED  2022-07-28T18:00:15.479792Z 4.333333    39  [{'label': 'Virtual Tour', 'value': 'https://w...   NaN NaN NaN NaN NaN NaN NaN NaN
1   c5ce3267-c2ed-4785-a5d8-66c61fcf6063    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Private High Schools in Wisconsin  2   82  best-private-high-schools/s/wisconsin   https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.186400   -87.935800  USM 01512787    Private True    True    Private School  c5ce3267-c2ed-4785-a5d8-66c61fcf6063    True    True    WI  University School of Milwaukee  8b295479-c31f-47a9-83b8-94b2100e2832    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    739d0594-0714-4d74-ad01-f07df19bc756    5d98fbca-9d9d-4219-8335-8dba54962ca7    True    True    University School   [Private School, WI, PK, K-12]  School  university-school-of-milwaukee-river-hills-wi   1041    [{'config': {'format': ['comma'], 'rounding': ...   Parent  It is clear to see, in the short time we’ve be...   [Overall Experience]    2022-10-28T07:41:07.70707Z  a7a94913-bb20-4def-9553-761720f5cac8    5.0 [{'description': 'Based on quality of academic...   https://d13b2ieg84qqce.cloudfront.net/184acaa6...   https://d13b2ieg84qqce.cloudfront.net/d98566c3...   https://d13b2ieg84qqce.cloudfront.net/c65ee0e3...   be9334fb-56d4-4c0c-a4b9-b2de53c46b09    UNLICENSED  https://d13b2ieg84qqce.cloudfront.net/887e0e98...   cfb21e87-82a7-4fa5-8af6-bf33d199039a    UNLICENSED  2022-02-10T21:12:07.916464Z © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   733bf01a-d21c-4374-bb52-42175a61a2c2    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  https://d13b2ieg84qqce.cloudfront.net/e80f6114...   https://d13b2ieg84qqce.cloudfront.net/e80f6114...   c8cb35e1-b83c-47f1-a649-eb3766a53de7    UNLICENSED  NaN 4.209524    105 [{'label': 'Virtual Tour'}] 501390  https://d13b2ieg84qqce.cloudfront.net/97d061e2...   2022-07-11T13:31:31.710239Z NaN NaN NaN NaN NaN
2   84ab245d-ad99-43c9-93d8-9e474a109434    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Private High Schools in Wisconsin  3   82  best-private-high-schools/s/wisconsin   https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.163916   -89.385004  MCDS    A9904507    Private True    True    Private School  84ab245d-ad99-43c9-93d8-9e474a109434    True    True    WAUNAKEE, WI    Madison Country Day School  4135e47a-62f6-4777-b514-d2e51894603f    1a1aaa73-65d0-490d-b3d3-d828716c5f6b    963a1085-efe7-45f5-81ee-d2bbf82a907c    NaN 3bca1e55-0153-485a-a337-03448396568b    True    True    MCDS    [Private School, WAUNAKEE, WI, PK, K-12]    School  madison-country-day-school-waunakee-wi  1041    [{'config': {'format': ['comma'], 'rounding': ...   Parent  The MCDS faculty is truly exceptional -- they ...   [Overall Experience]    2022-07-22T13:59:50.567397Z 6c714271-25ac-4206-9ef8-38d3ef1f92d6    5.0 [{'description': 'Based on quality of academic...   https://d13b2ieg84qqce.cloudfront.net/68e0beb3...   https://d13b2ieg84qqce.cloudfront.net/3a1cfdcf...   https://d13b2ieg84qqce.cloudfront.net/b2d1416c...   86a7a6ce-2538-4bf1-8703-6b3b44fda5a4    UNLICENSED  https://d13b2ieg84qqce.cloudfront.net/6cb7bdfd...   809aece8-55ce-4632-a3cf-d0a14417ffdc    UNLICENSED  2022-02-09T21:25:15.513499Z © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   dc5b6bd7-5a5c-48ee-bdfd-5780de198bc9    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  https://d13b2ieg84qqce.cloudfront.net/66e8fd60...   https://d13b2ieg84qqce.cloudfront.net/fb59b45f...   2e6d1bd0-7760-46ae-8ce2-02306508b864    UNLICENSED  2022-04-18T18:58:56.007652Z 3.882353    34  [{'label': 'Virtual Tour', 'value': 'https://w...   502396  https://d13b2ieg84qqce.cloudfront.net/6ea5d8cb...   2022-06-08T22:11:36.605259Z NaN NaN NaN NaN NaN
3   35ca6237-c994-4fe6-b5f9-f09142680d7b    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Private High Schools in Wisconsin  4   82  best-private-high-schools/s/wisconsin   https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.457700   -88.827400  Wayland Academy 01514944    Private, Boarding   True    True    Private School  35ca6237-c994-4fe6-b5f9-f09142680d7b    True    True    BEAVER DAM, WI  Wayland Academy 3c05ff22-e610-450d-8684-1b9f99edcd1f    NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c    1d49bb1b-d2a1-45e2-ac8e-c8d16ab29f3e    f132a02a-1ead-4325-bf32-9079b435d74c    True    True    Wayland [Private School, BEAVER DAM, WI, 9-12]  School  wayland-academy-beaver-dam-wi   1040    [{'config': {'format': ['comma'], 'rounding': ...   Alum    Though I only attended Wayland for two years (...   [Overall Experience]    2022-08-14T20:05:05.231126Z a0bf7334-047c-4ee8-ab95-59c46dff42b3    5.0 [{'description': 'Based on quality of academic...   https://d13b2ieg84qqce.cloudfront.net/7cc728a3...   https://d13b2ieg84qqce.cloudfront.net/5e24f8a2...   https://d13b2ieg84qqce.cloudfront.net/d7835cfd...   99230263-8332-4b03-b475-b948546402b7    UNLICENSED  https://d13b2ieg84qqce.cloudfront.net/42561f2c...   697e0f82-7ccb-4651-877e-ffe881e188c5    UNLICENSED  NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   c641160c-30c7-4b52-b336-e844ac8a059a    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  https://d13b2ieg84qqce.cloudfront.net/3aaf34d3...   https://d13b2ieg84qqce.cloudfront.net/f197c0a9...   124661d4-71cd-4d13-bfcc-926f3e074ade    UNLICENSED  2022-09-28T16:06:46.315837Z 3.833333    66  [{'label': 'Virtual Tour', 'value': 'https://y...   500170  https://d13b2ieg84qqce.cloudfront.net/9b54f4ea...   2022-07-26T17:26:09.050891Z NaN NaN NaN NaN NaN
4   9b394d9c-46a0-431d-8ae4-62b6142cd46b    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Private High Schools in Wisconsin  5   82  best-private-high-schools/s/wisconsin   https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   42.773585   -87.774410  TPS 01513124    Private True    True    Private School  9b394d9c-46a0-431d-8ae4-62b6142cd46b    True    False   WIND POINT, WI  The Prairie School  5455e716-0063-4d63-a0e2-a07d199cdee1    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    5ef4c7c2-c006-49ea-88e9-9f40a0da6ce6    0d949807-5d44-4fc8-8753-1ce81f4a5d67    False   True    Prairie [Private School, WIND POINT, WI, PK, K-12]  School  the-prairie-school-wind-point-wi    41  [{'config': {'format': ['comma'], 'rounding': ...   Alum    The teachers are awesome and so approachable! ...   [Overall Experience]    2020-06-23T03:25:59.897153Z 2d7de44a-38a7-493c-ac87-a024ba85d42d    5.0 [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN https://d13b2ieg84qqce.cloudfront.net/608f2378...   a69ad3c5-f274-4bbe-ab1b-f1977c79c6f9    UNLICENSED  2022-02-10T20:38:50.869965Z © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   86b0f616-f1bf-4123-a2bc-f93255053083    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  https://d13b2ieg84qqce.cloudfront.net/a26a05f2...   https://d13b2ieg84qqce.cloudfront.net/a26a05f2...   ee7758c3-09ee-4d81-b4bd-c7f91c98652a    UNLICENSED  NaN 4.642857    70  [{'label': 'Virtual Tour', 'value': 'https://w...   501918  NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
95  8365632c-160e-4beb-b75c-dfafca1c2441    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Public Middle Schools in Wisconsin 24  594 best-public-middle-schools/s/wisconsin  https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   44.891018   -87.290723  Sevastopol Middle School    551350000496    Public  False   True    Public School   8365632c-160e-4beb-b75c-dfafca1c2441    False   False   STURGEON BAY, WI    Sevastopol Middle School    caaa657e-9c5e-4740-b72f-bef5b2c75ac1    NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c    65ab2591-75de-487d-8a82-bddd79e3d3bd    7ae55b50-154c-4e0e-aff7-ed2726f7ceb8    False   True    Sevastopol Middle School    [Sevastopol School District, WI, 6-8]   School  sevastopol-middle-school-sturgeon-bay-wi    45  [{'config': {'format': ['comma'], 'rounding': ...   NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   e750dc05-07ed-42b0-92e3-f24ab16f1b8b    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  NaN NaN NaN NaN NaN 0.000000    0   [{'label': 'Virtual Tour'}] NaN NaN NaN d4d24c63-d104-44cd-ad3f-0ded85522583    d4d24c63-d104-44cd-ad3f-0ded85522583    NaN NaN NaN
96  d46a53a4-62f4-4086-9a53-4c6f78f54915    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Public Elementary Schools in Wisconsin 36  1074    best-public-elementary-schools/s/wisconsin  https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   42.935063   -88.405594  Prairie View Elementary School  551006001321    Public  True    True    Public School   d46a53a4-62f4-4086-9a53-4c6f78f54915    True    False   NORTH PRAIRIE, WI   Prairie View Elementary School  ba8709ae-856d-4583-83b7-4484b51ed4c2    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    1a756678-9c81-4d89-8620-604b8e10507c    8afa0d18-4b1f-4052-a09a-d9bfe3e67295    False   True    Prairie View Elementary School  [Mukwonago Area School District, WI, PK, K-6]   School  prairie-view-elementary-school-north-prairie-wi 45  [{'config': {'format': ['comma'], 'rounding': ...   NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   51a79bc3-f67e-4b49-87b1-d36ef46e1145    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  NaN NaN NaN NaN NaN 0.000000    0   [{'label': 'Virtual Tour'}] NaN NaN NaN bda72d2a-3f49-4288-a9f2-d024898ca67b    bda72d2a-3f49-4288-a9f2-d024898ca67b    NaN NaN NaN
97  0ed32d0d-f062-4784-8dd4-57a9724209eb    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Public Middle Schools in Wisconsin 25  594 best-public-middle-schools/s/wisconsin  https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.624047   -87.786299  Oostburg Middle School  551107001464    Public  False   True    Public School   0ed32d0d-f062-4784-8dd4-57a9724209eb    False   False   OOSTBURG, WI    Oostburg Middle School  1db5c6d2-5b8f-44fa-87fc-f7471ee45443    NaN 963a1085-efe7-45f5-81ee-d2bbf82a907c    b44a651d-bcfc-4d95-a6ce-f00c0c42671e    d594fdef-3441-462c-93ad-981c8fd1f064    False   True    Oostburg Middle School  [Oostburg School District, WI, 6-8] School  oostburg-middle-school-oostburg-wi  45  [{'config': {'format': ['comma'], 'rounding': ...   Niche User  The middle school does a great job at preparin...   [Academics] 2015-02-12T14:29:22Z    0c61a295-b7c7-44e9-ab7a-64993190796f    5.0 [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   32fa503a-a377-44e8-bb10-f9a7fa0bb67c    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  NaN NaN NaN NaN NaN 4.800000    10  [{'label': 'Virtual Tour'}] NaN NaN NaN 3478a622-503f-47d5-93a0-c3207124cdd4    3478a622-503f-47d5-93a0-c3207124cdd4    NaN NaN NaN
98  17b8ac12-d893-4af4-bf79-3aa06bef648a    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Public High Schools in Wisconsin   24  496 best-public-high-schools/s/wisconsin    https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   42.993816   -88.224033  WEPA    551578002688    Public, Charter True    True    Charter School  17b8ac12-d893-4af4-bf79-3aa06bef648a    True    False   WAUKESHA, WI    Waukesha Engineering Preparatory Academy    ba8709ae-856d-4583-83b7-4484b51ed4c2    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    5a94913e-87ac-4e4e-9b76-a2330bf1a635    b88c94da-24d3-4004-b43b-547d9da55e0d    False   True    Waukesha Engineering Preparatory Academy    [School District of Waukesha, WI, 9-12] School  waukesha-engineering-preparatory-academy-wauke...   52  [{'config': {'format': ['comma'], 'rounding': ...   Senior  The Academy is well equipped and staffed, and ...   [Overall Experience]    2021-10-13T21:07:42.049714Z 8d8241ae-fc49-4bab-a146-48d77f1e6391    4.0 [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   b1db421b-6290-4a08-a854-82dd9089e116    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  NaN NaN NaN NaN NaN 3.681818    22  [{'label': 'Virtual Tour'}] 500331  NaN NaN a368f833-c451-45bb-a0f7-b656d02477f3    a368f833-c451-45bb-a0f7-b656d02477f3    NaN NaN NaN
99  c3b20454-cd71-45bb-ab33-0b3ea37527fb    [{'label': 'View Nearby Homes', 'type': 'realE...   Best Public Elementary Schools in Wisconsin 37  1074    best-public-elementary-schools/s/wisconsin  https://d33a4decm84gsn.cloudfront.net/search/2...   https://d33a4decm84gsn.cloudfront.net/search/2...   43.089194   -87.883770  Atwater Elementary School   551380001809    Public  True    True    Public School   c3b20454-cd71-45bb-ab33-0b3ea37527fb    True    False   SHOREWOOD, WI   Atwater Elementary School   8b295479-c31f-47a9-83b8-94b2100e2832    3940b781-a9f6-4333-b607-6a6367e6af44    963a1085-efe7-45f5-81ee-d2bbf82a907c    900b6b9c-206e-4c34-82a8-247fee552b49    542c1289-ad69-4fc3-afab-bf91c1a6110e    False   True    Atwater Elementary School   [Shorewood School District, WI, PK, K-6]    School  atwater-elementary-school-shorewood-wi  45  [{'config': {'format': ['comma'], 'rounding': ...   NaN NaN NaN NaN NaN NaN [{'description': 'Based on quality of academic...   NaN NaN NaN NaN NaN NaN NaN NaN NaN © Mapbox    https://api.mapbox.com/styles/v1/niche-admin/c...   https://api.mapbox.com/styles/v1/niche-admin/c...   b523d19c-fdd1-497b-bd6d-ab394cde0dbf    © OpenStreetMap http://www.openstreetmap.org/copyright  https://www.mapbox.com/about/maps/  NaN NaN NaN NaN NaN 0.000000    0   [{'label': 'Virtual Tour', 'value': 'https://w...   NaN NaN NaN 84c36616-1b72-4d85-998d-c9795aadb726    84c36616-1b72-4d85-998d-c9795aadb726    NaN NaN NaN
100 rows × 73 columns
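The dotted column names above (content.entity.name, content.entity.url, ...) come from json_normalize flattening each nested entity. Comparing the two answers' outputs, content.entity.url appears to hold the slug of the school's profile page (brookfield-academy-brookfield-wi here, https://www.niche.com/k12/brookfield-academy-brookfield-wi/ in the Scrapy answer), so the API results could feed a detail-page scraper. A standard-library sketch of that idea, assuming the https://www.niche.com/k12/<slug>/ pattern holds for schools (the district in the Scrapy output uses /k12/d/ instead); flatten and profile_url are hypothetical helpers, and the sample record is a trimmed copy of row 0.

```python
def flatten(record: dict, prefix: str = '') -> dict:
    """Flatten nested dicts into dotted keys, similar to pandas.json_normalize."""
    flat = {}
    for key, value in record.items():
        name = f'{prefix}{key}'
        if isinstance(value, dict):
            flat.update(flatten(value, f'{name}.'))
        else:
            flat[name] = value  # lists and scalars are kept as-is
    return flat


def profile_url(slug: str) -> str:
    # Assumption: school profiles live at /k12/<slug>/ (inferred from the outputs, not documented)
    return f'https://www.niche.com/k12/{slug}/'


# Trimmed copy of the first entity in the output above
entity = {
    'guid': 'd6574ad4-6add-45c3-a90a-9d24f58b040e',
    'content': {'entity': {'name': 'Brookfield Academy',
                           'url': 'brookfield-academy-brookfield-wi'}},
}

row = flatten(entity)
print(row['content.entity.name'], profile_url(row['content.entity.url']))
```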

You can get all the data by adjusting the range (go up to 123 for the maximum number of records). You may also want to add a pause between requests; otherwise you could get blocked. You can also use Scrapy if you wish.
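A minimal sketch of adding a pause between page requests, using only the standard library. The query parameters are copied from the answer's URL; page_urls and fetch_pages are hypothetical helpers, the fetch function is passed in, and the 2-second default delay is an arbitrary placeholder, not a known safe rate for this site.

```python
import time
from typing import Callable, Iterable, List
from urllib.parse import urlencode

API = 'https://www.niche.com/api/renaissance/results/'


def page_urls(state: str, pages: Iterable[int]) -> List[str]:
    """Build one API URL per results page (parameters copied from the answer)."""
    return [
        f'{API}?' + urlencode({'state': state, 'listURL': 'best-schools',
                               'page': page, 'searchType': 'school'})
        for page in pages
    ]


def fetch_pages(urls: List[str], fetch: Callable[[str], dict],
                delay: float = 2.0,
                sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Call fetch() on each URL, sleeping `delay` seconds between requests."""
    results = []
    for i, url in enumerate(urls):
        if i:  # no need to wait before the first request
            sleep(delay)
        results.append(fetch(url))
    return results
```

In the notebook, fetch could be lambda u: s.get(u).json() with the answer's session s, and pages could be range(1, 5) as in the original loop. Passing sleep in as a parameter only exists so the pacing can be checked without waiting.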

Answer:

You need to check the HTML carefully: the real URL is on the link inside the div with class card, not on the h2 link.

import scrapy


class NicheschoolsSpider(scrapy.Spider):
    name = 'nicheschools'
    allowed_domains = ['www.niche.com']
    start_urls = ['https://www.niche.com/k12/search/best-schools/s/wisconsin/']

    def parse(self, response):
        # The real profile URL sits on the link directly inside each result card,
        # not on the h2 title link (whose href is '#')
        school_links = response.xpath("//div[@class='card ']/a/@href").getall()

        for link in school_links:
            yield response.follow(url=link, callback=self.parse_schools)

    def parse_schools(self, response):
        name = response.xpath("//h1[@class='postcard__title postcard__title--claimed']/text()").get()
        website = response.xpath("(//a[@class='profile__website__link']/@href)[1]").get()
        address = response.xpath("(//address[@class='profile__address--compact']/text())[1]").get()

        yield {
            'name': name,
            'link': response.url,
            'website': website,
            'address': address,
        }

Resulting items:

{'name': 'Brookfield Academy', 'link': 'https://www.niche.com/k12/brookfield-academy-brookfield-wi/', 'website': 'https://www.brookfieldacademy.org', 'address': '3462 N BROOKFIELD RD'}
{'name': 'Wisconsin Lutheran High School', 'link': 'https://www.niche.com/k12/wisconsin-lutheran-high-school-milwaukee-wi/', 'website': 'https://www.wlhs.org', 'address': '330 N GLENVIEW AVE'}
{'name': 'Homestead High School', 'link': 'https://www.niche.com/k12/homestead-high-school-mequon-wi/', 'website': 'http://www.mtsd.k12.wi.us/homestead/', 'address': '5000 W MEQUON RD'}
{'name': 'Brookfield Central High School', 'link': 'https://www.niche.com/k12/brookfield-central-high-school-brookfield-wi/', 'website': 'https://www.elmbrookschools.org/brookfield-central-high-school', 'address': '16900 W GEBHARDT RD'}
{'name': 'Shorewood High School', 'link': 'https://www.niche.com/k12/shorewood-high-school-shorewood-wi/', 'website': 'https://www.shorewood.k12.wi.us/apps/pages/shs', 'address': '1701 E CAPITOL DR'}
{'name': 'School District of Waukesha', 'link': 'https://www.niche.com/k12/d/school-district-of-waukesha-wi/', 'website': 'https://sdw.waukesha.k12.wi.us', 'address': '222 MAPLE AVE'}
{'name': 'Pilgrim Park Middle School', 'link': 'https://www.niche.com/k12/pilgrim-park-middle-school-elm-grove-wi/', 'website': 'http://www.elmbrookschools.org/', 'address': '1500 PILGRIM PKWY'}
{'name': 'Marquette University High School', 'link': 'https://www.niche.com/k12/marquette-university-high-school-milwaukee-wi/', 'website': 'https://www.muhs.edu/', 'address': '3401 W WISCONSIN AVE'}
...
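The XPath //div[@class='card ']/a/@href compares the class attribute exactly, trailing space included, so it stops matching if the site adds, removes, or reorders a class on that div. In XPath 1.0 the usual token test is contains(concat(' ', normalize-space(@class), ' '), ' card '). A standard-library sketch of the same token-based extraction follows; the HTML snippet is hypothetical and only mimics the structure the answer's XPath implies (a link directly inside the card div, plus a title link whose href is '#').

```python
from html.parser import HTMLParser

# Tags that never get a closing tag in HTML, so they must not stay on the stack
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class CardLinkParser(HTMLParser):
    """Collect the href of every <a> that is a direct child of a div whose
    class attribute contains the token 'card' (extra classes and spaces are fine)."""

    def __init__(self):
        super().__init__()
        self.stack = []  # currently open tags as (tag, is_card_div) pairs
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent_is_card = bool(self.stack) and self.stack[-1] == ('div', True)
        if tag == 'a' and parent_is_card and attrs.get('href'):
            self.links.append(attrs['href'])
        if tag not in VOID_TAGS:
            classes = (attrs.get('class') or '').split()
            self.stack.append((tag, tag == 'div' and 'card' in classes))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating unclosed children
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


# Hypothetical markup: two result cards (class "card " and "card"), each with a
# real profile link, plus a title link whose href is '#'
sample = '''
<div class="card ">
  <a href="https://www.niche.com/k12/brookfield-academy-brookfield-wi/"></a>
  <div class="search-result__title-wrapper"><h2><a href="#">Brookfield Academy</a></h2></div>
</div>
<div class="card">
  <a href="https://www.niche.com/k12/wisconsin-lutheran-high-school-milwaukee-wi/"></a>
</div>
'''

parser = CardLinkParser()
parser.feed(sample)
print(parser.links)
```

The '#' title link is skipped because its parent is the h2, not the card div, and the second card is still found even though its class has no trailing space.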

If you are new to web scraping, be careful not to hit the site too often: it could block you, and you would then need to solve a CAPTCHA to get back in.
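In Scrapy, the usual way to slow a crawl down is through settings. A minimal settings.py fragment using standard Scrapy settings; the values are illustrative guesses, not limits published by Niche.

```python
# settings.py (fragment): slow the crawl down to reduce the chance of being blocked
DOWNLOAD_DELAY = 2                   # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 1   # at most one request in flight per domain
AUTOTHROTTLE_ENABLED = True          # adjust the delay based on observed server latency
AUTOTHROTTLE_START_DELAY = 2         # initial delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 30          # upper bound on the delay when the server is slow
```

The same keys can also go in the spider's custom_settings dict if you only want them for this one spider.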

Also, if you want to go further, there are web-scraping clusters such as Estela where you can run your spiders and schedule cron jobs to run them every day.
