How to troubleshoot Scrapy shell response 403 error

Time:07-04

A few months ago I followed this Scrapy shell method to scrape a real estate listings webpage and it worked perfectly.

I copied the cookie and user-agent strings from Firefox (Developer tools -> Headers) while the target URL was loaded. With those, I got a successful response (200) and could pull items with response.xpath.

For example:

url = 'https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home/pg-1?pos=39.126499,-77.43902,38.685678,-76.779841,11&qdm=true'
cookie = '__fp=7387663eca6ba5161d1c58711dd65164; split=n; split_tcv=105; __vst=742f3db3-c514-4032-8650-21d4ccfdd85f; __ssn=a0587b1b-bc15-4e3d-8738-3fc8757071ab; __ssnstarttime=1656813474; criteria=pg=1&sprefix=%2Frealestateandhomes-search&typ=1&area_type=city&search_type=city&city=McLean&state_code=VA&state_id=VA&lat=38.9435449&long=-77.1929134&county_fips=51059&county_fips_multi=51059&loc=McLean%2C%20VA&locSlug=McLean_VA&county_needed_for_uniq=false&p…; _gid=GA1.2.260165497.1656813481; AMCV_AMCV_8853394255142B6A0A4C98A4@AdobeOrg=-1124106680|MCMID|79412848632605408421861417717111497169|MCIDTS|19177|MCOPTOUT-1656820680s|NONE|vVersion|5.2.0; _fbp=fb.1.1656813480720.1479123934; AMCVS_AMCV_8853394255142B6A0A4C98A4@AdobeOrg=1; adcloud={"_les_v":"y,realtor.com,1656815313"}; _clck=d7aq33|1|f2u|0; _clsk=sechzn|1656871412641|1|0|n.clarity.ms/collect; _uetsid=c56bc620fa7111ecb58a6f841dcc81b4; _uetvid=c56bcb70fa7111eca88d1bd5d241568e'
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Firefox/102.0'

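import json  # the Scrapy shell already exposes `scrapy`, but json has to be imported

(fetch(scrapy.Request(url=url, headers={'cookie': cookie, 'user-agent': user_agent})), response)
# The first <script> tag in <body> holds the search results as JSON.
listings = json.loads(response.xpath('/html/body/script[1]/text()').getall()[0])['props']['pageProps']['searchResults']['home_search']['results']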
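The long chain of keys above raises a bare KeyError as soon as the site renames or drops one level of its embedded JSON, which makes a layout change hard to tell apart from a blocked request. Here is a minimal sketch of a more defensive lookup. Note that `dig` is a hypothetical helper (not part of Scrapy), and the sample payload, including the `list_price` field, is made up to mimic the key path used above.

```python
import json

def dig(data, *keys, default=None):
    """Walk nested dicts/lists key by key; return `default` if any step is missing."""
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Stand-in for the text of the page's first <script> tag (made-up sample data).
raw = json.dumps(
    {"props": {"pageProps": {"searchResults": {"home_search": {"results": [{"list_price": 1025000}]}}}}}
)
payload = json.loads(raw)

# Same key path as in the question, but a missing level yields [] instead of an exception.
listings = dig(payload, "props", "pageProps", "searchResults", "home_search", "results", default=[])
missing = dig(payload, "props", "pageProps", "noSuchKey", "results", default=[])
print(len(listings), len(missing))
```

In the Scrapy shell, `payload` would come from `json.loads(response.xpath('/html/body/script[1]/text()').get())` once the request succeeds. An empty result then signals that the page structure changed rather than crashing the session.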

Now I'm trying again a few months later (with an updated cookie) and I'm getting a 403 error -- the server understands the request but refuses to authorize it:

In [7]: (fetch(scrapy.Request(url=url, headers={'cookie': cookie, 'user-agent':
   ...: user_agent})), response)
2022-07-03 14:14:43 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home?pos=39.069149,-77.355927,38.742653,-76.862935,11&qdm=true&view=map> (referer: None)
Out[7]: (None, None)

Any thoughts on what I might try to get this working again? Thanks.

CodePudding user response:

The cookie is not what's causing the problem (see below). I think the issue is that with 'view=map' in the URL, the server is looking for a 'referer' key in the headers (in addition to other header keys). I would suggest adding a 'referer': url key/value pair to your headers. Alternatively, you can try a lighter-weight approach:

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'DNT': '1',
    'Upgrade-Insecure-Requests': '1',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'none',
    'Sec-Fetch-User': '?1',
    'Connection': 'keep-alive',
    'If-None-Match': '"f0267-ybK8wNq/yADu0m5N1CYhPqrXfaY"',
}

response = requests.get('https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home/pg-1?pos=39.126499,-77.43902,38.685678,-76.779841,11&qdm=true', headers=headers)

sp = BeautifulSoup(response.text,'lxml')

results = sp.find_all('li',{'data-testid':'result-card'})

print(results[0])

output:

<li  data-testid="result-card"><div ><img alt=""  src="data:image/svg+xml;base64,PD94bWwgdmVyc2lvbj0iMS4wIiBzdGFuZGFsb25lPSJubyI/Pgo8IURPQ1RZUEUgc3ZnIFBVQkxJQyAiLS8vVzNDLy9EVEQgU1ZHIDIwMDEwOTA0Ly9FTiIKICJodHRwOi8vd3d3LnczLm9yZy9UUi8yMDAxL1JFQy1TVkctMjAwMTA5MDQvRFREL3N2ZzEwLmR0ZCI+CjxzdmcgdmVyc2lvbj0iMS4wIiB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciCiB3aWR0aD0iNTEuMDAwMDAwcHQiIGhlaWdodD0iNTEuMDAwMDAwcHQiIHZpZXdCb3g9IjAgMCA1MS4wMDAwMDAgNTEuMDAwMDAwIgogcHJlc2VydmVBc3BlY3RSYXRpbz0ieE1pZFlNaWQgbWVldCI+Cgo8ZyB0cmFuc2Zvcm09InRyYW5zbGF0ZSgwLjAwMDAwMCw1MS4wMDAwMDApIHNjYWxlKDAuMTAwMDAwLC0wLjEwMDAwMCkiCmZpbGw9IiMwMDAwMDAiIHN0cm9rZT0ibm9uZSI+CjwvZz4KPC9zdmc+Cg=="/><div ><div ><div ><span >Brokered by<!-- --> </span><span  data-label="pc-brokered">Maram Realty, LLC</span></div></div></div><div  data-id="6566177482" data-label="property-card" data-testid="property-card"><div  data-testid="pc-photo-wrap" id="6566177482"><a aria-label="Navigate to 2533 Flint Hill Rd Listing Detail Page"  data-testid="property-anchor" href="/realestateandhomes-detail/2533-Flint-Hill-Rd_Vienna_VA_22181_M65661-77482" rel="noopener" target="_self"><picture  data-lazy="force-loaded"><source data-testid="img-webp" height="100%" srcset="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.webp, https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.webp 2x" type="image/webp" width="100%"/><img alt="2533 Flint Hill Rd, Vienna, VA 22181"  data-atf="true" data-fmp="true" data-label="pc-photo" data-src="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg" height="100%" itemprop="image" src="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg" srcset="https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg, https://ap.rdcpix.com/c2319074726c472738b73fd84b91a521l-m3065887895od-w480_h360_x2.jpg 2x" width="100%"/></picture></a><div ><button  data-testid="save-button" type="button"></button></div></div><div ><div ><div ><span  data-label="pc-new"><span>New - 4 hours ago</span></span></div></div></div><div ><div ></div></div><div  data-testid="property-detail"><div ><div  data-testid="forsale"><span ></span><span >For Sale</span></div></div><div ><div ><div  data-label="pc-price-wrapper"><span  data-label="pc-price">$1,025,000</span></div><div ><div  data-testid="property-meta-container"><ul ><li  data-label="pc-meta-beds"><span  data-label="meta-value">3</span><span  data-label="meta-label">bed</span></li><li  data-label="pc-meta-baths"><span  data-label="meta-value">2.5</span><span  data-label="meta-label">bath</span></li><li  data-label="pc-meta-sqft"><span  data-label="meta-value">1,221</span><span  data-label="meta-label">sqft</span></li><li  data-label="pc-meta-sqftlot"><span  data-label="meta-value">0.52</span><span  data-label="meta-label">acre lot</span></li></ul></div></div><div ><div  data-label="pc-address">2533 Flint Hill Rd<!-- -->, <div  data-label="pc-address-second">Vienna<!-- -->, VA<!-- --> <!-- -->22181</div></div><div ><button aria-label="Email agent for 2533 Flint Hill Rd, Vienna, VA 22181"  data-testid="cta-button" data-toggle="modal" type="button">Email agent</button></div></div></div></div></div></div></div></li>
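The referer suggestion from the answer can be sketched as a small header-building step for the Scrapy shell. This is only a sketch under assumptions: `build_headers` is a hypothetical helper, sending the target URL itself as the Referer follows the answer's hint rather than anything the site documents, and there is no guarantee the server will accept the request.

```python
def build_headers(url, user_agent, cookie=None):
    """Return browser-like request headers, with Referer set to the target URL."""
    headers = {
        "User-Agent": user_agent,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Referer": url,  # assumption: the page's own URL is an acceptable referer
    }
    if cookie:
        # Optional: the answer above suggests the cookie is not what triggers the 403.
        headers["Cookie"] = cookie
    return headers

url = "https://www.realtor.com/realestateandhomes-search/McLean_VA/type-single-family-home/pg-1"
user_agent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:101.0) Gecko/20100101 Firefox/101.0"
headers = build_headers(url, user_agent)

# In the Scrapy shell, the dict can then be passed to the request:
#   fetch(scrapy.Request(url=url, headers=headers))
#   response.status
print(headers["Referer"])
```

If the status is still 403 with these headers, a reasonable next step is to compare them, key by key, against what the browser sends for the same URL.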