Session cookies not passing properly while scraping

Time:12-08

I am trying to scrape Walmart. If I copy the browser request as cURL and convert it to requests format, it works fine. The issue is the cookies: if I omit them, I get a 403 error. I do not want to pass static cookies, so I tried passing session cookies instead, but that does not work. Below is my code.

Passing Static Cookies

import requests

keyword = 'butter'  # search term (matches the query in the referer below)
headers = {
        'authority': 'www.walmart.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
        'dnt': '1',
        'sec-ch-ua-mobile': '?0',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36',
        'sec-ch-ua-platform': '"macOS"',
        'content-type': 'application/json',
        'accept': '*/*',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.walmart.com/store/5939-bellevue-wa/search?query=butter',
        'accept-language': 'en-US,en;q=0.9,ur;q=0.8,zh-CN;q=0.7,zh;q=0.6',
         'cookie': 'brwsr=3546b2d8-4454-11ec-8b5a-dbae802bd5ca; ACID=3e3752b6-a706-4ddf-95a2-ae88083ebe3e; hasACID=true; locGuestData=eyJpbnRlbnQiOiJTSElQUElORyIsInN0b3JlSW50ZW50IjoiUElDS1VQIiwibWVyZ2VGbGFnIjpmYWxzZSwicGlja3VwIjp7Im5vZGVJZCI6IjMwODEiLCJ0aW1lc3RhbXAiOjE2MzY3ODg5MDEyMTB9LCJwb3N0YWxDb2RlIjp7InRpbWVzdGFtcCI6MTYzNjc4ODkwMTIxMCwiYmFzZSI6Ijk1ODI5In0sInZhbGlkYXRlS2V5IjoicHJvZDp2MjozZTM3NTJiNi1hNzA2LTRkZGYtOTVhMi1hZTg4MDgzZWJlM2UifQ==; vtc=T0u0KwKFJGHeU3asnYGQ_w; TBV=7; DL=94066,,,ip,94066,,; TB_Latency_Tracker_100=1; TB_Navigation_Preload_01=1; crumb=2RJK-XnGcnZ8WeLbzMNhC6uSY75Q9sqHRkR8eVTWjyH; tb_sw_supported=true; TB_SFOU-100=1; AID=wmlspartner%3Dimp_150372%3Areflectorid%3Dimp_zl3TTEwhgxyIUNGVPPU0LViWUkGxlaWkqz2KVY0%3Alastupd%3D1638813128173; locDataV3=eyJpbnRlbnQiOiJTSElQUElORyIsInBpY2t1cCI6W3siYnVJZCI6IjAiLCJub2RlSWQiOiIzMDgxIiwiZGlzcGxheU5hbWUiOiJTYWNyYW1lbnRvIFN1cGVyY2VudGVyIiwibm9kZVR5cGUiOiJTVE9SRSIsImFkZHJlc3MiOnsicG9zdGFsQ29kZSI6Ijk1ODI5IiwiYWRkcmVzc0xpbmUxIjoiODkxNSBHZXJiZXIgUm9hZCIsImNpdHkiOiJTYWNyYW1lbnRvIiwic3RhdGUiOiJDQSIsImNvdW50cnkiOiJVUyIsInBvc3RhbENvZGU5IjoiOTU4MjktMDAwMCJ9LCJnZW9Qb2ludCI6eyJsYXRpdHVkZSI6MzguNDgyNjc3LCJsb25naXR1ZGUiOi0xMjEuMzY5MDI2fSwiaXNHbGFzc0VuYWJsZWQiOnRydWUsInNjaGVkdWxlZEVuYWJsZWQiOnRydWUsInVuU2NoZWR1bGVkRW5hYmxlZCI6dHJ1ZX1dLCJkZWxpdmVyeSI6eyJidUlkIjoiMCIsIm5vZGVJZCI6IjMwODEiLCJkaXNwbGF5TmFtZSI6IlNhY3JhbWVudG8gU3VwZXJjZW50ZXIiLCJub2RlVHlwZSI6IlNUT1JFIiwiYWRkcmVzcyI6eyJwb3N0YWxDb2RlIjoiOTU4MjkiLCJhZGRyZXNzTGluZTEiOiI4OTE1IEdlcmJlciBSb2FkIiwiY2l0eSI6IlNhY3JhbWVudG8iLCJzdGF0ZSI6IkNBIiwiY291bnRyeSI6IlVTIiwicG9zdGFsQ29kZTkiOiI5NTgyOS0wMDAwIn0sImdlb1BvaW50Ijp7ImxhdGl0dWRlIjozOC40ODI2NzcsImxvbmdpdHVkZSI6LTEyMS4zNjkwMjZ9LCJpc0dsYXNzRW5hYmxlZCI6dHJ1ZSwic2NoZWR1bGVkRW5hYmxlZCI6dHJ1ZSwidW5TY2hlZHVsZWRFbmFibGVkIjp0cnVlLCJhY2Nlc3NQb2ludHMiOlt7ImFjY2Vzc1R5cGUiOiJERUxJVkVSWV9BRERSRVNTIn1dfSwic2hpcHBpbmdBZGRyZXNzIjp7ImxhdGl0dWRlIjozOC40NzM4LCJsb25naXR1ZGUiOi0xMjEuMzQzOSwicG9zdGFsQ29kZSI6Ijk1ODI5IiwiY2l0eSI6IlNhY3JhbWVudG8iLCJzdGF0ZS
I6IkNBIiwiY291bnRyeUNvZGUiOiJVU0EiLCJnaWZ0QWRkcmVzcyI6ZmFsc2V9LCJhc3NvcnRtZW50Ijp7Im5vZGVJZCI6IjMwODEiLCJkaXNwbGF5TmFtZSI6IlNhY3JhbWVudG8gU3VwZXJjZW50ZXIiLCJhY2Nlc3NQb2ludHMiOm51bGwsImludGVudCI6IlBJQ0tVUCIsInNjaGVkdWxlRW5hYmxlZCI6ZmFsc2V9LCJpbnN0b3JlIjpmYWxzZSwicmVmcmVzaEF0IjoxNjM4ODM0NzI4MjkxLCJ2YWxpZGF0ZUtleSI6InByb2Q6djI6M2UzNzUyYjYtYTcwNi00ZGRmLTk1YTItYWU4ODA4M2ViZTNlIn0=; assortmentStoreId=3081; hasLocData=1; akavpau_p2=1638813728~id=0ddec06ac25050953645a4b5d72af4f0; adblocked=true; com.wm.reflector="reflectorid:imp_zl3TTEwhgxyIUNGVPPU0LViWUkGxlaWkqz2KVY0@lastupd:1638813132000@firstcreate:1636788901158"; next-day=null|true|true|null|1638862621; location-data=94066:San Bruno:CA::0:0|21k;;15.22,46y;;16.96,1kf;;19.87,1rc;;23.22,46q;;25.3,2nz;;25.4,2b1;;27.7,4bu;;28.38,2er;;29.12,1o1;;30.14|2|7|1|1xun;16;0;2.44,1xtf;16;1;4.42,1xwj;16;2;7.04,1ygu;16;3;8.47,1xwq;16;4;9.21; TB_DC_Flap_Test=0; bstc=cqPOkDdcHRbOOWnXljFUXU; mobileweb=0; xpa=; xpm=3+1638862621+T0u0KwKFJGHeU3asnYGQ_w~+0; _pxhd=1wGn8vPC/xYO43oIyZQIBmYn28J4J4/ceMd-WLk8e9M7Qyw0ToljNcm1zXiFNAC5tYcUXy88tg3nBqpdAuT5sQ==:dW/JtmJPPeYviWr1RnbGL8Q8fvsAnk/Tr319tYqmCEF-ZACrf9lbi8vQzvzKY6lDVRH5k5dW2zRPEnPwSFdhe9v-Q1NOMHusjxvPvQ2BFeM=; ak_bmsc=DEAF5D13B7E5DB9D1445A42C981F72DB~000000000000000000000000000000~YAAQP54QApQE/i99AQAATg3Tkw6GGZisOY9WqflSzzYhEOPbRvefmAkpiEYorVwUfg9UEAFnLY8StjWnjYCXsEqzFDDy gnGpbydgZS 20l VJJigKU51o2xuF8AJrgY5QzJhLMA/i9MxW5MRL n0zV BC1PLTb1hhelYx8GmmyDV HifGBSgErgNtb6pUA5ydONX9EprYpZknQfqP30OmVWTnKkloTrQDkfJcJ0vI P/MEZb8U2molWz/GdGwU rbhcdSfa9oWaLaAp8E/DZjsY6YpmW4fmZNQTrUfa7P2db8EDnbWF1zi8r3D 51o2Hi7bd5Bi 8txdaDdl9YT3 Nr4VofqwkH5iXqOmidTDd/fO6qvtQyO0oau3Vjowa xhn04TYYtlQjsb9A; xptwg=3271574160:209D26905C6B780:55789D4:D6539E2E:8EBD026C:E7FE99FF:; TS01b0be75=01538efd7c206a0dc58225169e0b825676e9c5e453ca7af568232e9be3e84e5a6227c4b728ed7713e869287e81ef2d29326fcd6cab; TS013ed49a=01538efd7c206a0dc58225169e0b825676e9c5e453ca7af568232e9be3e84e5a6227c4b728ed7713e869287e81ef2d29326fcd6cab; 
bm_mi=01610934A8DDD8546E2A0DEF61A8C91C~xOaPkjLqDDIuv11EPeZ1gIxvBQt6PD6zjA7Gd2qVQfVC6cwGG83ojqAvWCSM6IYwVuSjWY3iSyJNH7YzUX2nmRtIlmJUlrAz5tz3OU9v1zqnhHdq0QcCuMve0SUoKRpLcqWK65ocd9vpQR78SbT7CLWkBDckK4ro0g38t1cdBsBb8OZKF21D M5ZU1pHVEuO3MvvWCWNDByoMpg9KcRVvq/67Rbu6fVBemqGJU3g54VcLF6BgDysyAiM1zrC5dHyOznoyGKalanQZpuh4hQQwg==; bm_sv=FA70A3C0284440D78F7D26E1C9316BD3~G6MyGle3ACdP7loWuTml7iej WW4evzORtt1IKVCT/tduzxQYcbTI2Ti1xyqDnz8l6K0ty4wXsRVwrzTkcnDAzku7y4AVIZ5cKsSt9rThXIHazknmtE9Y0OSWFRz8IdFFrO5uHnXxAPnXTdp97vj7fYje/wT3m GfZhmesPZZNs=',
}
params = (
    ('query', keyword),
    ('stores', '5939'),
    ('cat_id', '0'),
    ('ps', '24'),
    ('offset', '24'),
    ('prg', 'desktop'),
    ('zipcode', '98006'),
    ('stateOrProvinceCode', 'WA'),
)
session = requests.Session()
r = session.get(
    'https://www.walmart.com/store/electrode/api/search',
    headers=headers,  # includes the static 'cookie' header
    params=params,
)
print(r.status_code)  # returns 200
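Rather than embedding the copied string as a 'cookie' header, it can be split into a dict and passed through the cookies= argument, which keeps the headers readable. This is a minimal sketch; parse_cookie_header is a hypothetical helper, and the sample values are placeholders, not real Walmart cookies:

```python
def parse_cookie_header(header: str) -> dict:
    """Split a raw 'Cookie' header ("a=1; b=2") into {name: value}."""
    cookies = {}
    for pair in header.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty fragments such as a trailing ';'
        # Partition on the first '=' only: values may contain '=' (base64 padding).
        name, _, value = pair.partition('=')
        cookies[name.strip()] = value.strip()
    return cookies


# Placeholder values; in practice paste the string copied from the browser's DevTools.
raw = 'brwsr=abc; hasACID=true; locGuestData=eyJ9==; xpa=; '
print(parse_cookie_header(raw))
```

The result can then be passed as session.get(url, headers=headers, params=params, cookies=parse_cookie_header(raw)) with the 'cookie' entry removed from headers.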

Passing runtime cookies (does not work)

import requests

keyword = 'tillamook'  # search term (matches the query in the referer below)
headers = {
        'authority': 'www.walmart.com',
        'pragma': 'no-cache',
        'cache-control': 'no-cache',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"',
        'dnt': '1',
        'sec-ch-ua-mobile': '?0',
        'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.55 Safari/537.36',
        'sec-ch-ua-platform': '"macOS"',
        'content-type': 'application/json',
        'accept': '*/*',
        'sec-fetch-site': 'same-origin',
        'sec-fetch-mode': 'cors',
        'sec-fetch-dest': 'empty',
        'referer': 'https://www.walmart.com/store/5939-bellevue-wa/search?query=tillamook',
        'accept-language': 'en-US,en;q=0.9,ur;q=0.8,zh-CN;q=0.7,zh;q=0.6'         
    }
params = (
    ('query', keyword),
    ('stores', '5939'),
    ('cat_id', '0'),
    ('ps', '24'),
    ('offset', '24'),
    ('prg', 'desktop'),
    ('zipcode', '98006'),
    ('stateOrProvinceCode', 'WA'),
)

# Get session cookies by visiting the homepage first
session = requests.Session()
r = session.get('https://www.walmart.com/')
cookies = session.cookies
cookies_dictionary = cookies.get_dict()
print(cookies)
r = session.get(
    'https://www.walmart.com/store/electrode/api/search',
    headers=headers,
    params=params,
    cookies=cookies,  # Not working
)
print(r.status_code)  # returns 403
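A requests.Session stores the cookies it receives and attaches them to later requests for matching domains automatically, so cookies=session.cookies in the second call adds nothing; the 403 more likely comes from which cookies the session holds than from how they are passed. The sketch below checks this offline by preparing (not sending) a request; the cookie name and value are made up:

```python
import requests

session = requests.Session()
# Simulate a cookie the homepage might have set (made-up name and value).
session.cookies.set('example_cookie', 'abc123', domain='www.walmart.com', path='/')

url = 'https://www.walmart.com/store/electrode/api/search'

# prepare_request merges the session's cookie jar into the request
# without sending anything over the network.
implicit = session.prepare_request(requests.Request('GET', url))
explicit = session.prepare_request(
    requests.Request('GET', url, cookies=session.cookies)
)

print(implicit.headers.get('Cookie'))  # example_cookie=abc123
print(explicit.headers.get('Cookie'))  # same header: passing cookies= was redundant
```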

Update: the runtime cookies are different from the static ones. After visiting the homepage, the session holds only these three:

{'TS01b0be75': '01538efd7c044ddd7186ec7b8e9a2c7c3f59c4292535357d3ee83514b723af242e66b7d2d4bb89dca3c3512d765a35854d24b328b7', '_pxhd': 'f2ledR7-gSvtTHbIFTwB89DoCT99Whwim/4tbcl-XKEvqQH1mfG94waAxWYDdlEtAS7YXmgpcH9Wg7XztyrF-w==:DGocCdAUsAwd9NAEPkHiaYyPxwgpVX3GTvmhGfztbMb2T6hqgKWOgSYeW1U0qtgwsseesEz/fHMPnVceRuFkILyXsV9wDt9vKy708YcyTOk=', 'akavpau_p2': '1638865229~id=bd97f18a4cb5fae6b4f4625ef3799a9d'}
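To see which cookies the browser sends that the session never received, the names in the copied header can be compared against the session's cookies. This is a small diagnostic sketch; missing_cookie_names is a hypothetical helper, and the sample header is abbreviated with placeholder values (only the names come from the question):

```python
def missing_cookie_names(browser_header: str, received: dict) -> list:
    """Return cookie names present in a raw browser 'Cookie' header but absent from `received`."""
    browser_names = {
        pair.split('=', 1)[0].strip()
        for pair in browser_header.split(';')
        if '=' in pair
    }
    return sorted(browser_names - set(received))


# Placeholder values; the names are taken from the question's static header
# and from the three cookies the session actually received.
browser_header = 'TS01b0be75=x; _pxhd=x; akavpau_p2=x; crumb=x; bm_sv=x; adblocked=true'
received = {'TS01b0be75': 'y', '_pxhd': 'y', 'akavpau_p2': 'y'}
print(missing_cookie_names(browser_header, received))
```

In the question's code, received would be cookies_dictionary (that is, session.cookies.get_dict()).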

Answer:

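Try using Selenium to fetch the cookies from the site first, as @furas suggested in the comments. Visiting the homepage with requests alone yields only a few cookies (see the update above), presumably because many of the ones the browser sends are set by JavaScript, which requests does not execute. Once Selenium has collected the cookies, pass them to requests along with the GET request to get the required response. I had success with the following approach: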

import time
import requests
from selenium import webdriver

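def get_cookies():
    # Load the page in a real browser so cookies set by its JavaScript are included
    with webdriver.Chrome() as driver:
        driver.get('https://www.walmart.com/store/5939-bellevue-wa/search?query=tillamook')
        time.sleep(10)  # give the page time to finish setting cookies
        driver_cookies = driver.get_cookies()
        # Convert Selenium's list of cookie dicts into {name: value} for requests
        cookie = {c['name']: c['value'] for c in driver_cookies}
    return cookie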

headers = {
    'accept': '*/*',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36',
    'referer': 'https://www.walmart.com/store/5939-bellevue-wa/search?query=tillamook', 
}

params = {
    'query': 'tillamook',
    'stores': '5939',
    'cat_id': '0',
    'ps': '24',
    'offset': '24',
    'prg': 'desktop',
    'zipcode': '98006',
    'stateOrProvinceCode': 'WA',
}

with requests.Session() as session:
    r = session.get(
        'https://www.walmart.com/store/electrode/api/search',
        headers=headers,
        params=params,
        cookies=get_cookies()
    )
    print(r.status_code) # Returns 200
    print(r.json())
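A possible refinement of get_cookies(), under two assumptions: the fixed time.sleep(10) could be replaced by polling until specific cookies appear (which names to wait for is a guess you would have to make from DevTools; '_pxhd' below is only an example), and the Selenium cookies could be copied into the Session's jar with their domain and path so the same session can be reused for several API calls. Both helpers are hypothetical and deliberately avoid importing Selenium so they can be exercised with plain data; driver.get_cookies() returns a list of dicts with 'name' and 'value' keys and usually 'domain' and 'path' as well.

```python
import time
from typing import Callable, Iterable

import requests


def wait_for_cookies(get_cookies: Callable[[], list], required: Iterable[str],
                     timeout: float = 30.0, interval: float = 0.5) -> list:
    """Poll get_cookies() until every name in `required` is present, or raise TimeoutError."""
    required = set(required)
    deadline = time.monotonic() + timeout
    while True:
        cookies = get_cookies()
        names = {c['name'] for c in cookies}
        if required <= names:
            return cookies
        if time.monotonic() >= deadline:
            raise TimeoutError(f'cookies still missing: {sorted(required - names)}')
        time.sleep(interval)


def load_into_session(session: requests.Session, driver_cookies: list) -> None:
    """Copy Selenium-style cookie dicts into a requests Session, keeping domain and path."""
    for c in driver_cookies:
        session.cookies.set(
            c['name'],
            c['value'],
            domain=c.get('domain', ''),
            path=c.get('path', '/'),
        )


# With a real browser this would be, e.g.:
#   cookies = wait_for_cookies(driver.get_cookies, {'_pxhd'})  # cookie names are a guess
#   load_into_session(session, cookies)
# Here a fake cookie list stands in for driver.get_cookies().
fake = [{'name': '_pxhd', 'value': 'abc', 'domain': '.walmart.com', 'path': '/'}]
session = requests.Session()
load_into_session(session, wait_for_cookies(lambda: fake, {'_pxhd'}, timeout=1))
print(session.cookies.get('_pxhd'))
```

After loading, session.get(...) can be called as in the answer without the cookies= argument, since the jar already holds the cookies and sends them to matching walmart.com URLs.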