How to parse a cookie file in beautiful soup

Time:10-28

My organization requires two-factor authentication to access an internal website that I need to scrape. Every time I open a browser, it asks me to authenticate. The authentication cookie is stored in c://users//.way//cookie.bat. I want to use this cookie file to scrape the internal website. Can someone help me with this?

sample program

from bs4 import BeautifulSoup
import requests

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'}
cookie_file = 'c://users//.way//cookie.bat'
cookies = {}  # should hold the contents of cookie_file; reading and parsing it is the open question
source = requests.get('https://www.internalwebsite.com', headers=header, cookies=cookies)  # the keyword is cookies=, not cookie=
soup = BeautifulSoup(source.text, 'lxml')  # pass the response body, not the Response object

### general scraping

I tried reading the cookie file but could not get it to work. Please help me read the cookie file and pass it to requests so that I can access the internal website and parse it with BeautifulSoup.

CodePudding user response:

BeautifulSoup does not handle cookies; that is the job of requests. Automatically parsing cookies from a file and adding them to your requests session is a bit complicated, but the general idea is:

  • read the contents of the file with open
  • parse the file to a python dict (this depends on the format of your cookies file)
  • create a requests session with your cookies.
  • use the session to get the website.
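
The steps above can be sketched roughly as follows. This is only a minimal sketch, assuming the cookie file holds a single "name=value; name2=value2" string; the real format of cookie.bat is not known here, and a JSON or Netscape-style file would need a different parser. The helper names read_cookie_dict and fetch_with_cookies are made up for illustration.

```python
from pathlib import Path


def read_cookie_dict(path):
    """Read 'name=value; name2=value2' text from a file into a dict.

    Assumption: the file contains one Cookie-header-style string.
    """
    text = Path(path).read_text(encoding="utf-8").strip()
    cookies = {}
    for pair in text.split(";"):
        # Split on the first "=" only, so values may contain "=" themselves.
        name, sep, value = pair.strip().partition("=")
        if sep and name:  # skip empty or malformed chunks
            cookies[name] = value
    return cookies


def fetch_with_cookies(url, cookie_path, headers=None):
    """Create a requests session carrying the file's cookies and GET the URL."""
    # Third-party import kept local so read_cookie_dict works without requests.
    import requests

    session = requests.Session()
    session.cookies.update(read_cookie_dict(cookie_path))
    return session.get(url, headers=headers)
```

With the names from the question, this would be called as response = fetch_with_cookies('https://www.internalwebsite.com', 'c://users//.way//cookie.bat', headers=header), and the page would then be parsed with BeautifulSoup(response.text, 'lxml').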

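It might be easier to just hardcode the authentication cookies in your code (for example, session.cookies.update({'name': 'value'})) if you don't need to update them too often. Note that session.auth = ('user', 'pass') sets HTTP Basic Auth credentials rather than cookies, so it only helps if the site accepts a username and password.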

CodePudding user response:

A cookie is just a ";"-separated string of key=value pairs. You can parse it with Python's built-in SimpleCookie.

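from http.cookies import SimpleCookie  # Python 3; in Python 2 the module was named Cookie
import requests

# raw_cookie is the text read from cookie.bat, e.g. "name1=value1; name2=value2"
cookie = SimpleCookie(raw_cookie)
cookies = {k: v.value for k, v in cookie.items()}  # items() replaces Python 2's iteritems()
requests.get(url, cookies=cookies)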
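
To see what the parsing step produces on its own, here is a small sketch that needs only the standard library. The sample string is made up; on Python 3 the class is imported from http.cookies. One thing SimpleCookie does that a plain split on ";" would not is strip the double quotes around quoted values.

```python
from http.cookies import SimpleCookie

# Made-up sample in Cookie-header style; the real cookie.bat contents may differ.
raw = 'sessionid=abc123; theme="dark mode"'

cookie = SimpleCookie(raw)
# Each entry is a Morsel object; .value holds the decoded cookie value.
cookies = {name: morsel.value for name, morsel in cookie.items()}
print(cookies)
```

The resulting dict can be passed straight to requests through the cookies= argument.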