Scrape site with login details with bs4 python


I am trying to log in to a website with requests and bs4. The login succeeds, but when I try to parse the shop data, the page shows that I'm not logged in.

import requests
from bs4 import BeautifulSoup

with requests.session() as c: 
    
    link="https://www.tais-shoes.ru/wp-login.php" 
    initial=c.get(link) 

    login_data = {"log": "*****","pwd": "*****", 
              "rememberme": "forever", 
              "redirect_to": "https://www.tais-shoes.ru/my-account/", 
              "redirect_to_automatic": "1"
             }

    page_login = c.post('https://www.tais-shoes.ru/wp-login.php', data=login_data)
    
    print(page_login) 
    
    shop_url = "https://www.tais-shoes.ru/shop/"
    html = requests.get(shop_url)
    soup = BeautifulSoup(html.text, 'html.parser')

    print(soup)

CodePudding user response:

You should be using the requests.Session instance you created, but further down in your code you make a new request with requests.get, which bypasses the session and therefore doesn't send the login cookies.

Change this

    html = requests.get(shop_url)

To this:

    html = c.get(shop_url)

Full code:

import requests
from bs4 import BeautifulSoup

with requests.Session() as c:

    # Load the login page first so any initial session cookies get set
    link = "https://www.tais-shoes.ru/wp-login.php"
    initial = c.get(link)

    login_data = {"log": "*****", "pwd": "*****",
                  "rememberme": "forever",
                  "redirect_to": "https://www.tais-shoes.ru/my-account/",
                  "redirect_to_automatic": "1"
                 }

    # POST the credentials; the session keeps the auth cookies it receives
    page_login = c.post('https://www.tais-shoes.ru/wp-login.php', data=login_data)

    print(page_login)

    # Reuse the same session so those cookies are sent with this request
    shop_url = "https://www.tais-shoes.ru/shop/"
    html = c.get(shop_url)
    soup = BeautifulSoup(html.text, 'html.parser')

    print(soup)
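
As a quick sanity check (not part of the answer above), you can confirm that the POST actually authenticated before parsing the shop page. On a standard WordPress install a successful login sets a wordpress_logged_in_<hash> cookie in the session's cookie jar; the field names and placeholder credentials below are taken from the question, and the cookie-name convention is an assumption about a default WordPress setup.

import requests

# Sketch of a login sanity check, assuming a standard WordPress login flow:
# a successful wp-login.php POST stores a "wordpress_logged_in_<hash>" cookie
# in the session. The "*****" placeholders are the question's, not real values.
with requests.Session() as c:
    c.get("https://www.tais-shoes.ru/wp-login.php")
    c.post("https://www.tais-shoes.ru/wp-login.php",
           data={"log": "*****", "pwd": "*****",
                 "rememberme": "forever",
                 "redirect_to": "https://www.tais-shoes.ru/my-account/",
                 "redirect_to_automatic": "1"})

    logged_in = any(name.startswith("wordpress_logged_in")
                    for name in c.cookies.keys())
    print("Authenticated:", logged_in)
    # If this prints False, the login request itself failed (wrong field names,
    # a missing nonce, or rejected credentials), so the shop page will render
    # as an anonymous visitor even when fetched through the session.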

CodePudding user response:

Did you check the robots.txt file on the site (https://www.tais-shoes.ru/robots.txt)? It seems to me that robots are disallowed on the entire site. Scraping is regulated by the allow and disallow rules stated in robots.txt. More about the legality of it here: https://www.tutorialspoint.com/python_web_scraping/legality_of_python_web_scraping.htm
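
If you want to check that programmatically instead of reading the file by hand, Python's standard urllib.robotparser module can tell you whether a given user agent is allowed to fetch a URL; here is a minimal sketch:

from urllib import robotparser

# Minimal sketch: parse the site's robots.txt and ask whether a generic
# crawler ("*") is allowed to fetch the shop page.
rp = robotparser.RobotFileParser()
rp.set_url("https://www.tais-shoes.ru/robots.txt")
rp.read()

shop_url = "https://www.tais-shoes.ru/shop/"
print(rp.can_fetch("*", shop_url))  # False means the rules disallow fetching it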
