I am trying to log in to a website with requests and bs4. The login succeeds, but when I then try to parse the shop data, the page shows that I'm not logged in.
import requests
from bs4 import BeautifulSoup

with requests.session() as c:
    link = "https://www.tais-shoes.ru/wp-login.php"
    initial = c.get(link)
    login_data = {
        "log": "*****",
        "pwd": "*****",
        "rememberme": "forever",
        "redirect_to": "https://www.tais-shoes.ru/my-account/",
        "redirect_to_automatic": "1"
    }
    page_login = c.post('https://www.tais-shoes.ru/wp-login.php', data=login_data)
    print(page_login)
    shop_url = "https://www.tais-shoes.ru/shop/"
    html = requests.get(shop_url)
    soup = BeautifulSoup(html.text, 'html.parser')
    print(soup)
CodePudding user response:
You should be using the requests.Session instance you've created, but further down in your code you open a new, unauthenticated connection with requests.get.
Change this:
html = requests.get(shop_url)
To this:
html = c.get(shop_url)
Full code:
import requests
from bs4 import BeautifulSoup

with requests.Session() as c:
    link = "https://www.tais-shoes.ru/wp-login.php"
    initial = c.get(link)
    login_data = {
        "log": "*****",
        "pwd": "*****",
        "rememberme": "forever",
        "redirect_to": "https://www.tais-shoes.ru/my-account/",
        "redirect_to_automatic": "1"
    }
    page_login = c.post('https://www.tais-shoes.ru/wp-login.php', data=login_data)
    print(page_login)
    shop_url = "https://www.tais-shoes.ru/shop/"
    html = c.get(shop_url)
    soup = BeautifulSoup(html.text, 'html.parser')
    print(soup)
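The underlying reason the original code was "not logged in" is that cookies live on the Session object, not globally: a module-level requests.get() starts from an empty cookie jar every time. A minimal offline sketch of this behavior (the cookie name here is illustrative, not the exact one WordPress sets):

```python
import requests

with requests.Session() as s:
    # Cookies attached to the session persist across every request made through it
    s.cookies.set("wordpress_logged_in", "example-token")
    assert "wordpress_logged_in" in s.cookies

    # A fresh, module-level call uses its own empty jar and would send no such cookie
    assert "wordpress_logged_in" not in requests.cookies.RequestsCookieJar()
```

This is why every request after the login POST has to go through `c`, the same Session that received the login cookies.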
CodePudding user response:
Did you check the robots.txt file on the site ( https://www.tais-shoes.ru/robots.txt )? It seems to me that robots are disallowed on the entire site. Scraping is regulated by the allow and disallow rules stated in robots.txt. More about the legality of it here: https://www.tutorialspoint.com/python_web_scraping/legality_of_python_web_scraping.htm
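You can check such rules programmatically with the standard library's urllib.robotparser. A small sketch, using a hypothetical blanket-disallow robots.txt rather than the site's actual file (normally you would point set_url at the live robots.txt and call read()):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content with a blanket disallow for all user agents
robots_lines = [
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(robots_lines)

# With "Disallow: /" for every agent, no URL on the site may be fetched
print(rp.can_fetch("*", "https://www.tais-shoes.ru/shop/"))  # False
```

If can_fetch returns False for the URL you want, the site's robots.txt asks crawlers not to request it.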