Newbie learning web scraping: after a simulated login to ZOL, why can't I get the page source?

Time:10-01

 
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests, json
import time, random
import ssl
from http import cookiejar
from bs4 import BeautifulSoup

# Pool of User-Agent strings; one is picked at random per login.
my_headers = [
    'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36',
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
]


def LoginZOL(username, password):
    agent = random.choice(my_headers)
    headers = {
        'Host': 'service.zol.com.cn',
        'Referer': 'http://service.zol.com.cn/user/login.php',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate',
        'Content-Type': 'application/x-www-form-urlencoded, text/html; charset=utf-8',
        'Connection': 'keep-alive',
        # Cookie string copied from a browser session.
        'Cookie': 'ip_ck=7sgf7vnyj7quoty0otm1lje1nzg1mzk0ntk%3D; z_pro_city=s_provice%3Dhunan%26s_city%3Dchangsha; last_userid=4u3257; zol_bind_4u3257=1; z_day=izol107429%3D4; BAIDU_SSP_lcr=http://zol.iqiyi.com.cn/?c=Api_Login&a=APILogin&act=signin&username=4u3257&check=b4f8e7b429f9bd6263322be7d9e9f7d3&backUrl=http://bbs.zol.com.cn/; zol_userid=4u3257; zol_check=164255255; zol_sid=55347437; zol_cipher=4ebc37d40e7e19a566c57b0cea2c51a4; Hm_lvt_ae5edc2bc4fc71370807f6187f0a2dd0=1578619489,1578619744,1578619766,1578619785; adshow=0; questionnaire_pv=1578614433; lv=1578623912; vn=4; Hm_lpvt_ae5edc2bc4fc71370807f6187f0a2dd0=1578623913',
        'User-Agent': agent
    }
    data = {
        'userid': username,
        'pwd': password,
        'is_auto': '1',
        'act': 'signin',
        'check': 'b4f8e7b429f9bd6263322be7d9e9f7d3',
        'backUrl': 'http://service.zol.com.cn/user/login.php'
    }
    login_url = 'http://service.zol.com.cn/user/ajax/login2014/login.php'

    # Use a session backed by a cookie jar file named 'cookies'.
    ssion = requests.session()
    ssion.cookies = cookiejar.LWPCookieJar(filename='cookies')
    try:
        print(ssion.cookies)
        ssion.cookies.load(ignore_discard=True)
    except Exception:
        print("load cookies failed")

    time.sleep(0.1)
    res = ssion.post(login_url, data=data, headers=headers)

    print(res.text)
    login_code = res.json()

    # Follow the URL returned under 'ext' in the login reply.
    time.sleep(0.1)
    resg = ssion.get(login_code['ext'])
    print(resg.text)
    print(ssion.cookies)
    time.sleep(0.1)
    resg = ssion.get('http://my.zol.com.cn/4u3***/Settings/')
    # resg = ssion.get('http://my.zol.com.cn/4u3***/Settings/', headers=headers)
    resg.encoding = 'GB2312'
    code = resg.text

    with open("code.txt", "w+", encoding="utf-8") as f:
        f.write(code)


if __name__ == "__main__":
    LoginZOL('4u3***', 'c10ce251b9893b8d****************')


I looked this up online and followed the instructions I found, but I still can't get the content of 'http://my.zol.com.cn/4u3***/Settings/'.
The request returns one of two results (the two screenshots from the original post are not preserved).

I'd appreciate any advice from the experts here. Thanks.
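
The script above posts the login form, parses the reply as JSON, and follows the URL stored under `ext`. Checking that reply before following it may make failures easier to spot. The sketch below assumes the reply is a JSON object with an `info` status field and an `ext` URL, as the reply quoted later in this thread suggests. Whether `info == "ok"` really means the login succeeded is an assumption, and `read_login_reply` is a hypothetical helper name, not part of the original script.

```python
import json


def read_login_reply(text):
    """Return (accepted, ext_url) parsed from the login endpoint's reply text."""
    try:
        reply = json.loads(text)
    except ValueError:
        # Not JSON at all, for example an HTML error or redirect page.
        return False, None
    if not isinstance(reply, dict):
        return False, None
    # Assumption: the server marks an accepted login with info == "ok".
    accepted = reply.get("info") == "ok"
    return accepted, reply.get("ext")


if __name__ == "__main__":
    # Made-up sample in the same shape as the reply, with JSON-escaped slashes.
    sample = '{"info": "ok", "ext": "http:\\/\\/example.com\\/?act=signin"}'
    print(read_login_reply(sample))
```

In the script this would be called as `read_login_reply(res.text)` right after the `post`, and the `ext` URL would only be requested when the first value is true.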

CodePudding user response:

Grabbing the first reply myself.

CodePudding user response:

It's Monday. Could some kind soul at work help out a newbie?

CodePudding user response:

Read the code and understand the principle behind it; that matters most. Information found online goes stale and is not necessarily correct, especially for crawlers.
Judging by the error message, this looks like a cookie problem.
Here is how to troubleshoot it:
1. Log in with the browser first, then copy the cookies out of the Chrome developer tools.
2. Open Postman, paste those cookies into the request header, and see whether the page comes back.

If the page opens, the target site's anti-crawler measures are fairly ordinary and the fault is a small problem in your code. Go back through the code and find where the cookie gets lost.
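
To find where a cookie gets lost, one option is to compare the cookie names the browser sends with the names the crawler's session holds. The sketch below is a minimal illustration of that comparison, not the replier's own method. The helper names `parse_cookie_header` and `missing_cookie_names` and the sample values are made up for this example.

```python
def parse_cookie_header(raw):
    """Turn 'a=1; b=2' (as copied from the browser) into {'a': '1', 'b': '2'}."""
    cookies = {}
    for part in raw.split(";"):
        # Split on the first '=' only, since values may contain '=' themselves.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def missing_cookie_names(browser_cookie_header, session_cookie_names):
    """Names the browser sends that the crawler's session does not hold."""
    wanted = parse_cookie_header(browser_cookie_header)
    return sorted(set(wanted) - set(session_cookie_names))


if __name__ == "__main__":
    # Hypothetical values; paste the real Cookie header from the browser here.
    browser = "zol_userid=demo; zol_sid=123; lv=456"
    held = ["zol_sid"]
    print(missing_cookie_names(browser, held))
```

With a `requests` session, the second argument could be built as `[c.name for c in session.cookies]` after the login request. Any name that shows up as missing is a candidate for the cookie the site expects.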

CodePudding user response:

Quoting nieoding's reply on floor 3 (the cookie troubleshooting steps above).


Thank you.
I checked the earlier call, res = ssion.post(login_url, data=data, headers=headers).
The value of res.text is:
{"info":"ok","msg":"https:\/\/login.zol.com\/index.php?c=Default&a=APILogin&act=signin&username=4u3257&check=b4f8e7b429f9bd*******2be7d9e9f7d3&t=1578901505","ext":"http:\/\/zol.iqiyi.com.cn\/?c=Api_Login&a=APILogin&act=signin&username=4u3257&check=b4f8e7b429f9bd******2be7d9e9f7d3&backUrl=http:\/\/service.zol.com.cn\/user\/login.php"}
Does this mean the login succeeded?