import requests
import json
import pprint
import openpyxl
# Request headers sent with every API call
headers = {
    'nonce': '145f03d3-48c5-4bf1-8702-c4beb8cc0d99',
    'origin': 'https://bihu.com',
    'referer': 'https://bihu.com/people/1741228189',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-site',
    'signature': '87d67b83e883d2f46a8bdcf7edd332ade97adac098488b98c4d1a82cbd8291e3',
    'timestamp': '1590841113883',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4128.3 Safari/537.36',
    'uuid': '5d9cc323be36739c6a233324e45a1a3c',
    'version': '2.16.0',
}
def gethtml(url):
    # Fetch one page of the article list and decode the JSON body.
    # verify=False skips TLS certificate checks.
    html = requests.get(url, headers=headers, verify=False).text
    data = json.loads(html)
    datas = data['data']
    lists = datas['data']
    content = list()
    for shuju in lists:
        content.append(shuju['content'])
    # pprint.pprint(content)
    for name in content:
        pprint.pprint(name['title'])
        pprint.pprint(name['income'])
    # Return the articles so the caller gets something other than None.
    return content


if __name__ == '__main__':
    for i in range(1, 9):
        url = 'https://gw.bihu.com/api/content/author/1741228189/list?type=ARTICLE&pageNum=' + str(i)
        print(gethtml(url))

"""Last attempt, saving with the library:
file = openpyxl.Workbook()
sheet = file.active
sheet.title = 'currency on crawler'
sheet['A1'] = 'title'
sheet['B1'] = 'earnings'
for i in :
    print(i)
    sheet.append(i)
file.save('currency by climbing 123.xlsx')
"""
Extracting the titles and income figures was simple, but I am stuck on saving the data. My English is still at a second-grade level, so I am unfamiliar with some of the words and cannot work out the logic. Nothing I have tried works, so I can only ask the more experienced members here for advice.
Please also recommend some communities where programmers talk to each other. Studying on my own is hard!
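The unfinished `for i in :` loop needs a list of rows to walk over. Here is a minimal sketch of building that list, assuming each page decodes to the nested shape the code above reads (`data['data']['data']`, then `['content']` with `title` and `income` keys). The function names `rows_from_page` and `collect_rows` and the sample pages are made up for illustration; the real API response has not been checked.

```python
def rows_from_page(payload):
    """Turn one decoded JSON page into a list of [title, income] rows.

    Assumes the nesting used in the post: payload['data']['data'] is a list
    of items, each holding a 'content' dict with 'title' and 'income'.
    """
    rows = []
    for item in payload["data"]["data"]:
        content = item["content"]
        rows.append([content["title"], content["income"]])
    return rows


def collect_rows(pages):
    """Join the rows of every page into one flat list."""
    all_rows = []
    for payload in pages:
        all_rows.extend(rows_from_page(payload))
    return all_rows


# Hypothetical sample data standing in for two downloaded pages.
sample_pages = [
    {"data": {"data": [{"content": {"title": "First post", "income": 12.5}}]}},
    {"data": {"data": [{"content": {"title": "Second post", "income": 3}}]}},
]
rows = collect_rows(sample_pages)
```

With real data, `pages` would be the decoded JSON of each downloaded page, and the resulting list is what a saving loop can iterate over.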
CodePudding user response:
Thank you. It would be best if you could tell me the logic.
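The logic for saving, as a rough sketch: put the header row first, then one list per article, and append each list to the sheet as a row. `build_table`, `save_table`, the sheet title and the sample rows are hypothetical names for illustration, not part of the original script; only `Workbook`, `active`, `append` and `save` are openpyxl calls.

```python
HEADER = ["title", "income"]


def build_table(rows):
    """Return the header followed by one list per article."""
    return [HEADER] + [list(row) for row in rows]


def save_table(table, path):
    """Write every row of the table to the first sheet of a new workbook."""
    # Imported here so build_table can be used without openpyxl installed.
    from openpyxl import Workbook

    workbook = Workbook()
    sheet = workbook.active
    sheet.title = "articles"  # hypothetical sheet name
    for row in table:
        # append() writes the list into the next empty row, one cell per item
        sheet.append(row)
    workbook.save(path)


# Hypothetical rows; in the real script they would come from the crawler.
sample_rows = [("First post", 12.5), ("Second post", 3)]
table = build_table(sample_rows)
```

Calling `save_table(table, "articles.xlsx")` (the file name is only an example) would then write the header into row 1 and each article below it. The point is that `sheet.append` expects one list or tuple per row, so the loop has to run over a list of rows rather than over single values.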