See code:
# -*- coding: utf-8 -*-
"""
Created on Wed Jul 29 15:41:14 2020
@author: yzhang
"""
import requests, time, random, json, re
from urllib.parse import urlencode
# import pymongo
from requests.exceptions import RequestException
import csv
import codecs
import xlwt
def get_pages(since_id):
    data = {
        'type': 'uid',
        'value': '1702771281',
        'containerid': '1076031702771281',
        'since_id': since_id
    }
    base_url = 'https://m.weibo.cn/api/container/getIndex?'
    url = base_url + urlencode(data)
    result = requests.get(url, headers=headers)
    try:
        if result.status_code == 200:
            response = requests.get(url)
            res_dict = json.loads(response.text)
            cards = res_dict['data']['cards']
            for card in cards:
                text = card['mblog']['raw_text']
                like = card['mblog']['attitudes_count']
                comment = card['mblog']['comments_count']
                repost = card['mblog']['reposts_count']
                print(text)
                print(comment)
                print(repost)
                print(like)
                print('-' * 50)
                # write the data here
        # print(result.json())  ### the page actually returns a str, but it is in JSON format,
        ### so to parse the result directly into a dict you can call json()
        # CREATE TABLE weibo_test (id int primary key auto_increment, weibo_text text) DEFAULT CHARSET='utf8'
    except requests.ConnectionError as e:
        print("Error", e.args)
min_since_id = ''
def get_since_id():
    global min_since_id
    topic_url = 'https://m.weibo.cn/api/container/getIndex?type=uid&value=1702771281&containerid=1076031702771281'
    topic_url = topic_url + '&since_id=' + str(min_since_id)
    # print(res_json)
    result = requests.get(topic_url, headers=headers)
    res_json = result.json()  # renamed from "json" so the json module is not shadowed
    # print(res_json)
    items = res_json.get('data').get('cardlistInfo')
    # print(items)
    min_since_id = items['since_id']
    return min_since_id
def main():
    for i in range(10):
        print('Page {}'.format(i))
        print(get_since_id())
        get_pages(get_since_id())
# def save_to_mongodb(dict):
#
#     client = pymongo.MongoClient()
#     db = client['weibo']
#     collection = db['weibo']
#     # if collection.insert_one(dict):  # returns the ID value
#     #     print('Data written successfully!')
#     # print(result.inserted_ids)  ### returns the list of ids of the inserted data
if __name__ == '__main__':
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    main()
CodePudding user response:
Add a function that writes to a CSV file:
def save_csv(csv_file, data_msg):
    # save: append one line to the file
    f = open(csv_file, 'a', encoding="utf-8")
    f.write("{}\n".format(data_msg))
    f.close()
Then change the body of the `for card in cards:` loop in your get_pages():
store the data in a list, and in the last statement of the loop body convert the list to a string and call save_csv to write it to the CSV file.
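A minimal sketch of that change, in case it helps. It is a variant rather than the exact approach described above: instead of joining the list into a string by hand, it passes the list to Python's csv.writer, which quotes any commas or line breaks inside the post text. The names card_to_row, save_csv_row and the file name weibo.csv are made up for illustration, and the assumption that some cards may lack an 'mblog' entry has not been checked against the API.

```python
import csv

def card_to_row(card):
    """Collect the fields that get_pages() prints into one list."""
    # Assumption: a card may have no 'mblog' key, so fall back to defaults.
    mblog = card.get('mblog', {})
    return [
        mblog.get('raw_text', ''),
        mblog.get('attitudes_count', 0),
        mblog.get('comments_count', 0),
        mblog.get('reposts_count', 0),
    ]

def save_csv_row(csv_file, row):
    """Append one row to csv_file (hypothetical helper)."""
    # newline='' lets csv.writer control the line endings itself.
    with open(csv_file, 'a', encoding='utf-8', newline='') as f:
        csv.writer(f).writerow(row)

# Inside get_pages(), the loop body would then end with:
#     for card in cards:
#         ...
#         save_csv_row('weibo.csv', card_to_row(card))
```

If the file is meant to be opened in Excel, encoding='utf-8-sig' is a common choice so that Chinese text displays correctly.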
CodePudding user response:
I understand the idea; I just don't know how to write it.