Python crawler beginner asking for help

Time: 11-24

I am currently trying to write a Weibo crawler. I can fetch the data I want with requests, but I cannot figure out how to export it to CSV or Excel. Any help would be appreciated.

See code:

# -*- coding: utf-8 -*-
"""
Created on Wed Jul 29 15:41:14 2020

@author: yzhang
"""

import requests, time, random, json, re
from urllib.parse import urlencode
# import pymongo
from requests.exceptions import RequestException
import csv
import codecs
import xlwt


def get_pages(since_id):
    data = {
        'type': 'uid',
        'value': '1702771281',
        'containerid': '1076031702771281',
        'since_id': since_id
    }

    base_url = 'https://m.weibo.cn/api/container/getIndex?'
    url = base_url + urlencode(data)
    try:
        # the request sits inside the try block so ConnectionError is actually caught
        result = requests.get(url, headers=headers)
        if result.status_code == 200:
            # the response body is a str, but it is JSON-formatted, so
            # result.json() would give the same dict as json.loads(result.text)
            res_dict = json.loads(result.text)
            cards = res_dict['data']['cards']

            for card in cards:
                text = card['mblog']['raw_text']
                like = card['mblog']['attitudes_count']
                comment = card['mblog']['comments_count']
                repost = card['mblog']['reposts_count']

                print(text)
                print(comment)
                print(repost)
                print(like)
                print('-' * 50)
                # write the data out here -- this is the part that is missing

        # CREATE TABLE weibo_test (id int primary key auto_increment, weibo_text text) DEFAULT CHARSET='utf8'
    except requests.ConnectionError as e:
        print("Error", e.args)



min_since_id = ''


def get_since_id():
    global min_since_id
    topic_url = 'https://m.weibo.cn/api/container/getIndex?type=uid&value=1702771281&containerid=1076031702771281'
    topic_url = topic_url + '&since_id=' + str(min_since_id)
    result = requests.get(topic_url, headers=headers)
    # renamed from "json" so the json module is not shadowed
    json_data = result.json()
    # print(json_data)
    items = json_data.get('data').get('cardlistInfo')
    # print(items)
    min_since_id = items['since_id']
    return min_since_id






def main():
    for i in range(10):
        print('page {}'.format(i))
        # fetch since_id once per iteration; calling get_since_id() twice
        # would advance the pagination cursor and skip pages
        since_id = get_since_id()
        print(since_id)
        get_pages(since_id)




# def save_to_mongodb(dict):
#     client = pymongo.MongoClient()
#     db = client['weibo']
#     collection = db['weibo']
#     if collection.insert_one(dict):  # returns the id of the inserted document
#         print('data written successfully!')
#     # print(result.inserted_ids)  # returns the list of ids of the inserted data




if __name__ == "__main__":
    headers = {
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest'
    }
    main()
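
For reference, the script imports csv but never uses it. The standard csv module handles quoting, so commas inside the weibo text will not break the file. A minimal sketch, assuming the four fields collected in get_pages(); save_row and 'weibo.csv' are just example names:

def save_row(csv_file, row):
    # newline='' avoids blank lines on Windows; utf-8-sig helps Excel open it
    with open(csv_file, 'a', newline='', encoding='utf-8-sig') as f:
        csv.writer(f).writerow(row)

# e.g. at the end of the for-card loop:
# save_row('weibo.csv', [text, like, comment, repost])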



CodePudding user response:

Add a CSV-saving function:

def save_csv(csv_file, data_msg):
    # append one line to the file
    f = open(csv_file, 'a', encoding="utf-8")
    f.write("{}\n".format(data_msg))
    f.close()
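
Called from the loop it would look like, for example (the field order is just an illustration):

save_csv('weibo.csv', ','.join([text, str(like), str(comment), str(repost)]))

Note that save_csv does no quoting, so a comma inside the weibo text itself will shift the columns; the csv.writer sketch above is safer in that case.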



Then, in your get_pages(), change the content inside the for card in cards: loop: collect the fields into a list, and as the last statement of the loop body convert the list to a string and call save_csv to write it to the CSV file. See the sketch below.
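
A minimal sketch of what the modified loop body could look like, assuming the save_csv above:

for card in cards:
    mblog = card['mblog']
    # collect the fields into a list
    row = [mblog['raw_text'],
           mblog['attitudes_count'],
           mblog['comments_count'],
           mblog['reposts_count']]
    # last statement of the loop body: list -> string -> CSV
    save_csv('weibo.csv', ','.join(str(x) for x in row))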



CodePudding user response:

I understand the idea, but I still don't know how to write it.
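
For the Excel half of the original question: the script already imports xlwt, which writes legacy .xls files. A minimal sketch, with save_xls, the sheet name, and the file name as example choices; rows would be a list of the per-card field lists collected in get_pages():

def save_xls(rows, xls_file='weibo.xls'):
    # collect all rows first, then save the workbook once at the end
    wb = xlwt.Workbook(encoding='utf-8')
    ws = wb.add_sheet('weibo')
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            ws.write(r, c, value)  # (row index, column index, cell value)
    wb.save(xls_file)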