Python beginner asking for help!

Time:09-15

Experts, please take a look. This is the Eastmoney web crawler code that I want to run in PyCharm. I found it on the Internet, but I cannot get it to run.
import requests
import re
from multiprocessing import Pool
import json
import csv
import pandas as pd
import os
import time

# Store the files in the eastmoney folder on drive D
file_path = 'D:\\eastmoney'
if not os.path.exists(file_path):
    os.mkdir(file_path)
os.chdir(file_path)



# 1 Set the report period to crawl
def set_table():
    print('*' * 80)
    print('\t\t\t\tEastmoney report download')
    print('Author: Senior Migrant Worker  2018.10.10')
    print('---------------------------')

    # 1 Set the period of the financial statements to fetch
    year = int(float(input('Please enter the year to query (4 digits), 2007-2018:\n')))
    # int() gives an integer; float() is wrapped inside because input() returns a str,
    # and int() raises an error on a string such as '2018.0' while float() does not
    # https://stackoverflow.com/questions/1841565/valueerror-invalid-literal-for-int-with-base-10
    while (year < 2007 or year > 2018):
        year = int(float(input('Invalid year, please enter it again:\n')))

    quarter = int(float(input('Please enter the quarter as a digit (1: Q1 report, 2: semi-annual report, 3: Q3 report, 4: annual report):\n')))
    while (quarter < 1 or quarter > 4):
        quarter = int(float(input('Invalid quarter, please enter it again:\n')))

    # Convert the quarter to its closing month, two ways; 02 means two digits, zero-padded
    # http://www.runoob.com/python/att-string-format.html
    quarter = '{:02d}'.format(quarter * 3)
    # quarter = '%02d' % (int(month) * 3)

    # Decide whether the last day of the quarter is the 30th or the 31st
    # (quarter is a zero-padded string by now, so compare with '06' and '09')
    if (quarter == '06') or (quarter == '09'):
        day = 30
    else:
        day = 31
    date = '{}-{}-{}'.format(year, quarter, day)
    # print('date:', date)  # test: date ok

    # 2 Set the type of financial statement
    tables = int(
        input('Please enter the number of the report type to query (1 - performance report; 2 - performance express report; 3 - performance forecast; 4 - scheduled disclosure dates; 5 - balance sheet; 6 - income statement; 7 - cash flow statement):\n'))

    dict_tables = {1: 'performance report', 2: 'performance express report', 3: 'performance forecast',
                   4: 'scheduled disclosure dates', 5: 'balance sheet', 6: 'income statement', 7: 'cash flow statement'}

    # Renamed from dict so that the built-in dict is not shadowed
    dict_codes = {1: 'YJBB', 2: 'YJKB', 3: 'YJYG',
                  4: 'YYPL', 5: 'ZCFZB', 6: 'LRB', 7: 'XJLLB'}
    category = dict_codes[tables]

    # The 'type' parameter of the js request: the first four tables use the prefix 'YJBB20_', the last three use 'CWBB_'
    # Set the type, st, sr and filter parameters in set_table()
    if tables == 1:
        category_type = 'YJBB20_'
        st = 'latestnoticedate'
        sr = -1
        filter = "(securitytypecode in ('058001001','058001002'))(reportdate=^%s^)" % (date)
    elif tables == 2:
        category_type = 'YJBB20_'
        st = 'ldate'
        sr = -1
        filter = "(securitytypecode in ('058001001','058001002'))(rdate=^%s^)" % (date)
    elif tables == 3:
        category_type = 'YJBB20_'
        st = 'ndate'
        sr = -1
        # Note: the end date is hard-coded here rather than taken from date
        filter = "(IsLatest='T')(enddate=^2018-06-30^)"
    elif tables == 4:
        category_type = 'YJBB20_'
        st = 'frdate'
        sr = 1
        filter = "(securitytypecode='058001001')(reportdate=^%s^)" % (date)
    else:
        category_type = 'CWBB_'
        st = 'noticedate'
        sr = -1
        filter = '(reportdate=^%s^)' % (date)

    category_type = category_type + category
    # print(category_type)
    # Return the parameters set in set_table()

    yield {
        'date': date,
        'category': dict_tables[tables],
        'category_type': category_type,
        'st': st,
        'sr': sr,
        'filter': filter
    }
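The quarter-to-date step in set_table() can also be checked on its own, without any keyboard input. The following is only a sketch under my own assumptions: the helper name quarter_end_date is made up for illustration and is not part of the original script, and it uses the standard-library calendar module instead of the 30/31 comparison.

```python
import calendar


def quarter_end_date(year: int, quarter: int) -> str:
    """Return the last day of the given quarter as 'YYYY-MM-DD'.

    Hypothetical helper, not taken from the original crawler.
    """
    if not 1 <= quarter <= 4:
        raise ValueError("quarter must be between 1 and 4")
    month = quarter * 3  # closing month of the quarter: 3, 6, 9 or 12
    # monthrange returns (weekday of the first day, number of days in the month)
    last_day = calendar.monthrange(year, month)[1]
    return "{}-{:02d}-{:02d}".format(year, month, last_day)


print(quarter_end_date(2018, 2))  # 2018-06-30
```

Letting calendar work out the day avoids comparing the zero-padded month string against '06' and '09' by hand.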

# 2 Set the starting page number to crawl
def page_choose(page_all):

    # Choose which pages to crawl
    start_page = int(input('Please enter the start page to download:\n'))
    nums = input('Please enter the number of pages to download (press Enter to download all):\n')
    print('*' * 80)

    # Check whether the input is a number or just Enter
    if nums.isdigit():
        end_page = start_page + int(nums)
    elif nums == '':
        end_page = int(page_all.group(1))
    else:
        print('Invalid page input')

    # Return the start and end page numbers for the later steps
    yield {
        'start_page': start_page,
        'end_page': end_page
    }
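One thing to watch in page_choose(): when the input is neither a number nor empty, only a message is printed and end_page is never assigned, so the following yield fails. Below is a rough sketch of the same decision as a pure function. The name resolve_end_page and the total_pages argument are my own assumptions for illustration; total_pages stands for the value the script reads from page_all.group(1).

```python
def resolve_end_page(start_page: int, nums: str, total_pages: int) -> int:
    """Work out the end page from the user's raw input.

    Hypothetical helper: mirrors the branches of page_choose(),
    but raises instead of leaving end_page undefined.
    """
    nums = nums.strip()
    if nums == "":
        # Plain Enter: download everything up to the last page
        return total_pages
    if nums.isdigit():
        # Same arithmetic as the original: start page plus page count
        return start_page + int(nums)
    raise ValueError("invalid page count: {!r}".format(nums))


print(resolve_end_page(1, "5", 100))  # 6
print(resolve_end_page(3, "", 100))   # 100
```

Whether the end page is treated as inclusive or exclusive depends on the loop that consumes it, which is not shown in the post.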

# 3 Crawl the table
def get_table(date, category_type, st, sr, filter, page):
    # Request parameters
    params = {
        # 'type': 'CWBB_LRB',
        'type': category_type,  # table type
        'token': '70f12f2f4f091e459a279469fe49eca5',
        'st': st,
        'sr': sr,
        'p': page,
        'ps': 50,  # number of records shown per page
        'js': 'var LFtlXDqn={pages:(tp),data:(x)}',
        'filter': filter,
        # 'rt': 51294261  # can be omitted
    }
    url = 'http://dcfm.eastmoney.com/em_mutisvcexpandinterface/api/js/get?'

    # print(url)
    response = requests.get(url, params=params).text
    # print(response)
    # Determine the total number of pages
    pat = re.compile(r'var.*?{pages:(\d+),data:.*?')
    page_all = re.search(pat, response)
    print(page_all.group(1))  # ok

    # Extracting only the {...} part makes json.loads fail
    # pattern = re.compile(r'var.*?data:\[(.*)]}', re.S)

    # Extract the whole [...] list so that json.dumps and json.loads can be used
    pattern = re.compile('var.*?data:(.*)}', re.S)
    items = re.search(pattern, response)
    # is equivalent to
    # items = re.findall(pattern, response)
    # print(items[0])
    data = items.group(1)
    data = json.loads(data)
    # data = json.dumps(data, ensure_ascii=False)

    return page_all, data, page
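The parsing part of get_table() can be tried offline before sending any request. The sketch below assumes the response has the shape requested by the 'js' parameter (var LFtlXDqn={pages:(tp),data:(x)}). The sample string and its field names scode and sname are invented for illustration, and parse_response is a hypothetical name, not a function from the original script.

```python
import json
import re

# Invented sample in the shape requested by the 'js' parameter
SAMPLE = ('var LFtlXDqn={pages:3,data:'
          '[{"scode":"000001","sname":"A"},{"scode":"000002","sname":"B"}]}')


def parse_response(text):
    """Return (total_pages, rows) parsed from a JS-wrapped response."""
    pages_match = re.search(r'var.*?\{pages:(\d+),data:', text)
    # Greedy (.*) with re.S runs up to the final closing brace,
    # so the captured group is the whole JSON list
    data_match = re.search(r'var.*?data:(.*)\}', text, re.S)
    if pages_match is None or data_match is None:
        raise ValueError("unexpected response format")
    return int(pages_match.group(1)), json.loads(data_match.group(1))


pages, rows = parse_response(SAMPLE)
print(pages)       # 3
print(len(rows))   # 2
```

Checking for None before calling group() gives a clear error when the server returns something unexpected, instead of an AttributeError on the match object.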

# Write the header
# Method 1: use the csv package, the most common approach
def write_header(data, category):
    with open('{}.csv'.format(category), 'a', encoding='utf_8_sig', newline='') as f:
        headers = list(data[0].keys())
        # print(headers)  # test ok
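The posted write_header() is cut off right after the header list is built, so the rest of it is unknown. As a rough sketch only, here is one way a header row can be written with the csv package. The name write_header_sketch is hypothetical, the sample row is invented, and the function takes an already open file object (my assumption) so that it can be tried with io.StringIO instead of a real file.

```python
import csv
import io


def write_header_sketch(rows, f):
    """Write the keys of the first row as a CSV header line to f.

    Hypothetical sketch, not the original author's implementation.
    """
    headers = list(rows[0].keys())
    writer = csv.writer(f)
    writer.writerow(headers)
    return headers


buf = io.StringIO()
write_header_sketch([{"scode": "000001", "sname": "A"}], buf)
print(buf.getvalue().strip())  # scode,sname
```

When writing to a real file, opening it with newline='' as in the post keeps the csv module from producing blank lines between rows on Windows.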