try:
    # Keep looping while the current page number does not exceed the total page count
    while int(page_active) < page_num + 1:
        items = get_data()
        save_data(items)
        # page_active is text read from the page, so convert it before comparing
        if int(page_active) == page_num:
            break
        next_page()
        page_active = WAIT.until(EC.presence_of_element_located(
            (By.CSS_SELECTOR, '#video-list > div.page-wrap > div > ul > li.page-item.active'))).text
    # # Alternative using a for statement
    # for num in range(page_num):
    #     items = get_data()
    #     save_data(items)
    #     next_page()
finally:
    browser.quit()
    book.save('video information list.xlsx')
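
The loop above reads the active page number as text from the page, so it has to be converted with int() before it is compared with the total page count. A minimal sketch of that control flow follows. The browser is replaced by a hypothetical FakePager class and a crawl_pages helper (neither is part of the original code) so it runs without Selenium:

```python
class FakePager:
    """Hypothetical stand-in for the browser: tracks the active page as text."""

    def __init__(self, total_pages):
        self.total_pages = total_pages
        self.active_text = "1"  # page numbers read from a web page are strings

    def next_page(self):
        self.active_text = str(int(self.active_text) + 1)


def crawl_pages(pager):
    """Visit every page once and return the page numbers that were visited."""
    visited = []
    while int(pager.active_text) <= pager.total_pages:
        visited.append(int(pager.active_text))
        # Convert before comparing: "3" == 3 is False in Python.
        if int(pager.active_text) == pager.total_pages:
            break
        pager.next_page()
    return visited


print(crawl_pages(FakePager(3)))  # prints [1, 2, 3]
```

Without the int() conversion the equality test would never be true, and the loop would only stop once the while condition failed.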
# Custom functions
def get_data():
    """
    Get the video name, video address, description, view count, danmaku (bullet comment) count and release time from the web page
    :return: video name, video address, description, view count, danmaku count, release time
    """
    print('Start getting data')
    # Get the content of the video page through the page_source attribute
    html = browser.page_source
    # The regular expression must be written as a raw string (r'...')
    pattern = re.compile(
        r'<li.*?.*?"icon-playtime">(.*?).*?"icon-subtitle">(.*?).*?"icon-date">(.*?).*?',
        re.S)
    items = re.findall(pattern, html)
    for item in items:
        yield [
            item[0],
            item[1],
            item[2],
            item[3],
            item[4],
            item[5]
        ]
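
When a pattern has several capture groups, re.findall returns one tuple per match with one entry per group, so the number of groups determines how many fields each item has. The re.S flag lets `.` match across line breaks. A minimal sketch, assuming hypothetical markup: the sample HTML, the parse_items name and the two-field layout are illustrative only and do not describe the real page:

```python
import re

# Hypothetical markup; the real page structure is an assumption.
SAMPLE_HTML = """
<li class="video-item">
  <span class="icon-playtime">1024</span>
  <span class="icon-date">2019-01-01</span>
</li>
<li class="video-item">
  <span class="icon-playtime">2048</span>
  <span class="icon-date">2019-02-02</span>
</li>
"""

# Two capture groups, so every match is a 2-tuple.
PATTERN = re.compile(
    r'<li.*?"icon-playtime">(.*?)</span>.*?"icon-date">(.*?)</span>.*?</li>',
    re.S,  # let "." match newlines so one match can span several lines
)


def parse_items(html):
    """Yield one [views, date] list per <li> element found in html."""
    for views, date in PATTERN.findall(html):
        yield [views.strip(), date.strip()]


rows = list(parse_items(SAMPLE_HTML))
print(rows)  # prints [['1024', '2019-01-01'], ['2048', '2019-02-02']]
```

Indexing a match beyond its last group, for example item[5] on a pattern with fewer than six groups, raises IndexError.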
def save_data(items):
    """
    Save the video information
    :param items: video information generator
    :return:
    """
    print('Start saving data')
    # Use the global row counter so rows keep accumulating across calls
    global n
    for item in items:
        for j in range(len(item)):
            sheet.write(n, j, item[j])
            print(n, j)
        n = n + 1
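
The row counter can also be passed in and returned instead of being kept in a global variable. A minimal sketch of that variant: FakeSheet and save_rows are hypothetical names, and FakeSheet only imitates a worksheet object that offers a write(row, col, value) method:

```python
class FakeSheet:
    """Hypothetical stand-in for a worksheet with a write(row, col, value) method."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def save_rows(sheet, items, start_row=0):
    """Write each item as one row and return the index of the next free row."""
    row = start_row
    for item in items:
        # enumerate() replaces range(len(item)) and the manual indexing
        for col, value in enumerate(item):
            sheet.write(row, col, value)
        row += 1
    return row


sheet = FakeSheet()
next_row = save_rows(sheet, [["a", "b"], ["c", "d"]], start_row=1)
print(next_row, sheet.cells[(2, 1)])  # prints 3 d
```

Calling save_rows again with start_row=next_row continues below the rows already written, which is the job the global n does in the original function.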
CodePudding user response:
len(item)