Python: inside a for loop scraping page content, first determine whether the content exists

Time:12-18

I am a beginner, so please forgive the rough code; suggestions for improving it are welcome. The source is below. When I enlarge the range to (730906, 731286), I get an error, because some of the pages don't contain the content being fetched. How can I skip the pages that lack the needed content and go on to the next iteration of the loop?
import requests
import csv
from bs4 import BeautifulSoup
from selenium import webdriver  # imported in the original but not used below

url_part = 'http://odds.500.com/fenxi/yazhi'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36'}

bt = ['game time', 'round', 'home team', 'score', 'visiting team', 'Macau initial handicap odds', 'Macau instant handicap odds']
with open('d:/python/football/goods.csv', 'w', newline='') as t:
    writer = csv.writer(t)
    writer.writerow(bt)  # write the header row

for i in range(730906, 731286):
    url = url_part + '-' + str(i) + '.shtml'
    html = requests.get(url, headers=headers).content.decode('gbk', 'ignore')
    soup = BeautifulSoup(html, 'lxml')
    bssja = soup.find('p', attrs={'class': 'game_time'}).text  # get the game-time text
    bssj = bssja[4:]  # strip the label prefix, keeping just the game time
    bsdw = soup.find_all('a', attrs={'class': 'hd_name'})  # get the competing teams
    bszd = bsdw[0].text  # home team name
    djl = bsdw[1].text.replace(' ', '').replace('\n', '')  # round of the match
    bskd = bsdw[2].text  # visiting team name
    bf = soup.find('p', attrs={'class': 'odds_hd_bf'}).text  # score of the two teams
    am_pkdm = soup.find('tr', id='5').find_all('td')  # Macau handicap cells
    am_pkjs = am_pkdm[0].text.replace(' ', '').replace('\n', '')  # Macau instant odds
    am_pkcs = am_pkdm[1].text.replace(' ', '').replace('\n', '')  # Macau initial odds
    sj = [bssj, djl, bszd, bf, bskd, am_pkcs, am_pkjs]
    with open('d:/python/football/goods.csv', 'a', newline='') as t:  # newline='' avoids blank rows
        writer = csv.writer(t)
        writer.writerow(sj)  # append one row of match data
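For what it's worth, the direct cause of the error: when a page lacks the element, soup.find() returns None, and calling .text on None raises AttributeError. A minimal sketch of the "check first" approach from the title, reusing the class name from the code above and placed inside the loop body:

    game_time_tag = soup.find('p', attrs={'class': 'game_time'})
    if game_time_tag is None:
        continue  # this page has no game_time element; skip to the next i
    bssja = game_time_tag.text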

CodePudding user response:

When writing a crawler, exception handling is indispensable. If the content you are fetching does not exist on a page, an exception is thrown; you only need to catch that exception and skip that iteration of the loop. You can search for Python's exception-handling mechanism to learn more.
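A minimal sketch of that approach, wrapping the loop body from the question in try/except: a missing element makes .text raise AttributeError (an out-of-range index would raise IndexError), and a failed request raises requests.exceptions.RequestException.

for i in range(730906, 731286):
    url = url_part + '-' + str(i) + '.shtml'
    try:
        html = requests.get(url, headers=headers).content.decode('gbk', 'ignore')
        soup = BeautifulSoup(html, 'lxml')
        bssj = soup.find('p', attrs={'class': 'game_time'}).text[4:]
        # ... the rest of the scraping and the CSV writing from the question ...
    except (AttributeError, IndexError, requests.exceptions.RequestException):
        continue  # this page lacks the expected content; move on to the next one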

CodePudding user response:

Hello, how exactly do I catch this exception and skip to the next iteration of the loop?