Core curriculum [150: master Python web crawler] learning problems (1)

Time:10-19

Question: my understanding of why "[0]" has to be appended to an XPath statement. Is this way of analyzing it sound, and are there other methods? (Many thanks)
Let's go straight to the code.
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
import requests
from lxml import etree
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.132 Safari/537.36',
    'Cookie': '__guid=253446671.3408581543116182000.1572157233008.2388; uuid=597c579e-cc50-460e-ab03-9ee0056f7643; ganji_uuid=3790440803065380517902; lg=1; track_id=8134445303664640; isTouFangGuaziIndex=1; guazitrackersessioncadata=%7B%22ca_kw%22%3A%22%25e7%2593%259c%25e5%25ad%2590%22%7D; sessionid=4c12ebe0-c0f5-4610-9b9e-32d8bc873815; antipas=4bu608h056218i78ec03458u5; cityDomain=sh; clueSourceCode=%2A%2300; user_city_id=13; cainfo=%7B%22ca_a%22%3A%22-%22%2C%22ca_b%22%3A%22-%22%2C%22ca_s%22%3A%22sem_360ss%22%2C%22ca_n%22%3A%22360pc_shouye%22%2C%22ca_medium%22%3A%22-%22%2C%22ca_term%22%3A%22%7Bkeyword%7D%22%2C%22ca_content%22%3A%22%22%2C%22ca_campaign%22%3A%22%22%2C%22ca_kw%22%3A%22%25e7%2593%259c%25e5%25ad%2590%22%2C%22ca_i%22%3A%22-%22%2C%22scode%22%3A%2210103213212%22%2C%22keyword%22%3A%22-%22%2C%22ca_keywordid%22%3A%2213434503264%22%2C%22ca_transid%22%3A%22%22%2C%22platform%22%3A%221%22%2C%22version%22%3A1%2C%22track_id%22%3A%228134445303664640%22%2C%22guid%22%3A%22597c579e-cc50-460e-ab03-9ee0056f7643%22%2C%22display_finance_flag%22%3A%22-%22%2C%22client_ab%22%3A%22-%22%2C%22sessionid%22%3A%224c12ebe0-c0f5-4610-9b9e-32d8bc873815%22%2C%22ca_city%22%3A%22sh%22%7D; _gl_tracker=%7B%22ca_source%22%3A%22-%22%2C%22ca_name%22%3A%22-%22%2C%22ca_kw%22%3A%22-%22%2C%22ca_id%22%3A%22-%22%2C%22ca_s%22%3A%22self%22%2C%22ca_n%22%3A%22-%22%2C%22ca_i%22%3A%22-%22%2C%22sid%22%3A38614533066%7D; preTime=%7B%22last%22%3A1573095554%2C%22this%22%3A1572157232%2C%22pre%22%3A1572157232%7D; monitor_count=8'
}

url = 'https://www.guazi.com/sh/buy/o1/'
resp = requests.get(url, headers=headers)
text = resp.content.decode('utf-8')
# print(text)
html = etree.HTML(text)  # parse the page into an element tree
ul = html.xpath('//ul/@ ')  # XPath is truncated in the source; per the output it matched one <ul> element
print(type(ul))
print(ul)
print(type(ul[0]))
print(ul[0])
print(ul[1])
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
<class 'list'>
[<Element ul at 0x38025d0>]
Traceback (most recent call last):
<class 'lxml.etree._Element'>
<Element ul at 0x38025d0>
  File "E:/programming learning/Python/PythonTest/combat - crawl melon seeds - debugging.py", line 18, in <module>
    print(ul[1])
IndexError: list index out of range

Process finished with exit code 1
-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
Above, the code comes first and its output follows. Based on what was printed, my understanding is:
1. After the assignment from html.xpath(), ul is a list.
2. Printing ul[0] shows that it is the first element of the list ul.
3. Printing ul[1] raises "list index out of range", which means the list ul contains only one element.
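The three observations above can be reproduced without a network request. The following is a minimal sketch, assuming a small hand-written HTML snippet: the SAMPLE string, the "carlist" class name and the first_or_none helper are made up for illustration and are not taken from the Guazi page or from lxml. It shows that xpath() returns a list for a node query, that [0] picks the first match, and two ways to avoid the IndexError when there may be fewer matches than expected.

```python
from lxml import etree

# Hypothetical sample page; the class name "carlist" is an assumption for illustration.
SAMPLE = """
<html><body>
  <ul class="carlist"><li>car A</li><li>car B</li></ul>
  <div>no list here</div>
</body></html>
"""

html = etree.HTML(SAMPLE)

# A node query always yields a list, even when exactly one element matches.
uls = html.xpath('//ul[@class="carlist"]')
print(type(uls), len(uls))

# [0] unwraps the single match so further queries can run relative to it.
first_ul = uls[0]
items = first_ul.xpath('./li/text()')
print(items)


def first_or_none(matches):
    """Return the first match, or None when the list is empty (hypothetical helper)."""
    return matches[0] if matches else None


# An XPath that matches nothing gives an empty list, so guard before indexing.
missing = first_or_none(html.xpath('//ul[@class="does-not-exist"]'))
print(missing)

# The position can also go inside the XPath itself; the result is still a list.
same_ul = html.xpath('(//ul[@class="carlist"])[1]')
print(len(same_ul), same_ul[0].tag)
```

Checking the list before indexing, whether through len() or a small helper like the one above, is usually safer in a crawler than a bare [0], because the page layout can change and leave the list empty.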
In addition, I was glad to learn from an earlier post that type() works like a laparoscope for looking inside an object and is usually paired with print(), so I put it to use here.
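As a small illustration of pairing type() with print(), the sketch below inspects what xpath() returns for different kinds of expressions. It again assumes a made-up HTML snippet; SAMPLE and the describe helper are hypothetical names, not part of lxml. In lxml a node query yields a list, while XPath functions such as count() and string() yield a float and a string respectively, so [0] is only needed in the first case.

```python
from lxml import etree

# Hypothetical fragment; etree.HTML wraps it in <html><body> automatically.
SAMPLE = '<ul class="carlist"><li>car A</li><li>car B</li></ul>'
html = etree.HTML(SAMPLE)


def describe(expr):
    """Run an XPath expression and print the type and value of its result."""
    result = html.xpath(expr)
    print(expr, '->', type(result).__name__, result)
    return result


elements = describe('//li')           # node query: list of elements
classes = describe('//ul/@class')     # attribute query: list of strings
count = describe('count(//li)')       # XPath number: float
text = describe('string(//li[1])')    # XPath string: a str subclass
```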