Python BeautifulSoup crawler, asking the experts for advice! Crawling the content between <span> and <li>

I'm new to this and learning to write a crawler in Python, but I ran into a difficult problem and couldn't find a solution online. Could an expert point me in the right direction?

I want to crawl information from Lianjia such as the lease type and the housing type, but Lianjia puts this information inside span and li tags. As a result, the value itself is not in a separate tag, and I don't know how to extract it.

The code I have written so far is below. Please advise, thank you! I just want to know whether there is any way other than slicing.

CodePudding user response:

 
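from bs4 import BeautifulSoup

# Sample markup: the label is in <span>, the value is bare text inside <li>
page_source = """
<li><span>Housing types</span>2 room 1 hall xxxx</li>
"""

soup = BeautifulSoup(page_source, 'html.parser')
item = soup.find_all('li')
# contents[0] is the <span>; contents[1] is the text node that follows it
print(item[0].contents[1])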
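As a variation on the reply above, here is a minimal sketch that reads the text right after each span with next_sibling instead of a fixed index into contents. The markup, the labels, and the extract_fields helper are hypothetical and only modelled on the question; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question:
# the label sits in <span>, the value is bare text inside <li>.
page_source = """
<ul>
  <li><span>Housing type: </span>2 rooms 1 hall</li>
  <li><span>Lease type: </span>Whole flat</li>
</ul>
"""


def extract_fields(html):
    """Map each <span> label to the bare text that follows it in the same <li>."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for li in soup.find_all("li"):
        span = li.find("span")
        if span is None:
            continue  # skip list items that have no label
        label = span.get_text(strip=True).rstrip(":")
        # next_sibling is the node immediately after </span>;
        # in this markup it is the text node holding the value.
        value = span.next_sibling
        fields[label] = str(value).strip() if value is not None else ""
    return fields


fields = extract_fields(page_source)
print(fields)
```

This avoids slicing the string form of the tag, but it still assumes the value directly follows the span.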

CodePudding user response:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from bs4 import BeautifulSoup
import requests

url = "https://m.lianjia.com/chuzu/bj/zufang/"
# Send a mobile browser User-Agent with the request
headers = {
    "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Mobile Safari/537.36"
}

response = requests.get(url=url, headers=headers)
html = response.content.decode("utf8")
soup = BeautifulSoup(html, "html.parser")
# Each listing is a <div class="content__item">
for tag in soup.find_all('div', class_='content__item'):
    # Turn the matched <p> list into a string and strip the markup by hand
    style = str(tag.find_all('p', class_='content__item__content')). \
        replace('[<p>', ''). \
        replace(" ", "").replace("\n", "").replace("</p>]", "")
    # Hard-coded slice offsets; they depend on the exact markup of the page
    price = str(tag.find_all('p', class_='content__item__bottom'))[68:-120].replace(" ", "")
    page = tag.find("img").get('alt')
    print("{}, {}, {} yuan/month".format(page, style, price))
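Since the question asks for something other than slicing, the same loop can also be sketched with get_text() instead of str() plus replace() and fixed offsets. Only the three class names are taken from the reply above; the sample HTML, its values, and the parse_listings helper are hypothetical, and the block runs offline rather than against the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical sample; only the class names come from the reply above.
sample_html = """
<div class="content__item">
  <img alt="Sample listing title" src="x.jpg">
  <p class="content__item__content">
    Whole flat / 2 rooms 1 hall
  </p>
  <p class="content__item__bottom"><span>5000</span> yuan/month</p>
</div>
"""


def parse_listings(html):
    """Collect title, style and price text for every listing block."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tag in soup.find_all("div", class_="content__item"):
        content = tag.find("p", class_="content__item__content")
        bottom = tag.find("p", class_="content__item__bottom")
        img = tag.find("img")
        rows.append({
            "title": img.get("alt", "") if img else "",
            # get_text() drops the tags; split/join collapses runs of whitespace
            "style": " ".join(content.get_text().split()) if content else "",
            "price": " ".join(bottom.get_text().split()) if bottom else "",
        })
    return rows


rows = parse_listings(sample_html)
for row in rows:
    print("{title}, {style}, {price}".format(**row))
```

Because nothing here depends on character positions, a change in attribute order or whitespace in the page should not shift the extracted text.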

CodePudding user response:

The code is above, and the result of running it follows. I hope it helps you.

CodePudding user response:

Quoting the 1st-floor reply from "without a white":

 
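from bs4 import BeautifulSoup

# Sample markup: the label is in <span>, the value is bare text inside <li>
page_source = """
<li><span>Housing types</span>2 room 1 hall xxxx</li>
"""

soup = BeautifulSoup(page_source, 'html.parser')
item = soup.find_all('li')
# contents[0] is the <span>; contents[1] is the text node that follows it
print(item[0].contents[1])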

I tried it roughly the way you described. The idea is the same as what I had in mind, but I had not managed to make it work myself.
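For the same span-inside-li layout, one more option that avoids both indexing and slicing is stripped_strings, which yields each piece of text in the tag with surrounding whitespace removed. A minimal sketch, assuming a hypothetical li that holds exactly one label and one value:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one label in <span>, one bare-text value after it.
html = "<li><span>Housing type</span> 2 rooms 1 hall </li>"

li = BeautifulSoup(html, "html.parser").find("li")
# stripped_strings yields every text node inside the tag, whitespace-trimmed
parts = list(li.stripped_strings)
label, value = parts[0], parts[1]
print(label, "->", value)
```

If a list item can contain more than two text nodes, the positions in parts would need to be checked before unpacking.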