Extracting data with re, looking for an expert's answer

Time:03-23

The key HTML to extract from (the tags were lost; only these fragments survive):

1 ... 74929</th> ...,
2 ... 62567</th> ...,
... the rows that follow have the same form.

My code:
import requests
from tool import useragenttool
import bs4
import re
import openpyxl

def open_url(url):
    """Fetch the URL and return the page source."""
    res = requests.get(url, headers=useragenttool.get_headers())
    return res

def find_data(res):
    datas = []
    soup = bs4.BeautifulSoup(res.text, "html.parser")
    content = soup.find(class_="gb-dataListBox")
    # print(content)
    target = content.find_all("tr", style="cursor: pointer;")
    # print(target)
    target = iter(target)

    for each in target:
        # print(each.text)
        if each.text.isnumeric():
            datas.append([
                re.search(r'(.+)', next(target).text).group(1),
                re.search(r'\d.*', next(target).text).group(),
                re.search(r'\d.*', next(target).text).group(),
                re.search(r'\d.*', next(target).text).group()])
    print(datas)

    return datas


def main():
    url = "https://www.creprice.cn/rank/cityforsale.html"
    res = open_url(url)
    datas = find_data(res)


if __name__ == "__main__":
    main()

Why does print(datas) output an empty list? I want to scrape the city, the housing price, and the two percentages that follow. I'm a beginner and I'm stuck; could an expert please help?
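A likely cause, sketched without touching the network: if each tr's text holds the rank, city, price and percentages together, then each.text.isnumeric() is never True, so nothing is appended. The row text below is a made-up sample (an assumption about the page layout, not copied from the site). The second half shows that calling next() on the iterator you are looping over also skips rows.

```python
# Hypothetical text of one table row, as BeautifulSoup's .text might return it.
# The layout is an assumption; the real page may differ.
row_text = "\n1\n深圳\n74,929\n+2.86%\n+18.96%\n"

# isnumeric() requires every character to be numeric, so newlines,
# Chinese characters, ',' and '%' all make it return False.
row_is_numeric = row_text.isnumeric()
bare_rank_is_numeric = "1".isnumeric()
print(row_is_numeric, bare_rank_is_numeric)

# Calling next() inside a for loop over the same iterator consumes
# the following item, so the loop only ever sees every other row.
rows = iter(["row1", "row2", "row3", "row4", "row5"])
seen = []
for each in rows:
    seen.append(each)
    next(rows, None)  # default avoids StopIteration at the end
print(seen)
```

If this is what happens on the real page, the isnumeric() test filters out every row, and that alone explains the empty list.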

CodePudding user response:

import requests
import bs4
import re


def find_data():
    head = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36',
        'Connection': 'keep-alive'}
    res = requests.get('https://www.creprice.cn/rank/cityforsale.html', headers=head)
    content = bs4.BeautifulSoup(res.text, "html.parser").find(class_="gb-dataListBox")
    target = content.find_all("tr", style="cursor: pointer;")
    info_list = []
    for each in target:
        tmp_dic = dict()
        # City name: the first run of non-Latin-1 (here, Chinese) characters
        city = re.search(r'[^\x00-\xff]+', each.text).group()
        # Price: digits with a thousands separator, e.g. 74,929
        price = re.search(r'\d+,\d+', each.text).group()
        # Both percentage changes, in page order
        rate = re.findall(r'[+-]\d+.*%', each.text)
        tmp_dic[city] = [price, rate[1], rate[0]]
        info_list.append(tmp_dic)
    print(info_list)

if __name__ == "__main__":
    find_data()

[{'深圳': ['74,929', '+18.96%', '+2.86%']}, {'北京': ['62,567', '-2.09%', '-4.76%']}, {'上海': ['54,911', '+5.85%', '-0.25%']}, {'厦门': ['47,817', '+5.66%', '+0.27%']}, {'三亚': ['38,291', '+12.01%', '+3.72%']}, {'广州': ['35,934', '+6.13%', '+5.43%']}, {'杭州': ['31,487', '+4.1%', '+3.1%']}, {'南京': ['31,416', '+2.87%', '-0.24%']}, {'福州': ['26,288', '+0.55%', '+1.78%']}, {'天津': ['25,751', '+0.14%', '+1.4%']}, {'宁波': ['23,544', '+15.65%', '+0.5%']}, {'珠海': ['23,473', '+1.43%', '-0.37%']}, {'苏州': ['23,294', '+6.32%', '-1.96%']}, {'青岛': ['21,890', '+1.65%', '+0.76%']}, {'温州': ['21,777', '+7.11%', '-1.31%']}, {'丽水': ['19,428', '+7.9%', '-2.74%']}, {'武汉': ['18,942', '+4.89%', '+0.3%']}, {'东莞': ['17,921', '+11.79%', '+0.86%']}, {'金华': ['17,279', '+5.54%', '-0.69%']}, {'成都': ['16,726', '+7.34%', '+3.11%']}, {'无锡': ['16,675', '+12.46%', '+0.13%']}, {'合肥': ['16,500', '+4.93%', '-0.73%']},...., {'鹤岗': ['2,307', '-2.19%', '-2.92%']}]
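The regex step can be tried offline on a single row. This is only a sketch: parse_row and the sample string are hypothetical, the row layout is assumed, and the percentage pattern is a slightly stricter variant of the one in the answer above.

```python
import re

def parse_row(text):
    """Extract city, price and the two percentage changes from one row's text."""
    # First run of characters outside Latin-1, i.e. the Chinese city name
    city = re.search(r'[^\x00-\xff]+', text).group()
    # Price with a thousands separator; a price under 1,000 would not match
    price = re.search(r'\d+,\d+', text).group()
    # Signed percentages such as +2.86% or -4.76%, in the order they appear
    rates = re.findall(r'[+-]\d+(?:\.\d+)?%', text)
    return {city: [price, rates[1], rates[0]]}

# Hypothetical row text (assumed layout, not copied from the site)
sample = "\n1\n深圳\n74,929\n+2.86%\n+18.96%\n"
result = parse_row(sample)
print(result)
```

Testing the patterns on a fixed string like this separates regex problems from problems with the request or the HTML selection.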

CodePudding user response:

Thank you so much!

CodePudding user response:

Can this thread be marked as resolved?