Crawler error: get_houses_by_sub_district() takes 1 positional argument but 3 were given

Time:10-01

import re
from lxml import etree
import requests
import pymongo
import math

headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}

def get_districts():
    url = "https://sh.lianjia.com/ershoufang"
    r = requests.get(url, headers=headers)
    content = r.content.decode("utf-8")
    root = etree.HTML(content)
    div_nodes = root.xpath('//div[@data-role="ershoufang"]')
    div_node = div_nodes[0]
    a_nodes = div_node.xpath('./div/a')
    result = []
    for a_node in a_nodes:
        district_name = a_node.text
        district_url = "https://sh.lianjia.com" + a_node.attrib["href"]
        result.append([district_name, district_url])
        print(district_name)
    return result


def get_sub_districts():
    districts = get_districts()
    client = pymongo.MongoClient()
    db = client["house"]
    for district in districts:
        district_name = district[0]
        district_url = district[1]
        r = requests.get(district_url, headers=headers)
        content = r.content.decode("utf-8")
        root = etree.HTML(content)
        a_nodes = root.xpath('//div[@data-role="ershoufang"]/div[2]/a')
        for a_node in a_nodes:
            sub_district_name = a_node.text
            sub_district_url = "https://sh.lianjia.com" + a_node.attrib["href"]
            db.subdistricts.insert({"district_name": district_name, "sub_district_name": sub_district_name,
                                    "sub_district_url": sub_district_url})

    r = requests.get(sub_district_url, headers=headers)
    content = r.content.decode("utf-8")
    root = etree.HTML(content)
    span_node = root.xpath('//h2[contains(@class, "total")]/span')[0]
    num = int(span_node.text)
    return num

def get_page_num(sub_district_url):
    r = requests.get(sub_district_url, headers=headers)
    content = r.content.decode("utf-8")
    root = etree.HTML(content)
    span_node = root.xpath('//h2[contains(@class, "total")]/span')[0]
    num = int(span_node.text)
    return num


def get_houses_by_sub_district(sub_district_url):
    house_num = get_page_num(sub_district_url)
    page_num = math.ceil(house_num / 30)
    client = pymongo.MongoClient()
    db = client["house"]
    for i in range(1, page_num + 1, 1):
        url_patt = sub_district_url + "pg{}"
        url = url_patt.format(i)
        r = requests.get(url, headers=headers)
        content = r.content.decode("utf-8")
        root = etree.HTML(content)
        li_nodes = root.xpath('//ul[@...]/li')
        for li_node in li_nodes:
            title = li_node.xpath('.//div[@...]/a')[0].text
            info_nodes = li_node.xpath('.//div[@...]/div[@...]/span')
            xiaoqu_nodes = li_node.xpath('.//div[@...]/div[@...]/a')
            price_nodes = li_node.xpath('.//div[@...]/div[@...]/span')
            up_nodes = li_node.xpath('.//div[@...]/div[@...]/span')
            if len(price_nodes) > 0:
                price = float(price_nodes[0].text)

            if len(up_nodes) > 0:
                up_text = up_nodes[0].text
                matched = re.search(r'unit price (.*) yuan/square meters', up_text)
                if matched:
                    up_price = float(matched.group(1))
            if len(xiaoqu_nodes) > 0:
                xiaoqu_node = xiaoqu_nodes[0]
                xiaoqu_name = xiaoqu_node.text
            if len(info_nodes) > 0:
                info_text = info_nodes[0].tail
                parts = info_text.split(" | ")
                size_text = parts[1]
                buildyear_text = parts[5]
                matched = re.search(r'(\d+) square meters', size_text)
                if matched:
                    size = float(matched.group(1))
                matched = re.search(r'(\d+) in a year', buildyear_text)
                if matched:
                    buildyear = int(matched.group(1))
                huxing = parts[0]
                chaoxiang = parts[2]
                zhuangxiu = parts[3]
                cenggao = parts[4]
                louxing = parts[6]
            house = {
                "title": title,
                "price": price,
                "up_price": up_price,
                "xiaoqu_name": xiaoqu_name,
                "size": size,
                "buildyear": buildyear,
                "huxing": huxing,
                "chaoxiang": chaoxiang,
                "zhuangxiu": zhuangxiu,
                "cenggao": cenggao,
                "louxing": louxing,
                "district_name": district_name,
                "sub_district_name": sub_district_name,
            }
            db.house.insert(house)

def get_all_house():
    client = pymongo.MongoClient()
    db = client["house"]
    cursor = db.subdistricts.find()
    for item in cursor:
        district_name = item["district_name"]
        sub_district_name = item["sub_district_name"]
        sub_district_url = item["sub_district_name"]
        print(district_name, sub_district_name, sub_district_url)
        get_houses_by_sub_district(district_name, sub_district_name, sub_district_url)



if __name__ == "__main__":
    get_all_house()

Error:
Traceback (most recent call last):
  File "C:/Users/msi/PycharmProjects/untitled1/clean.py", line 137, in <module>
    get_all_house()
  File "C:/Users/msi/PycharmProjects/untitled1/clean.py", line 132, in get_all_house
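
The traceback stops at line 132, which is the call inside get_all_house that passes district_name, sub_district_name and sub_district_url to get_houses_by_sub_district. That function is defined with a single parameter (sub_district_url), yet its body also uses district_name and sub_district_name when it builds the house record, so the definition and the call disagree. One possible fix, sketched below, is to widen the definition to accept all three values. The sketch uses stand-in functions with hypothetical names (get_houses_old, get_houses_new) and made-up sample values so that it runs without the network or MongoDB; it is not the poster's actual function.

```python
# Stand-in functions that reproduce the argument mismatch from the post.
# Names and sample values here are illustrative assumptions only.

def get_houses_old(sub_district_url):
    """Same shape as the posted definition: one positional parameter."""
    return sub_district_url


def get_houses_new(district_name, sub_district_name, sub_district_url):
    """Widened signature: both names are now available inside the function."""
    # Hypothetical record; the real function would add the scraped fields.
    return {
        "district_name": district_name,
        "sub_district_name": sub_district_name,
        "sub_district_url": sub_district_url,
    }


# Made-up sample values standing in for one document from db.subdistricts.
args = ("Pudong", "Biyun", "https://example.com/ershoufang/biyun/")

try:
    get_houses_old(*args)  # three arguments passed to a one-parameter function
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

record = get_houses_new(*args)  # three arguments, three parameters
print(record["sub_district_name"])
```

Separately, get_all_house assigns sub_district_url from item["sub_district_name"]. Since get_sub_districts stores the link under the "sub_district_url" key, that key is probably what was intended; otherwise the request would be sent to a name rather than a URL.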