I need help with a script that automatically scrapes a web page and saves selected values. This is the result that I would like to have:
Collection | Homesites | Bedrooms | Price Range |
---|---|---|---|
Mosaic | 292 | 2 - 3 | $557,990 - $676,990 |
Legends | 267 | 2 - 3 | $673,990 - $788,990 |
Estates | 170 | 2 - 3 | $863,990 - $888,990 |
This is the code that I already have. I was able to save the collection names in the first column, but I am not able to save the numbers into the remaining columns (they do not end up in the right place). I need help writing the results to the CSV file in the correct format and in the right place, which is underneath the headers. Thank you!
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

# Download the page and parse it
my_url = 'https://www.lennar.com/new-homes/california/sacramento/el-dorado-hills/heritage-el-dorado-hills'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, 'html.parser')

# One container per collection; the numbers come back as one flat list
containers = page_soup.findAll('div', {'class':'GridItem_container__5PgVU GridItem_item-start-1__1YuAr GridItem_item-span-11__2CVRN GridItem_item-start-md-7__2IvNK GridItem_item-span-md-5__xZ-1p GridItem_item-start-lg-13__2IVYX GridItem_item-span-lg-9__1sQmg GridItem_item-row-2__-d6T5 GridItem_item-row-md-1__20GIi'})
numbers = page_soup.findAll('p', {'class':'Typography_headline3__2nuPh'})

filename = 'product.csv'
f = open(filename, 'w')
headers = 'Collection, Homesites, Bedrooms, Price range\n'
f.write(headers)
for collections in containers:
    collection = collections.p.text
    f.write(collection + '\n')
for num in numbers:
    num = num.text
    f.write(num + '\n') # I need help here
f.close()
#containers look like this after BS4
Mosaic
Legends
Estates
# nums look like this after BS4
292
2 - 3
$557,990 - $676,990
267
2 - 3
$673,990 - $788,990
170
2 - 3
$863,990 - $888,990
CodePudding user response:
This could be a solution for you:
import pandas as pd

# containers and numbers are the BeautifulSoup results from the question
names = [c.p.text for c in containers]
values = [n.text for n in numbers]
df = pd.DataFrame({'Collection': names,
                   'Homesites': [v for v in values[::3]],
                   'Bedrooms': [v for v in values[1::3]],
                   'Price range': [v for v in values[2::3]],
                   })
>>> df
  Collection Homesites Bedrooms          Price range
0     Mosaic       292    2 - 3  $557,990 - $676,990
1    Legends       267    2 - 3  $673,990 - $788,990
2    Estates       170    2 - 3  $863,990 - $888,990
df.to_csv('product.csv', sep=";")
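First, I rearrange the flat list of values into one list per column: every third value starting at index 0 is a homesite count, every third starting at index 1 is a bedroom range, and every third starting at index 2 is a price range. Then I create a DataFrame and, as the last step, save it as a CSV file.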
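If you would rather not depend on pandas, the same regrouping can be sketched with only the standard library. This is a minimal sketch, not the method above: `build_rows` and `write_rows` are hypothetical helper names, and the sample lists are copied from the output shown in the question in place of the live BeautifulSoup results. It assumes the page always yields exactly three values per collection, in the order homesites, bedrooms, price range.

```python
import csv
import io

# Sample data copied from the question; in the real script these would be
# names = [c.p.text for c in containers] and values = [n.text for n in numbers].
names = ['Mosaic', 'Legends', 'Estates']
values = [
    '292', '2 - 3', '$557,990 - $676,990',
    '267', '2 - 3', '$673,990 - $788,990',
    '170', '2 - 3', '$863,990 - $888,990',
]

def build_rows(names, values, per_row=3):
    """Pair each name with its consecutive group of `per_row` values."""
    if len(values) != len(names) * per_row:
        raise ValueError('expected %d values per collection' % per_row)
    # Cut the flat list into chunks of `per_row` items
    groups = [values[i:i + per_row] for i in range(0, len(values), per_row)]
    return [[name, *group] for name, group in zip(names, groups)]

def write_rows(rows, fileobj):
    """Write the header line and then one CSV line per collection."""
    writer = csv.writer(fileobj)
    writer.writerow(['Collection', 'Homesites', 'Bedrooms', 'Price range'])
    writer.writerows(rows)

rows = build_rows(names, values)

# Written to an in-memory buffer here so the result can be inspected;
# for a real file use: with open('product.csv', 'w', newline='') as f: write_rows(rows, f)
buffer = io.StringIO()
write_rows(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the prices contain commas, `csv.writer` puts those fields in quotes so they stay in a single column. Writing the lines by hand with `f.write` would split each price range across two columns.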