Is there a method/way to save text data into csv columns with the right formatting as requested?

Time:10-28

I need help with a script that scrapes a web page and saves selected values to a CSV file. This is the result that I would like to have:

Collection  Homesites  Bedrooms  Price Range
Mosaic      292        2 - 3     $557,990 - $676,990
Legends     267        2 - 3     $673,990 - $788,990
Estates     170        2 - 3     $863,990 - $888,990

This is the code that I already have. I was able to save the collections in the first column, but I am not able to save the numbers into the remaining columns (they do not end up in the right place). I need help writing the results to the CSV file in the correct format and in the right place, which is underneath the headers. Thank you!

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = 'https://www.lennar.com/new-homes/california/sacramento/el-dorado-hills/heritage-el-dorado-hills'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, 'html.parser')
containers = page_soup.findAll('div', {'class':'GridItem_container__5PgVU GridItem_item-start-1__1YuAr GridItem_item-span-11__2CVRN GridItem_item-start-md-7__2IvNK GridItem_item-span-md-5__xZ-1p GridItem_item-start-lg-13__2IVYX GridItem_item-span-lg-9__1sQmg GridItem_item-row-2__-d6T5 GridItem_item-row-md-1__20GIi'})
numbers = page_soup.findAll('p', {'class':'Typography_headline3__2nuPh'})

filename = 'product.csv'
f = open(filename, 'w')
headers = 'Collection, Homesites, Bedrooms, Price range\n' 
f.write(headers)

for collections in containers:
    collection = collections.p.text
    f.write(collection + '\n')
    
for num in numbers:
    num = num.text
    f.write(num + '\n') # I need help here
    
f.close()
#containers look like this after BS4
Mosaic
Legends
Estates

# nums look like this after BS4
292
2 - 3
$557,990 - $676,990
267
2 - 3
$673,990 - $788,990
170
2 - 3
$863,990 - $888,990

CodePudding user response:

This could be a solution for you:

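import pandas as pd

# containers and numbers are the BeautifulSoup results from the question
names = [c.p.text for c in containers]
values = [n.text for n in numbers]

# every collection contributes three consecutive values,
# so a slice with step 3 selects one column at a time
df = pd.DataFrame({'Collection': names,
                   'Homesites': values[::3],
                   'Bedrooms': values[1::3],
                   'Price range': values[2::3],
                  })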
>>> df
  Collection Homesites Bedrooms          Price range
0     Mosaic       292    2 - 3  $557,990 - $676,990
1    Legends       267    2 - 3  $673,990 - $788,990
2    Estates       170    2 - 3  $863,990 - $888,990

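# index=False leaves out the 0, 1, 2 row labels, which are not part of the requested output
df.to_csv('product.csv', sep=";", index=False)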

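First, I rearrange the flat list of values into one list per column: every collection contributes three consecutive values, so values[::3], values[1::3] and values[2::3] pick out the homesites, bedrooms and price ranges. Then I create a DataFrame from these lists and, as the last step, save it as a CSV file.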
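If you would rather not depend on pandas, the same rearrangement can be sketched with the standard library's csv module. This is only a minimal sketch: the sample lists are copied from the output shown in the question and stand in for the BeautifulSoup results, and build_rows and write_csv are hypothetical helper names, not part of any library. It assumes every collection has exactly three values, in the order homesites, bedrooms, price range.

```python
import csv

# Sample data copied from the question; in the real script these lists
# would be built from the BeautifulSoup results (assumption).
names = ['Mosaic', 'Legends', 'Estates']
values = ['292', '2 - 3', '$557,990 - $676,990',
          '267', '2 - 3', '$673,990 - $788,990',
          '170', '2 - 3', '$863,990 - $888,990']


def build_rows(names, values, per_row=3):
    """Pair each name with its chunk of per_row consecutive values (hypothetical helper)."""
    if len(values) != len(names) * per_row:
        raise ValueError('expected %d values per collection' % per_row)
    # cut the flat list into consecutive chunks of per_row items
    chunks = [values[i:i + per_row] for i in range(0, len(values), per_row)]
    return [[name, *chunk] for name, chunk in zip(names, chunks)]


def write_csv(rows, out):
    """Write the header line and the data rows to an open text file (hypothetical helper)."""
    writer = csv.writer(out)
    writer.writerow(['Collection', 'Homesites', 'Bedrooms', 'Price range'])
    writer.writerows(rows)


rows = build_rows(names, values)
# newline='' lets the csv module control the line endings itself
with open('product.csv', 'w', newline='') as f:
    write_csv(rows, f)
```

The price ranges contain commas, so joining the fields by hand with ',' would split them across extra columns. csv.writer puts such fields in quotes, which keeps each price range in a single column when the file is read back.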
