Return value without overwriting

Time:01-17

The code below keeps only the result of the last loop iteration, so the data frame shows prices from the last webpage only. I have 18 pages in total to scrape, and I want to put the price data from every page into the same data frame without each page overwriting the previous one.

def get_basic_info(content_list):
    basic_info = []
    for item in content_list:
        basic_info.append(item.find_all('h5'))
    return basic_info

def get_prices(basic_info):
    prices = []
    for item in basic_info:
        for i in item:
            price = i.find("span", attrs={"class": "price"})
            # Check inside the inner loop so every <h5> is tested, not only the last one
            if price:
                prices.append(price.text)
    return prices

for page in range(1,18):
    base_url = "https://www.easycar.tw/carList.php?Action=search&show=col&lifting=desc&year=&year1=&page=" + str(page)
    response = get(base_url, headers=headers)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    content_list = html_soup.find_all('div', attrs={'class': 'caption'})
    basic_info = get_basic_info(content_list)
    prices = get_prices(basic_info)

cols = ["Price"]
data = pd.DataFrame({"Price" : prices})[cols]
data

CodePudding user response:

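You are assigning a new list to prices on every iteration, so each page replaces the results of the previous one. What you want to do is define an empty list before you enter the loop, for example all_prices = []

Then go to the for loop.

Inside it, replace prices = get_prices(basic_info) with all_prices.extend(get_prices(basic_info))

Use extend rather than append here: get_prices returns a list, so append would store one nested list per page instead of the individual prices.

When making your dataframe, you can then do:

data = pd.DataFrame({"Price" : all_prices})[cols]

This is the same as what you are doing now, except that all_prices is one flat list covering every page. Also note that range(1,18) stops at page 17, so use range(1,19) if you need all 18 pages.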
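
The accumulate-across-pages pattern can be sketched without touching the network. This is only an illustration under stated assumptions: fake_get_prices is a hypothetical stand-in for the request and parsing steps, its return values are made up, and collect_prices and its parameters are names invented here, not part of the original code.

```python
# Hypothetical stand-in for the scraping step: returns the price strings
# found on one page. The values are made up for illustration only.
def fake_get_prices(page):
    return [f"page{page}-price{i}" for i in range(1, 3)]


def collect_prices(first_page, last_page, get_page_prices):
    """Gather the prices from every page into one flat list."""
    all_prices = []  # created once, before the loop, so nothing is overwritten
    for page in range(first_page, last_page + 1):  # last_page is included
        # extend() adds each price individually; append() would add the
        # whole per-page list as a single nested element.
        all_prices.extend(get_page_prices(page))
    return all_prices


prices = collect_prices(1, 18, fake_get_prices)
print(len(prices))  # 36: two fake prices for each of the 18 pages
print(prices[:3])   # ['page1-price1', 'page1-price2', 'page2-price1']
```

With the real scraper, the resulting flat list could be passed to pd.DataFrame({"Price" : prices}) in the same way as in the question.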
