The code below only keeps the results of the last webpage: each loop iteration overwrites `prices`, so the data frame ends up holding only the final page. I have 18 pages in total to scrape, and I want to collect the price data from every page into the same data frame without each page overwriting the previous one.
from requests import get
from bs4 import BeautifulSoup
import pandas as pd

def get_basic_info(content_list):
    basic_info = []
    for item in content_list:
        basic_info.append(item.find_all('h5'))
    return basic_info

def get_prices(basic_info):
    prices = []
    for item in basic_info:
        for i in item:
            price = i.find("span", attrs={"class": "price"})
            if price:
                prices.append(price.text)
    return prices
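As a sanity check, the two helpers' logic can be exercised on a static HTML fragment; the snippet below is made-up markup that mimics the structure the code expects (price `<span>`s inside `<h5>` tags inside `div.caption`), not the real site's HTML:

```python
from bs4 import BeautifulSoup

# Made-up markup: some h5 blocks contain a price span, one does not.
snippet = """
<div class="caption"><h5><span class="price">58.8</span></h5></div>
<div class="caption"><h5>no price here</h5></div>
<div class="caption"><h5><span class="price">99.9</span></h5></div>
"""

soup = BeautifulSoup(snippet, 'html.parser')
content_list = soup.find_all('div', attrs={'class': 'caption'})

# Same steps as get_basic_info / get_prices above.
basic_info = [item.find_all('h5') for item in content_list]
prices = []
for item in basic_info:
    for i in item:
        price = i.find("span", attrs={"class": "price"})
        if price:  # h5 blocks without a price span are skipped
            prices.append(price.text)

print(prices)  # ['58.8', '99.9']
```

This confirms the extraction itself works on a single page; the overwriting problem lies in the page loop, not in these helpers.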
for page in range(1, 19):  # pages 1 through 18; range(1, 18) would stop at page 17
    base_url = "https://www.easycar.tw/carList.php?Action=search&show=col&lifting=desc&year=&year1=&page=" + str(page)
    response = get(base_url, headers=headers)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    content_list = html_soup.find_all('div', attrs={'class': 'caption'})
    basic_info = get_basic_info(content_list)
    prices = get_prices(basic_info)

cols = ["Price"]
data = pd.DataFrame({"Price": prices})[cols]
data
CodePudding user response:
You are reassigning `prices` on every iteration of the page loop, so each page's result replaces the previous one. Instead, define a single list once, before entering the loop:

prices = []

Then, inside the for loop, accumulate each page's results:

prices.extend(get_prices(basic_info))

Use `extend` rather than `append` here: `append` would give you a list of per-page lists, while `extend` keeps one flat list of price strings. After the loop finishes, build the data frame exactly as you do now:

data = pd.DataFrame({"Price" : prices})[cols]

since at that point `prices` holds the prices from all 18 pages.
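Putting that together, here is a minimal runnable sketch of the accumulate-then-build pattern; `get_page_prices` is a hypothetical stand-in for the real per-page scrape, so the example runs without network access:

```python
import pandas as pd

def get_page_prices(page):
    # Stand-in for scraping one page (dummy data in place of real prices).
    return [f"{page}-a", f"{page}-b"]

all_prices = []                # defined ONCE, before the page loop
for page in range(1, 19):      # pages 1 through 18
    # extend (not append) keeps one flat list of price strings
    all_prices.extend(get_page_prices(page))

# One data frame built from every page's prices, after the loop.
data = pd.DataFrame({"Price": all_prices})
print(data.shape)  # (36, 1): 18 pages x 2 dummy prices each
```

In your code, replace the `get_page_prices(page)` call with the existing request/parse/`get_prices` steps; only the placement of the list and the `extend` call change.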