I developed a website using Django whose page content is data scraped from Amazon. When I submit a search term, the page scrapes Amazon for it using Beautiful Soup. When I run the scraping function on its own, without the server, the output is fine. But when I use the same function in my Django server, I sometimes get the table of scraped data and sometimes get no table at all. I suspect the problem is in how I integrated the function with Django. As I'm new to Django, please check whether my code is correct. The code I used is:
views.py
def amzlogic(response):
    USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
    LANGUAGE = "en-US,en;q=0.5"
    session = requests.Session()
    session.headers['User-Agent'] = USER_AGENT
    session.headers['Accept-Language'] = LANGUAGE
    session.headers['Content-Language'] = LANGUAGE
    title_list = []
    price_list = []
    image_url_list = []
    if response.method == "GET":
        search = response.GET.get("search-item")
        search = search.replace(" ", " ")
        url = f"https://www.amazon.in/s?k={search}&page=1&qid=1636019714&ref=sr_pg_1"
        page = requests.get(url)
        soup = BeautifulSoup(page.content,'lxml')
        for item in soup.select(".s-border-top"):
            title = item.select_one(".a-color-base.a-text-normal").get_text()[:25]
            try:
                price = item.select_one(".a-price-whole").get_text().replace(",", "").replace(".", "")
            except:
                price = "No Price"
            image_url = item.select_one(".s-image")
            title_list.append(title)
            price_list.append(price)
            image_url_list.append(image_url.get('src'))
    return render(response, "main/amazonscrape.html", {"title_list":title_list, "price_list":price_list, "image_list":image_url_list})
templates.html
{% block content %}
<form method="GET" action="#"> {%csrf_token%}
    <label for="search-query">Search:</label> <br>
    <input type="text" name = "search-item" placeholder="Enter your search item"> <br>
    <!-- <label for="search-query">Number of pages:</label><br>
    <input type="number" name = "page-limit" placeholder="No. of pages"><br> -->
    <input type="submit" name="search" value="search">
</form>
<table>
    <tr>
        <td>
            <table>
                <tbody>
                    {%for title in title_list%}
                    <tr>
                        <td>{{title}}</td>
                    </tr>
                    {%endfor%}
                </tbody>
            </table>
        </td>
        <td>
            <table>
                <tbody>
                    {%for price in price_list%}
                    <tr>
                        <td>{{price}}</td>
                    </tr>
                    {%endfor%}
                </tbody>
            </table>
        </td>
        <td>
            <table>
                <tbody>
                    {%for image in image_list%}
                    <tr>
                        <td>{{image}}</td>
                    </tr>
                    {%endfor%}
                </tbody>
            </table>
        </td>
    </tr>
</table>
{%endblock%}
If the error is from some other file, please mention that in a comment and I will add that code too.
CodePudding user response:
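Well, the view's parameter response should be named request (Django passes the incoming request to a view), but beyond that, the code looks OK. (Not perfect: I'd use a single list of dicts for the items, but OK.) Also, you never use the Requests session you create: the headers are set on session, but the actual request is made with requests.get(url), so those headers are not sent.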
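The point about the unused session can be checked without touching the network. The sketch below uses Requests' Session.prepare_request to show which headers would be sent in each case. The example.com URL is a placeholder chosen for illustration, not the Amazon URL from the question, and nothing is actually fetched.

```python
import requests

USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36"
)

# Placeholder URL for illustration only; no request is sent in this sketch.
URL = "https://example.com/s?k=laptop"

# A session configured the way the question's view configures it.
session = requests.Session()
session.headers["User-Agent"] = USER_AGENT
session.headers["Accept-Language"] = "en-US,en;q=0.5"

# Going through the configured session merges its headers into the request.
via_session = session.prepare_request(requests.Request("GET", URL))

# requests.get() uses a fresh, throwaway session, so only the library
# defaults apply. A new Session() stands in for that here.
via_module = requests.Session().prepare_request(requests.Request("GET", URL))

print(via_session.headers["User-Agent"])  # the browser-like string above
print(via_module.headers["User-Agent"])   # the library default, python-requests/<version>
```

In the view, that means either calling session.get(url) instead of requests.get(url), or passing headers= explicitly to requests.get.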
I'd probably refactor the search into another function, à la
import requests
from bs4 import BeautifulSoup
from django.shortcuts import render

HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.157 Safari/537.36",
    "Accept-Language": "en-US,en;q=0.5",
    "Content-Language": "en-US,en;q=0.5",
}


def do_amazon_search(search):
    search = search.replace(" ", " ")
    response = requests.get(f"https://www.amazon.in/s?k={search}&page=1&qid=1636019714&ref=sr_pg_1", headers=HEADERS)
    # Fail loudly on HTTP errors instead of parsing an error page.
    response.raise_for_status()
    soup = BeautifulSoup(response.content, "lxml")
    items = []
    for item in soup.select(".s-border-top"):
        title = item.select_one(".a-color-base.a-text-normal").get_text()[:25]
        try:
            price = item.select_one(".a-price-whole").get_text().replace(",", "").replace(".", "")
        except Exception:
            price = "No Price"
        image_url = item.select_one(".s-image")
        items.append({"title": title, "price": price, "image": image_url.get("src")})
    return items


def amzlogic(request):
    search = request.GET.get("search-item")
    # On the first page load there is no search term yet, so skip the scrape.
    items = do_amazon_search(search) if search else []
    # The template now receives a single "items" list of dicts.
    return render(request, "main/amazonscrape.html", {"items": items})
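As written above, search.replace(" ", " ") appears to replace a space with a space, which changes nothing, and the f-string puts the search term into the URL as-is, so characters such as & or # in a search term would end up unescaped. One option is to let the standard library encode the query string. This is a sketch rather than part of the original answer: build_search_url is a hypothetical helper name, and the page, qid and ref values are simply copied from the URL in the question.

```python
from urllib.parse import urlencode


def build_search_url(search):
    """Return the search URL with every query parameter percent-encoded."""
    params = {
        "k": search,          # the user's search term, encoded by urlencode
        "page": 1,            # copied from the URL in the question
        "qid": 1636019714,    # copied from the URL in the question
        "ref": "sr_pg_1",     # copied from the URL in the question
    }
    return "https://www.amazon.in/s?" + urlencode(params)


print(build_search_url("usb c cable & hub"))
# https://www.amazon.in/s?k=usb+c+cable+%26+hub&page=1&qid=1636019714&ref=sr_pg_1
```

Requests can also do this encoding itself if the parameters are passed as a dict through the params= argument of requests.get.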
"But when I used that same function in my server, sometimes I get output which is a table of scraped data. But sometimes I don't get any table in my page."
I'd guess that sometimes the scraped page contains no elements matching soup.select(".s-border-top"), so the for loop never runs and you end up with no items.
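To check that guess, it helps to make the empty case visible instead of silently rendering a blank table. The following is a minimal sketch under a few assumptions: parse_items is a hypothetical helper, the built-in html.parser is used in place of lxml to keep the example small, and both sample HTML strings are invented to mimic the selectors from the question rather than taken from a real Amazon page. Unlike the code above, it skips a card that has no title instead of raising.

```python
from bs4 import BeautifulSoup


def parse_items(html):
    """Extract product dicts from search-result HTML; return [] if nothing matches."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for item in soup.select(".s-border-top"):
        title_tag = item.select_one(".a-color-base.a-text-normal")
        price_tag = item.select_one(".a-price-whole")
        image_tag = item.select_one(".s-image")
        if title_tag is None:
            # Not a product card (assumption: every real card has a title).
            continue
        if price_tag is not None:
            price = price_tag.get_text().replace(",", "").replace(".", "")
        else:
            price = "No Price"
        items.append({
            "title": title_tag.get_text()[:25],
            "price": price,
            "image": image_tag.get("src") if image_tag is not None else None,
        })
    return items


# Invented markup that mimics the classes used in the question.
SAMPLE_RESULTS = """
<div class="s-border-top">
  <span class="a-color-base a-text-normal">Example Wireless Mouse</span>
  <span class="a-price-whole">1,299.</span>
  <img class="s-image" src="https://example.com/mouse.jpg">
</div>
"""

# Invented stand-in for a page that contains no result cards at all.
SAMPLE_NO_RESULTS = "<html><body><p>Nothing to show here.</p></body></html>"

print(parse_items(SAMPLE_RESULTS))
print(parse_items(SAMPLE_NO_RESULTS))  # []
```

In the view, an empty list could then be logged together with the response's status code, or passed to the template so it can show a "no results" message, which would show whether the missing table really comes from a page without matching elements.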