import requests
import bs4
import csv
from itertools import zip_longest
laptop = []
laptops_price = []
links = []
url = "https://www.jumia.com.eg/ar/catalog/?q=لابتوب"
page = requests.get("https://www.jumia.com.eg/ar/catalog/?q=لابتوب")
bs = bs4.BeautifulSoup(page.content, 'html.parser')
laptops = bs.find_all('h3')
laptops_prices = bs.find_all("div", {"class": "prc"})
for l in range(len(laptops)):
    laptop.append(laptops[l].text)
    links.append(laptops[l].find("a", {"class": "core"}).attrs['href'])
    laptops_price.append(laptops_prices[l].text)
laptops_list = [laptop, laptops_price, links]
exported = zip_longest(*laptops_list)
with open(r"C:\Users\Administrator\Desktop\jumiawep.csv", "w", encoding="utf-8") as jumialaptops:
    write = csv.writer(jumialaptops)
    write.writerow(["Laptop", "Price", "Links"])
    write.writerows(exported)
Traceback (most recent call last):
File "C:\Users\Administrator\PycharmProjects\pythonProject\main.py", line 17, in <module>
links.append(laptops[l].find("a").attrs['href'])
AttributeError: 'NoneType' object has no attribute 'attrs'
I tried to get a list of links while I was scraping, but I get this error.
CodePudding user response:
There are a few different issues here, in my opinion:
- the website is protected by Cloudflare; I am not able to request it from my location
- <h3> does not have a child <a> that you could find(); instead, <h3> is a child of <a> (see the sketch after this list)
- avoid the bunch of separate lists and process your scraping in one go.
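A minimal sketch of that second point, using made-up markup shaped like the product cards (the exact structure is an assumption based on your error):

from bs4 import BeautifulSoup

# Hypothetical markup: the <a class="core"> wraps the <h3>, not the other way around.
html = '<article><a class="core" href="/laptop-1"><h3>Laptop 1</h3></a></article>'
soup = BeautifulSoup(html, 'html.parser')

h3 = soup.find('h3')
print(h3.find('a'))                 # None - there is no <a> inside the <h3>
print(h3.find_parent('a')['href'])  # /laptop-1 - the link is the parent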
Example
If you are not blocked by Cloudflare and the content is not rendered dynamically by JavaScript, this should give you the expected result.
import requests, csv
from bs4 import BeautifulSoup

url = "https://www.jumia.com.eg/ar/catalog/?q=لابتوب"
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# newline="" avoids the blank rows csv.writer otherwise produces on Windows
with open(r"jumiawep.csv", "w", encoding="utf-8", newline="") as jumialaptops:
    write = csv.writer(jumialaptops)
    write.writerow(["Laptop", "Price", "Links"])
    # each product card is an <article>; write one row per card
    for e in soup.select('article'):
        write.writerow([
            e.h3.text,
            e.select_one('.prc').text,
            e.a.get('href')
        ])
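If the request itself is blocked, sending a browser-like User-Agent header sometimes gets past simple bot checks; no guarantee against a full Cloudflare challenge, which would need a browser-based tool such as Selenium. A minimal sketch, where the header string is just an example value:

import requests
from bs4 import BeautifulSoup

url = "https://www.jumia.com.eg/ar/catalog/?q=لابتوب"
# A browser-like User-Agent; the exact string is an example, not a requirement.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')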