Python bs4 .find not detecting article

Time:12-19

I'm trying to get the names of products, but when it gets to the sponsored products it returns None. Here's my code:

    import requests
    from bs4 import BeautifulSoup

    next_page_url = 'https://www.jumia.com.ng/catalog/?q=oraimo&shipped_from=country_local&page=1#catalog-listing'
    result_nextpage = requests.get(next_page_url, headers=headers).text # headers are generated with the 'fake_headers' module
    doc_nextpage = BeautifulSoup(result_nextpage, 'lxml') # I also tried other parsers
    divs = doc_nextpage.find('div', class_='-paxs row _no-g _4cl-3cm-shs')
    result_articles = divs.select('h3.name')
    for i in result_articles:
        print(i.string)

Result:

Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds
Oraimo 27000mAh Massive Power Charing Bank Traveller 3 Byte
Oraimo OPB-P116DN 10000 Mah Power-Bank Dual Fast Charging
Oraimo FreePods3 True Wireless Stereo Earbuds IPX5 & Sweat Proof
Oraimo Smart Watch 1.69'' IPS Screen IP68 Waterproof
Oraimo FreePods-2 2Baba-version True Wireless Earbuds
Oraimo Silver Edition Smart Watch 1.69'' IPS Screen IP68 Waterproof
Oraimo Charger UKDualUSB OCW-U63D White
Oraimo Portable Wireless Speaker Subwoofer Outdoor Sound Box
Oraimo Charger Oraimo UKDualUSB OCW-U81F White
Oraimo Power Oraimo Bank OPB-P206DN 20KmAh
Oraimo SoundPro Wireless Speaker Muti-Model Music Play
Oraimo Tempo-W3 Smart Watch Health Monitor IP67 Waterproof
Oraimo Car Charger Oraimo OCC-21DML Black
Oraimo SoundPro-2C 10W Portable Wireless Bluetooth Speaker
Oraimo Necklace 5C Neckband Wireless Earphone
Oraimo COMPACT 10000mAh Ultra Slim Fast Charging Power Bank
Oraimo 10000mAh OPTIMIZED SLIM Power-bank With LED Light
Oraimo Mermaid Half In-ear Earphone With Mic
Oraimo Necklace 3 Lite Neckband BT 5.0 Wireless Earphone
Oraimo Senior BT5.0 Single Wireless Bluetooth Headsets
Oraimo True Wireless Bluetooth Earbuds- Freepods 2
Oraimo FreePods-2 2Baba-version True Wireless Earbuds
Oraimo Air-Buds-2S Super Bass Wireless Stereo Earbuds
Oraimo 20000MAH Powerbank -long Lasting PowerBank
Oraimo Bluetooth Wireless SOUNDBAR SPEAKER
Oraimo Shark-2 BT5.0 In-Ear Wireless Bluetooth Headphones
Oraimo BoomPop Over-Ear Bluetooth Wireless Headphone
Oraimo  20000MAH Powerbank -long Lasting Power For Days
Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud
Oraimo 2021 Latest Edition Smart Function Waterproof Smart Watch
Oraimo OCW-U36S Efficient And Durable USB Charger - Black
Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud-white
Oraimo 10000MAh Ultimate Slim Power Bank - Black
Oraimo 20000MAH Powerbank - Power For Days
Oraimo 10000mAh Ultra Slim Fast Charging Power Bank
Oraimo 2020 Edition Tempo S - OSW-11 Multi Function Smart Watch
Oraimo SOLID 27000mAh Massive Powerbank OPB-P271D Traveller 3 Byte
Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds
Oraimo Tempo-S IP67 Waterproof Smart Watch WITH AMAZING FUNCTIONS
None
None
None
None
None
None
None
None

Article tags 41-48 are sponsored products. Their names show up in the browser's inspect element, but bs4 isn't detecting them, although it does detect the non-sponsored ones. Please kindly help.

Answer:

Note: First of all, take a look at your soup (doc_nextpage). That is the actual data you are processing.
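One way to do that inspection is to list the articles whose name element came back empty and print their markup. This is only a minimal sketch: SAMPLE_HTML is a made-up stand-in for the real response, and articles_without_name is a hypothetical helper, not something from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server response: the second article is an
# unfilled placeholder, like the sponsored products described above.
SAMPLE_HTML = """
<div>
  <article><h3 class="name">Oraimo Charger UKDualUSB OCW-U63D White</h3></article>
  <article><h3 class="name"></h3></article>
</div>
"""

def articles_without_name(html):
    """Return the <article> tags whose h3.name is missing or has no text."""
    soup = BeautifulSoup(html, "html.parser")
    missing = []
    for article in soup.select("article"):
        name = article.select_one("h3.name")
        if name is None or not name.get_text(strip=True):
            missing.append(article)
    return missing

for article in articles_without_name(SAMPLE_HTML):
    # prettify() shows the markup as BeautifulSoup parsed it
    print(article.prettify())
```

With the real page you would pass result_nextpage instead of SAMPLE_HTML and compare what is printed with what the browser's inspector shows.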

What happens?

In your doc_nextpage, the HTML for the sponsored products is empty, and that's why you get None for them.

They are empty because the website fills them in dynamically with JavaScript, and requests cannot handle this. It is not a browser, so it does not interpret or manipulate the page.
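The None itself comes from how .string behaves on a tag with no children. A small sketch with made-up HTML (not the real Jumia markup) shows the difference from get_text():

```python
from bs4 import BeautifulSoup

# Made-up markup: one filled name and one empty placeholder.
html = '<h3 class="name">Oraimo Car Charger</h3><h3 class="name"></h3>'
soup = BeautifulSoup(html, "html.parser")
tags = soup.select("h3.name")

# .string is None when a tag has no child at all
strings = [tag.string for tag in tags]

# get_text() returns an empty string instead of None
texts = [tag.get_text(strip=True) for tag in tags]

print(strings)
print(texts)
```

So the selector does match the sponsored h3 tags. They simply contain no text in the HTML that requests received.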

How to fix?

One option is to simulate browser behavior with selenium and get page_source to process it with BeautifulSoup or with selenium itself.

Example (selenium 4)

    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.chrome.service import Service as ChromeService
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    service = ChromeService(executable_path='ENTER YOUR PATH TO CHROMEDRIVER')
    driver = webdriver.Chrome(service=service, options=options)
    driver.get('https://www.jumia.com.ng/catalog/?q=oraimo&shipped_from=country_local&page=1#catalog-listing')

    # wait up to 10 seconds until the sponsored section is present in the DOM
    WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-list="sponsored"]')))

    # page_source now contains the HTML after the browser has rendered it
    soup = BeautifulSoup(driver.page_source, 'lxml')

    print([x.text for x in soup.select('article h3.name')])

    # quit() closes the window and also stops the chromedriver process
    driver.quit()
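Selenium can also read the names directly, without handing page_source to BeautifulSoup. The following is only a sketch of that variant: product_names and main are hypothetical names, and main assumes Selenium 4.6 or later, where Selenium Manager locates chromedriver on its own. On older versions you would pass a Service with the driver path instead.

```python
URL = ('https://www.jumia.com.ng/catalog/'
       '?q=oraimo&shipped_from=country_local&page=1#catalog-listing')

def product_names(driver, selector='article h3.name'):
    """Return the non-empty text of every element matching selector."""
    # 'css selector' is the string value behind selenium's By.CSS_SELECTOR
    elements = driver.find_elements('css selector', selector)
    return [el.text for el in elements if el.text.strip()]

def main():
    # Imported here so product_names can be reused without selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    driver = webdriver.Chrome()  # assumption: Selenium Manager finds the driver
    try:
        driver.get(URL)
        # wait until the sponsored section has been added to the DOM
        WebDriverWait(driver, 10).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, '[data-list="sponsored"]')))
        print(product_names(driver))
    finally:
        driver.quit()
```

Calling main() opens a Chrome window, so it is left uncalled here. The try/finally makes sure the browser is shut down even if the wait times out.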

Output

['Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds',
 'Oraimo 27000mAh Massive Power Charing Bank Traveller 3 Byte',
 'Oraimo OPB-P116DN 10000 Mah Power-Bank Dual Fast Charging',
 'Oraimo FreePods3 True Wireless Stereo Earbuds IPX5 & Sweat Proof',
 "Oraimo Smart Watch 1.69'' IPS Screen IP68 Waterproof",
 'Oraimo FreePods-2 2Baba-version True Wireless Earbuds',
 "Oraimo Silver Edition Smart Watch 1.69'' IPS Screen IP68 Waterproof",
 'Oraimo Charger UKDualUSB OCW-U63D White',
 'Oraimo Portable Wireless Speaker Subwoofer Outdoor Sound Box',
 'Oraimo Charger Oraimo UKDualUSB OCW-U81F White',
 'Oraimo Portable Source 10000mAh Po Wer Ba Nk Oraimo OPB-P110D',
 'Oraimo Power Oraimo Bank OPB-P206DN 20KmAh',
 'Oraimo SoundPro Wireless Speaker Muti-Model Music Play',
 'Oraimo Tempo-W3 Smart Watch Health Monitor IP67 Waterproof',
 'Oraimo Car Charger Oraimo OCC-21DML Black',
 'Oraimo SoundPro-2C 10W Portable Wireless Bluetooth Speaker',
 'Oraimo Necklace 5C Neckband Wireless Earphone',
 'Oraimo 10000mAh OPTIMIZED SLIM Power-bank With LED Light',
 'Oraimo COMPACT 10000mAh Ultra Slim Power Fast Charging Bank',
 'Oraimo Mermaid Half In-ear Earphone With Mic',
 'Oraimo Necklace 3 Lite Neckband BT 5.0 Wireless Earphone',
 'Oraimo Senior BT5.0 Single Wireless Bluetooth Headsets',
 'Oraimo True Wireless Bluetooth Earbuds- Freepods 2',
 'Oraimo Pilot 20000mAh 2.1A Fast  Power Charging Bank',
 'Oraimo FreePods-2 2Baba-version True Wireless Earbuds',
 'Oraimo Air-Buds-2S Super Bass Wireless Stereo Earbuds',
 'Oraimo 20000MAH Powerbank -long Lasting PowerBank',
 'Oraimo Bluetooth Wireless SOUNDBAR SPEAKER',
 'Oraimo Shark-2 BT5.0 In-Ear Wireless Bluetooth Headphones',
 'Oraimo BoomPop Over-Ear Bluetooth Wireless Headphone',
 'Oraimo  20000MAH Powerbank -long Lasting Power For Days',
 'Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud',
 'Oraimo OCW-U36S Efficient And Durable USB Charger - Black',
 'Oraimo 2021 Latest Edition Smart Function Waterproof Smart Watch',
 'Oraimo OCW-U36S Efficient And Durable USB Charger - Black',
 'Oraimo Firefly-2 5.0V/2.1A Dual USB Fast Wall Charger',
 'Oraimo FreePods-2 2Baba-Version  True Wireless Stereo Earbud-white',
 'Oraimo 10000MAh Ultimate Slim Power Bank - Black',
 'Oraimo SOLID 27000mAh Massive Powerbank OPB-P271D Traveller 3 Byte',
 'Oraimo FreePods-3 2Baba Edition BT 5.2 Wireless Stereo Earbuds',
 'Oraimo Massive 27000mAh Travellers 3 Byte OPB-P271D Power Bank',
 'Oraimo 1.69" IPS Screen IP68 Waterproof Smart Watch Pro-Silver',
 'Oraimo Tempo-S IP67 Waterproof Smart Watch WITH AMAZING FUNCTIONS',
 'Oraimo FreePods-3 E104D 2Baba Edition BT 5.2 Wireless Earbuds',
 'Oraimo Tempo-S IP67 Waterproof Smart Watch',
 'Oraimo 2020 Edition Tempo S - OSW-11 Multi Function Smart Watch',
 'Oraimo 10000mAh Ultra Slim Fast Charging Power Bank',
 'Oraimo 20000MAH Powerbank - Power For Days']