I want to get the title and content of each section of an article. Example page: https://facts.net/best-survival-movies/
I want to append all the p elements that belong to each h2[tcontent-title], and the expected result is:
title=[title1, title2, title3]
content = [content1,content2,content3]
All p strings under the first heading should be appended to content1, those under the second to content2, and those under the third to content3. Can you help me?
CodePudding user response:
The solution from your last question does not work because some of the <p> elements are not siblings of the <h2> in the structure: they are nested in an <aside>, so preceding-sibling will fail for them.
You could switch to preceding only, but this would also grab the <p> elements from the <aside>. To fix this, simply select the elements more specifically:
driver.find_elements(By.CSS_SELECTOR,'.single-title-desc-wrap p:not(aside>p)')
Example
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://facts.net/best-survival-movies/')
# Map each heading text to an (initially empty) content string
data = dict((e.text,'') for e in driver.find_elements(By.CSS_SELECTOR,'.single-title-desc-wrap h2'))
# Append every <p> that is not a direct child of an <aside> to its nearest preceding <h2>
for p in driver.find_elements(By.CSS_SELECTOR,'.single-title-desc-wrap p:not(aside>p)'):
    key = p.find_element(By.XPATH, './preceding-sibling::h2[1]').text
    data[key] = data[key] + ' ' + p.text
print([{'title':x,'content':y} for x,y in data.items()])
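The question asks for two parallel lists (title and content) rather than a list of dicts. The sketch below shows the same grouping idea without a browser, using only the standard library's html.parser. SAMPLE_HTML, SectionCollector and split_sections are hypothetical names made up for illustration, and the markup is a simplified stand-in, not the real page. Unlike the aside>p selector above, it skips a <p> at any depth inside an <aside>.

```python
from html.parser import HTMLParser

# Hypothetical stand-in markup; the real page structure may differ.
SAMPLE_HTML = """
<div class="single-title-desc-wrap">
  <h2>Title 1</h2>
  <p>First paragraph.</p>
  <aside><p>Related link, should be skipped.</p></aside>
  <p>Second paragraph.</p>
  <h2>Title 2</h2>
  <p>Third paragraph.</p>
</div>
"""


class SectionCollector(HTMLParser):
    """Group <p> text under the most recent <h2>, ignoring <p> inside <aside>."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading text -> list of paragraph strings
        self.current = None   # heading that the next paragraphs belong to
        self.aside_depth = 0  # greater than 0 while inside an <aside>
        self.capture = None   # 'h2' or 'p' while text is being collected
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "aside":
            self.aside_depth += 1
        elif tag == "h2" or (tag == "p" and self.aside_depth == 0):
            self.capture = tag
            self.buffer = []

    def handle_data(self, data):
        if self.capture:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "aside":
            self.aside_depth = max(0, self.aside_depth - 1)
        elif tag == self.capture:
            text = "".join(self.buffer).strip()
            if tag == "h2":
                self.current = text
                self.sections.setdefault(text, [])
            elif self.current is not None:
                self.sections[self.current].append(text)
            self.capture = None


def split_sections(html):
    """Return (titles, contents) as two parallel lists."""
    parser = SectionCollector()
    parser.feed(html)
    parser.close()
    titles = list(parser.sections)
    contents = [" ".join(paragraphs) for paragraphs in parser.sections.values()]
    return titles, contents


title, content = split_sections(SAMPLE_HTML)
print(title)
print(content)
```

With the Selenium result from the example, the same two lists can be read straight from the dict: list(data.keys()) for the titles and [v.strip() for v in data.values()] for the contents.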