The following is the nested tag I would like to process:
<h5><span class="price">31.8 萬</span>2014 NISSAN MARCH</h5>
And here is my successful attempt to extract the price:
price = i.find("span", attrs = {"class" : "price"})
However, when I tried
name = i.find("h5").span.find_next_sibling(text=True)
it fails with: 'NoneType' object has no attribute 'find_next_sibling'. I am hoping for a solution that is similar to my successful attempt. Thank you. ; )
Edit: The following is my complete code.
from requests import get
from bs4 import BeautifulSoup

def get_basic_info(content_list):
    # Collect the list of <h5> tags found in each caption block.
    basic_info = []
    for item in content_list:
        basic_info.append(item.find_all('h5'))
    return basic_info

names = []

def get_names(basic_info):
    for item in basic_info:
        for i in item:
            # i is already an <h5> tag here.
            name = i.find("span", attrs = {"class" : "price"}).find_next_sibling()
            if name:
                names.append(name.text)
    return names
for page in range(1,18):
    base_url = "https://www.easycar.tw/carList.php?Action=search&show=col&lifting=desc&year=&year1=&page=" + str(page)
    # headers is defined elsewhere (not shown in the question).
    response = get(base_url, headers=headers)
    html_soup = BeautifulSoup(response.text, 'html.parser')
    content_list = html_soup.find_all('div', attrs={'class': 'caption'})
    basic_info = get_basic_info(content_list)
    names = get_names(basic_info)
CodePudding user response:
i.find("h5").span.find_next_sibling(text=True)
should give you '2014 NISSAN MARCH', but I am sure you are not getting the correct h5. Try printing out i.find("h5") to see if it's the heading you want. Alternatively, you can get the answer you want with
soup.find("span", attrs = {"class" : "price"}).find_next_sibling(text=True)
Edit after question was updated:
Since the i in the loop is already an h5 Tag object, we don't have to find it again. find() only searches a tag's descendants, so i.find("h5") returns None there. Here is the updated code that works:
i.span.find_next_sibling(text=True)
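To make the failure modes above concrete, here is a minimal offline sketch. It assumes the markup from the question with class="price" on the span (inferred from the asker's working find() call) and wraps it in a div.caption as in the full code. It uses text=True to match the thread; newer Beautiful Soup releases prefer the equivalent string=True.

```python
from bs4 import BeautifulSoup

# Sample markup from the question; class="price" is an assumption
# based on the asker's working find("span", attrs={"class": "price"}).
html = (
    '<div class="caption">'
    '<h5><span class="price">31.8 萬</span>2014 NISSAN MARCH</h5>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

h5 = soup.find("h5")

# find() only looks at descendants, so searching for an <h5>
# inside an <h5> finds nothing. This is the source of the NoneType error.
nested = h5.find("h5")

span = h5.find("span", attrs={"class": "price"})

# With no arguments, only Tag siblings are considered. The span is
# followed by a plain text node, so this is None as well.
tag_sibling = span.find_next_sibling()

# text=True matches the text node that follows the span.
text_sibling = span.find_next_sibling(text=True)

print(nested, tag_sibling, text_sibling)
```

This would also explain why the get_names() loop in the question collects nothing: find_next_sibling() without text=True skips the text node that holds the car name.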
CodePudding user response:
There are different ways to reach your goal. For selecting the correct <h5>, you could dive a bit deeper with this answer: How to scrape last string of <p> tag element?
Following your example, iterate over all <span> elements of your selection. CSS selectors are used here to be more specific:
for e in soup.select('h5 span.price'):
    print(e.next_sibling.strip())
Example
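import requests
from bs4 import BeautifulSoup

base_url = 'https://www.easycar.tw/carList.php?Action=search&show=col&lifting=desc&year=&year1=&page=1'
# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(requests.get(base_url).text, 'html.parser')

for e in soup.select('h5 span.price'):
    # The text node right after the price span holds "<year> <make> <model>".
    print(e.next_sibling.strip())
    # or, to print only the car names, use this instead of the line above:
    print(e.next_sibling.strip().split(' ', 1)[-1])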
Output
2020 TOYOTA ALTIS
2019 NISSAN LIVINA
2019 TOYOTA VIOS
2016 HONDA CITY
2017 HONDA HR-V
2018 TOYOTA YARIS
...
or
TOYOTA ALTIS
NISSAN LIVINA
TOYOTA VIOS
HONDA CITY
HONDA HR-V
...
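If you want the price, year, and name together, the same selector approach can be sketched offline as below. The HTML is a hypothetical sample: the first listing comes from the question (with class="price" assumed), the second listing's price is made up for illustration, and the dictionary keys are just illustrative names.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the first listing is from the question.
html = """
<div class="caption"><h5><span class="price">31.8 萬</span>2014 NISSAN MARCH</h5></div>
<div class="caption"><h5><span class="price">45.0 萬</span>2019 TOYOTA VIOS</h5></div>
"""
soup = BeautifulSoup(html, "html.parser")

cars = []
for span in soup.select("h5 span.price"):
    sibling = span.next_sibling
    # Skip listings where the span is not followed by a text node.
    if not isinstance(sibling, str):
        continue
    # Split "2014 NISSAN MARCH" into the year and the rest of the name.
    year, _, model = sibling.strip().partition(" ")
    cars.append({
        "price": span.get_text(strip=True),
        "year": year,
        "model": model,
    })

for car in cars:
    print(car)
```

The isinstance check is a small safeguard: next_sibling can be None or another tag if a listing is laid out differently, and calling strip() on those would raise an error.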