In the HTML below, I have been unable to use Beautiful Soup's soup.select
to obtain only the first child div of each wrapper (i.e. −11.94M and 2.30M) as a list:
<div >
<div >
<div>−11.94M</div>
<div >−119.94%</div></div></div>
<div >
<div >
<div>2.30M</div>
<div >−80.17%</div></div></div>
The above shows just two examples from the HTML I'm trying to scrape. It sits inside a dynamic, JavaScript-rendered table; the page contains many more div elements, and the table contains many more divs with class "wrap-25PNPwRV".
I currently have the code below, which scrapes all the contents of the divs with class "wrap-25PNPwRV":
data_list = [elem.get_text() for elem in soup.select("div.wrap-25PNPwRV")]
Output:
['-11.94M', '-119.94%', '2.30M', '-80.17%']
However, I would like to use soup.select to yield the desired output:
['-11.94M', '2.30M']
I tried following this guide https://www.crummy.com/software/BeautifulSoup/bs4/doc/ but have been unable to apply it to my code above.
Please note: if soup.select cannot do this, I am happy to use an alternative, provided it generates the same list format/output.
CodePudding user response:
You can use the :nth-of-type CSS selector:
data_list = [elem.get_text() for elem in soup.select(".wrap-25PNPwRV div:nth-of-type(1)")]
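For reference, here is a minimal self-contained sketch of this approach. The markup in the question does not show where the class attribute sits, so the example assumes that wrap-25PNPwRV is on the div that directly contains the value and the percentage; the html string is illustrative, not the real page. It also uses the child combinator (>), a small variation on the selector above, so that only direct children of the wrapper are considered.

```python
from bs4 import BeautifulSoup

# Illustrative markup only: the placement of the wrap-25PNPwRV class is an
# assumption, since the class attributes are not visible in the question.
html = """
<div>
  <div class="wrap-25PNPwRV">
    <div>−11.94M</div>
    <div>−119.94%</div>
  </div>
</div>
<div>
  <div class="wrap-25PNPwRV">
    <div>2.30M</div>
    <div>−80.17%</div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# '>' restricts the match to direct children of the wrapper;
# ':nth-of-type(1)' keeps only the first <div> among those children.
data_list = [
    elem.get_text(strip=True)
    for elem in soup.select(".wrap-25PNPwRV > div:nth-of-type(1)")
]
print(data_list)
```

If the class turns out to be on a different level of the nesting, the selector would need to be adjusted accordingly.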
CodePudding user response:
I'd suggest not using the .wrap-25PNPwRV class. It looks auto-generated and will almost certainly change in the future.
Instead, select the <div> element whose next sibling is an element with a class starting with "change". For example:
print([t.text.strip() for t in soup.select('div:has(+ [class^="change"])')])
Prints:
['−11.94M', '2.30M']
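A runnable sketch of the sibling-based selection follows. It uses the adjacent-sibling combinator (+) inside :has(), which Beautiful Soup's selector engine (soupsieve) supports. The class name change-demo is hypothetical: the real class is not shown in the question, and only the "change" prefix is taken from the selector above.

```python
from bs4 import BeautifulSoup

# Illustrative markup: "change-demo" is a made-up class name standing in
# for whatever class starting with "change" the real page uses.
html = """
<div><div>
  <div>−11.94M</div>
  <div class="change-demo">−119.94%</div>
</div></div>
<div><div>
  <div>2.30M</div>
  <div class="change-demo">−80.17%</div>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Match every <div> that is immediately followed by a sibling element
# whose class attribute starts with "change".
values = [
    t.get_text(strip=True)
    for t in soup.select('div:has(+ [class^="change"])')
]
print(values)
```

Because the selector keys on the neighbouring element rather than on the wrapper's generated class name, it should keep working if the wrapper class is renamed, as long as the "change" prefix stays stable.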