In python, how would I select a HTML node with a changing ID number

Time:07-12

I'm writing a web scraper in Python (BS4 and Selenium) to scrape user-profile data from a specific website. I'm trying to scrape one particular section of the page, but that section's node has nothing to distinguish it from the other section nodes apart from an ID made up of the word 'ember' followed by a number, like so:

<section id="ember31" class="artdeco-card ember-view pv-top-card">

The section ID is "ember" followed by two or three digits, and those digits are randomised every time the page is loaded. There are several such sections on the page, but I only want to select one.

This is fine for scraping one profile, but how would I ensure that my code selects the correct node each time it runs through a new profile?

Thanks in advance.

CodePudding user response:

Try to select on both the id and the class attributes:

//*[@class="artdeco-card ember-view pv-top-card" and @id="ember31"]
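Since the numeric part of the id changes on every load, a variant of the same idea is to match only the id prefix and rely on a class to pick the right section. The following is a minimal sketch, not a tested solution for the site in question: the helper name ember_section_xpath is hypothetical, and the class pv-top-card is assumed from the other answer in this thread.

```python
def ember_section_xpath(marker_class: str, id_prefix: str = "ember") -> str:
    """Build an XPath for a <section> whose id starts with id_prefix
    and whose class list contains marker_class as a whole word."""
    return (
        f'//section[starts-with(@id, "{id_prefix}") '
        f'and contains(concat(" ", normalize-space(@class), " "), " {marker_class} ")]'
    )


# "pv-top-card" is an assumed class name; swap in whatever marks your section.
xpath = ember_section_xpath("pv-top-card")
print(xpath)

# With Selenium 4 this could then be used roughly like so (driver is assumed
# to be an already-created WebDriver with the profile page loaded):
#   from selenium.webdriver.common.by import By
#   node = driver.find_element(By.XPATH, xpath)
```

The concat/normalize-space wrapper is there so that the class is matched as a whole token rather than as a substring of some longer class name.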

CodePudding user response:

If the id always starts with "ember" and you are using BeautifulSoup, try a CSS selector that combines the id prefix with the class names:

soup.select_one('[id^=ember].artdeco-card.ember-view.pv-top-card')
Example

select is used here instead of select_one to demonstrate that only the one specific section is matched.

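from bs4 import BeautifulSoup

# Class attributes are illustrative: only section #1 has both an id
# starting with "ember" and all three classes named in the selector.
html = '''
<section id="ember31" class="artdeco-card ember-view pv-top-card">section #1</section>
<section id="ember4711" class="artdeco-card ember-view">section #2</section>
<section id="embar123" class="artdeco-card ember-view pv-top-card">section #3</section>
<section id="notember123" class="artdeco-card ember-view pv-top-card">section #4</section>
'''

# Name the parser explicitly to avoid the "no parser specified" warning.
soup = BeautifulSoup(html, 'html.parser')
print(soup.select('[id^=ember].artdeco-card.ember-view.pv-top-card'))
Output
[<section class="artdeco-card ember-view pv-top-card" id="ember31">section #1</section>]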
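If you want to be stricter than "starts with ember", for example to require that only digits follow the prefix, BeautifulSoup's find also accepts a compiled regular expression as an attribute filter. This is a rough sketch under the same assumptions as above: the sample HTML and the pv-top-card class are illustrative and not taken from the real page.

```python
import re
from bs4 import BeautifulSoup

# Illustrative markup; the real page's classes may differ.
html = """
<section id="ember31" class="artdeco-card ember-view pv-top-card">section #1</section>
<section id="ember4711" class="artdeco-card ember-view">section #2</section>
<section id="notember123" class="artdeco-card ember-view pv-top-card">section #3</section>
"""

soup = BeautifulSoup(html, "html.parser")

# The id must be "ember" followed by digits only; the class filter then
# narrows the match to the one section that is wanted.
ember_id = re.compile(r"^ember\d+$")
node = soup.find("section", id=ember_id, class_="pv-top-card")

print(node["id"])
print(node.get_text())
```

Because the number is never hard-coded, the same lookup should keep working when the id changes between page loads, as long as the class stays stable.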