Find Tags that Match Specific Classes but one class keeps changing

I want to extract information from a div tag which has some specific classes.

The classes are in the format abc def jss238 xyz.

The number in the jss class keeps changing, so after some time the classes become abc def jss384 xyz.

What is the best way to extract the information so that the code doesn't break when the class names change?

The code I am currently using is:

val = soup.findAll('div', class_="abc def jss328 xyz")

I think a regex could work, but could I instead leave out the jss class and search using only the other three?

Answer:

So yes, you can use a regex to match the pattern abc def <three letters followed by three digits> xyz.

Personally, I would first check whether you can get the data from its source. When class names change like that, it's usually because the page is rendered with JavaScript, and that script has to load the data from somewhere before it puts it into the page. If you share the URL and the data you are after, I can check whether that's the case. But here's the regex version:

from bs4 import BeautifulSoup
import re

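# Sample markup: the first two divs differ only in the jss number
html = '''<div class="abc def jss238 xyz">jss238 text</div>
<div class="abc def jss384 xyz">jss384 text</div>
<div class="abc def xyz">doesn't match the pattern</div>'''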


soup = BeautifulSoup(html, 'html.parser')

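# Raw string so that \w and \d reach the regex engine unchanged;
# matched against the full class value, e.g. "abc def jss238 xyz"
regex = re.compile(r'abc def \w{3}\d{3} xyz')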
specialDivs = soup.find_all('div', {'class':regex})


for each in specialDivs:
    print(f'html: {each}\tText: {each.text}')

Output:

html: <div class="abc def jss238 xyz">jss238 text</div>	Text: jss238 text
html: <div class="abc def jss384 xyz">jss384 text</div>	Text: jss384 text
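As for the second part of the question: yes, you can leave the jss class out entirely. A CSS selector such as div.abc.def.xyz matches any div that has all three classes, in any order, regardless of what other classes it also carries. Here is a minimal sketch on hypothetical sample markup. The second option additionally requires some class of the form jss plus digits, assuming the changing class always keeps that prefix; has_stable_and_jss is just an illustrative helper name.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup
html = '''<div class="abc def jss238 xyz">jss238 text</div>
<div class="xyz jss384 abc def">jss384 text, classes reordered</div>
<div class="abc def xyz">no jss class</div>
<div class="abc xyz jss500">missing def</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Option 1: CSS selector. ".abc.def.xyz" requires all three classes
# (in any order) and ignores extra classes such as jss238.
stable = soup.select('div.abc.def.xyz')

# Option 2: the three stable classes AND a class like "jss<digits>".
# Assumption: the changing class always starts with "jss".
JSS = re.compile(r'jss\d+')

def has_stable_and_jss(tag):
    # With html.parser, tag.get('class') is a list of individual class names
    classes = tag.get('class', [])
    return (
        tag.name == 'div'
        and {'abc', 'def', 'xyz'}.issubset(classes)
        and any(JSS.fullmatch(c) for c in classes)
    )

strict = soup.find_all(has_stable_and_jss)

for div in stable:
    print('selector:', div.text)
for div in strict:
    print('function:', div.text)
```

Unlike the regex above, neither option depends on the order in which the classes appear in the attribute, and jss\d+ keeps working if the number grows beyond three digits. Use the selector if the three stable classes are enough to identify the divs; use the function if you also need the jss class to be present.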