BeautifulSoup: How to exclude elements from HTML

Time:01-23

I am trying to retrieve only the checkboxes on a form whose data-reactid contains "edit".

Here is the HTML:

<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
...

Below is the code I've used to filter the HTML, but it also returns the other checkboxes on the form. How do I exclude the checkboxes whose data-reactid contains "default" from my selection?

# Matches every checkbox, including the "default" ones
chkbox = soup.find_all('input', attrs={"type": "checkbox"})
for chk in chkbox:
    print(chk)

Answer:

You can pass a compiled regular expression as an attribute value in attrs. BeautifulSoup tests it with the regex's search() method, so this selects only the elements whose data-reactid attribute contains "edit":

import re
soup.find_all("input", attrs={"type": "checkbox", "data-reactid": re.compile(r"edit")})
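For reference, here is a minimal self-contained sketch of the regex approach, using find_all (the current name for findAll). It assumes the four <input> tags from the question are the whole document and uses the built-in "html.parser" so that lxml is not required. It also shows the inverse approach the question literally asks for: a function filter that drops checkboxes containing "default". The helper name is_non_default_checkbox is hypothetical, chosen for illustration.

```python
import re
from bs4 import BeautifulSoup

# The four checkboxes from the question (assumed to be the whole document)
html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
'''

soup = BeautifulSoup(html, "html.parser")

# Include-only: the compiled regex is checked with .search(),
# so "edit" may appear anywhere in the attribute value.
edit_boxes = soup.find_all(
    "input", attrs={"type": "checkbox", "data-reactid": re.compile(r"edit")}
)

# Exclude-only (hypothetical helper): find_all calls this for every Tag
# and keeps the tags for which it returns True.
def is_non_default_checkbox(tag):
    return (
        tag.name == "input"
        and tag.get("type") == "checkbox"
        and "default" not in tag.get("data-reactid", "")
    )

non_default_boxes = soup.find_all(is_non_default_checkbox)

for chk in edit_boxes:
    print(chk["data-reactid"])
```

On this sample both filters return the same three tags. They differ only if the form contains checkboxes whose data-reactid mentions neither "edit" nor "default": the regex drops those, the function filter keeps them.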

Answer:

You can use a CSS attribute selector with the *= ("contains") operator, combined with a type check:

from bs4 import BeautifulSoup as bs

html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>'''

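soup = bs(html, 'lxml')  # 'html.parser' also works if lxml isn't installed

# *= matches when data-reactid contains "edit" anywhere in its value
for chk in soup.select('input[data-reactid*=edit][type=checkbox]'):
    print(chk)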
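If you would rather exclude the "default" checkboxes than whitelist the "edit" ones, CSS :not() can express that directly; BeautifulSoup's select() delegates to Soup Sieve, which supports :not() with attribute selectors. This is a minimal sketch, assuming the same four <input> tags from the question and the built-in "html.parser".

```python
from bs4 import BeautifulSoup as bs

# The four checkboxes from the question (assumed to be the whole document)
html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
'''

soup = bs(html, "html.parser")

# Keep every checkbox except those whose data-reactid contains "default"
non_default = soup.select('input[type=checkbox]:not([data-reactid*="default"])')

for chk in non_default:
    print(chk["data-reactid"])
```

The two selectors agree on this sample, but they are not equivalent in general: input[data-reactid*=edit] keeps only checkboxes that mention "edit", whereas the :not() form also keeps checkboxes that mention neither word.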