I am trying to retrieve only the checkboxes on a form whose data-reactid contains "edit".
Here is the HTML:
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
...
Below is the code I've used to filter the HTML, but it also returns the other checkboxes on the form. How do I exclude the checkboxes whose data-reactid contains "default" from my selection?
chkbox = soup.findAll('input', attrs={"type":"checkbox"})
for chk in chkbox:
    print(chk)
CodePudding user response:
You can pass a compiled regex as an attribute value when filtering with attrs, so you can select only the elements whose data-reactid attribute contains "edit".
import re
soup.findAll("input", attrs={"data-reactid": re.compile(r"edit")})
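To see the regex filter in context, here is a minimal, self-contained sketch. It assumes the built-in "html.parser" backend and a shortened copy of the HTML from the question; the names edit_boxes and box are only illustrative. It also keeps the type="checkbox" condition so other inputs that happen to contain "edit" are not picked up.

```python
import re
from bs4 import BeautifulSoup

# Shortened sample taken from the question (two "edit" boxes, one "default").
html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
'''

# Assumption: the standard-library parser is good enough for this markup.
soup = BeautifulSoup(html, "html.parser")

# Both conditions must hold: it is a checkbox AND its data-reactid matches the regex.
edit_boxes = soup.find_all(
    "input",
    attrs={"type": "checkbox", "data-reactid": re.compile(r"edit")},
)

for box in edit_boxes:
    print(box["data-reactid"])
```

A compiled pattern given as an attribute value is applied as a search, so it matches anywhere inside the attribute, not only at the start.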
CodePudding user response:
You can use a CSS attribute selector with the * (contains) operator, [attr*=value], which matches when the attribute value contains the given substring:
from bs4 import BeautifulSoup as bs
html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>'''
soup = bs(html, 'lxml')
soup.select('input[data-reactid*=edit][type=checkbox]')
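If a bare "edit" substring turns out to be too loose for a larger page, one possible refinement is to match the longer "$=2edit" fragment that appears in the sample HTML. This is only a sketch: it assumes that fragment is stable across the real form, uses the built-in "html.parser" backend, and the names edit_boxes and reactids are illustrative. The value is quoted because it contains characters that are not valid in an unquoted CSS identifier.

```python
from bs4 import BeautifulSoup

# Same sample HTML as in the question.
html = '''
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$90.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$100.0:$=2edit.0.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2default.0.0.2.0.0" type="checkbox"/>
<input data-reactid=".0.1.2.0.0.0.0.$=2fields.0.2.0.0.1.0:$110.0:$=2edit.0.0.0" type="checkbox"/>
'''

soup = BeautifulSoup(html, "html.parser")

# Assumption: "$=2edit" identifies the edit checkboxes on the real form too.
edit_boxes = soup.select('input[type="checkbox"][data-reactid*="$=2edit"]')

# Collect just the attribute values for inspection.
reactids = [box["data-reactid"] for box in edit_boxes]
print(reactids)
```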