Finding all div elements with varying id value with BeautifulSoup

This question must be a duplicate, but for the life of me, I can't find it anywhere.

html = """
<html>
<head>
</head>
<body>
<div id="7471292"></div>
<div id="5235252"></div>
<div href="/some/link/"></div>
<div id="7567327"></div>
<div id="1231312"></div>
<div </div>
<div id="2342424"></div>
</body>
</html>
"""

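# Create soup from html (naming the parser avoids bs4's "no parser specified" warning)
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")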

I want the following output:

[<div id="7471292"></div>,
 <div id="5235252"></div>,
 <div id="7567327"></div>,
 <div id="1231312"></div>,
 <div id="2342424"></div>]

We can do something like:

soup.find_all("div")

but this will return all divs, including those without an id. If we want to filter on the id attribute, it seems we have to fill in a concrete value as well, which makes it useless here:

soup.find_all('div', {'id': ""})

CodePudding user response：

You can pass in a lambda function that checks whether the id exists and contains only digits. A regular expression is overkill here.

soup = BeautifulSoup(html, "html.parser")
print(soup.find_all("div", id=lambda x: x is not None and x.isnumeric()))

This outputs:

[<div id="7471292"></div>, <div id="5235252"></div>,
<div id="7567327"></div>, <div id="1231312"></div>, <div id="2342424"></div>]
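
If any id value is acceptable, not just numeric ones, a simpler filter may be enough: passing True as the attribute value matches every tag that has that attribute at all. The sketch below compares it with the lambda filter. The shortened html string and the names with_id and numeric_id are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: one numeric id, one non-numeric id,
# one div with only an href, and one div with no attributes.
html = """
<body>
<div id="7471292"></div>
<div href="/some/link/"></div>
<div id="abc"></div>
<div></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# id=True matches every div that has an id attribute, whatever its value.
with_id = soup.find_all("div", id=True)

# The lambda filter additionally requires the value to be numeric.
numeric_id = soup.find_all("div", id=lambda x: x is not None and x.isnumeric())

print([d["id"] for d in with_id])     # ['7471292', 'abc']
print([d["id"] for d in numeric_id])  # ['7471292']
```

For the html in the question both filters return the same five divs, since every id there is numeric.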

CodePudding user response：

What you need is a combination of regex and soup:

from bs4 import BeautifulSoup
import re
html = """
<html>
<head>
</head>
<body>
<div id="7471292"></div>
<div id="5235252"></div>
<div href="/some/link/"></div>
<div id="7567327"></div>
<div id="1231312"></div>
<div </div>
<div id="2342424"></div>
</body>
</html>
"""

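soup = BeautifulSoup(html, "html.parser")
# Raw string so the backslash reaches the regex engine; \d+ means one or more digits
soup.find_all('div', {'id': re.compile(r"\d+")})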

Output

[<div id="7471292"></div>,
 <div id="5235252"></div>,
 <div id="7567327"></div>,
 <div id="1231312"></div>,
 <div id="2342424"></div>]

If you want the div tags whose id contains digits, letters, or a combination of both, use [\d\w]+ instead of \d+. Since \w already covers digits (and the underscore), \w+ is equivalent.
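
One caveat worth checking: as far as I know, Beautiful Soup tests a compiled pattern with its search() method, so an unanchored \d+ matches any id that contains a digit anywhere, not only ids made up entirely of digits. Below is a minimal sketch of the difference. The sample html and the ids() helper are hypothetical and not part of the question.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample with a purely numeric id, a mixed id, and a letters-only id.
html = """
<div id="7471292"></div>
<div id="item42"></div>
<div id="sidebar"></div>
<div href="/some/link/"></div>
"""

soup = BeautifulSoup(html, "html.parser")


def ids(pattern):
    """Return the id values of all divs whose id matches the given regex."""
    return [d["id"] for d in soup.find_all("div", id=re.compile(pattern))]


contains_digit = ids(r"\d+")   # unanchored: a digit anywhere is enough
digits_only = ids(r"^\d+$")    # anchored: the whole id must be digits
word_chars = ids(r"^\w+$")     # anchored: letters, digits, or underscore

print(contains_digit)  # ['7471292', 'item42']
print(digits_only)     # ['7471292']
print(word_chars)      # ['7471292', 'item42', 'sidebar']
```

With the html from the question the anchored and unanchored patterns give the same result, because every id is purely numeric. The anchors only matter once mixed ids appear.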