How to extract an image URL with Python?

I'm trying to extract image URLs from this HTML:

<div class="theme-screenshot one attachment-theme-screenshot size-theme-screenshot wp-post-image loaded" data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>

How can I find the URLs in data-src?

I'm using Beautiful Soup and its find function, but I don't know how to extract the links because there is no img tag here as there usually is.

Thank you in advance for your time.

CodePudding user response:

If you can't use an HTML parser for whatever reason, you can use a regular expression instead.

import re

text = '''
<div  data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>
'''

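# Match the value of the standalone data-src attribute (not data-featured-src),
# stopping at the closing quote instead of relying on a greedy match.
parsed = re.search(r'(?<=\sdata-src=")[^"]+', text).group(0)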

print(parsed)
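If Beautiful Soup isn't available but the standard library is, a regex-free alternative is possible. The following is only a minimal sketch: the class name DataSrcCollector and the example.com URLs are made up for illustration and do not come from the question. It relies on html.parser from the standard library and collects the data-src value of every tag that has one.

```python
from html.parser import HTMLParser


class DataSrcCollector(HTMLParser):
    """Hypothetical helper: collects the data-src value of every start tag."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names are lowercased.
        for name, value in attrs:
            if name == "data-src" and value:
                self.urls.append(value)


# Placeholder markup with made-up URLs, standing in for the real page.
sample = (
    '<div data-featured-src="https://example.com/a.jpg" '
    'data-src="https://example.com/a.jpg"></div>'
    '<div data-src="https://example.com/b.jpg"></div>'
)

collector = DataSrcCollector()
collector.feed(sample)
collector.close()
print(collector.urls)
```

Because the parser compares whole attribute names, data-featured-src is never mistaken for data-src, which is the main risk with a loosely written pattern.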

CodePudding user response:

You can try the following:

from bs4 import BeautifulSoup

html = """
<div class="theme-screenshot one attachment-theme-screenshot size-theme-screenshot wp-post-image loaded" data-featured-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" data-src="https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg" style='background-image: url("https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg");'></div>
"""
soup = BeautifulSoup(html, "html.parser")
url = soup.select_one(
    "div.theme-screenshot.one.attachment-theme-screenshot.size-theme-screenshot.wp-post-image.loaded"
).get("data-src")

print(url)

This will return:

https://websitedemos.net/wp-content/uploads/2019/07/outdoor-adventure-02-home.jpg
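If the page contains several such elements, or the class list differs between them, selecting by attribute may be more robust than spelling out every class. The sketch below is only an illustration with made-up example.com URLs. It uses find_all with an attribute filter to collect every data-src value, and, as an assumed fallback, pulls the URL out of an inline background-image style with a small regular expression.

```python
import re

from bs4 import BeautifulSoup

# Placeholder markup with made-up URLs, standing in for the real page.
html = """
<div data-src="https://example.com/a.jpg"></div>
<div style='background-image: url("https://example.com/b.jpg");'></div>
<div>no image here</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Any tag that carries a data-src attribute, regardless of its classes.
data_src_urls = [tag["data-src"] for tag in soup.find_all(attrs={"data-src": True})]

# Fallback: extract the URL from an inline background-image style.
style_pattern = re.compile(r"""url\(\s*['"]?([^'")]+)['"]?\s*\)""")
style_urls = []
for tag in soup.find_all(style=True):
    match = style_pattern.search(tag["style"])
    if match:
        style_urls.append(match.group(1))

print(data_src_urls)
print(style_urls)
```

The same attribute filter can also be written as a CSS selector, soup.select("[data-src]"), if you prefer to stay with select.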

Documentation for Beautiful Soup (bs4) can be found at:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/