In an HTML file, I have tags like `<source type="audio/mpeg" src="/us/media`. How can I find every tag that has "/us/media" in its src, and extract the src attribute from it using bs4?
Answer:
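You can do this with a CSS attribute selector in `soup.select()`. The selector `source[src*="us/media"]` matches every `<source>` tag whose `src` attribute contains the substring `us/media`: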
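```python
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}
res = requests.get('https://dictionary.cambridge.org/us/dictionary/english/vulnerable', headers=headers)
res.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
soup = BeautifulSoup(res.content, 'html.parser')

# [src*="us/media"] only matches <source> tags whose src contains "us/media",
# so every match is guaranteed to have a src attribute and no try/except is needed.
for source in soup.select('source[src*="us/media"]'):
    print(source['src'])
```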
Output:

```
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/uk_pron/u/ukv/ukvor/ukvorte027.mp3
/us/media/english/uk_pron_ogg/u/ukv/ukvor/ukvorte027.ogg
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/us_pron/e/eus/eus74/eus74904.mp3
/us/media/english/us_pron_ogg/e/eus/eus74/eus74904.ogg
```
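If you want to try the selector without hitting the live site, here is a minimal offline sketch. The sample HTML and the helper names (`media_srcs_css`, `media_srcs_regex`, `mp3_srcs`) are hypothetical, made up for illustration. It also shows an equivalent `find_all()` call that matches the `src` attribute against a compiled regex, and how to combine two attribute selectors to keep only the `type="audio/mpeg"` sources.

```python
from __future__ import annotations

import re

from bs4 import BeautifulSoup

# Hypothetical sample markup, modeled on the <source> tags in the question.
SAMPLE_HTML = """
<audio>
  <source type="audio/mpeg" src="/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3">
  <source type="audio/ogg" src="/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg">
</audio>
<audio>
  <source type="audio/mpeg" src="/other/path/clip.mp3">
</audio>
"""


def media_srcs_css(html: str) -> list[str]:
    """Collect src values with a CSS attribute-substring selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["src"] for tag in soup.select('source[src*="/us/media"]')]


def media_srcs_regex(html: str) -> list[str]:
    """Same result with find_all(): bs4 tests a compiled regex with .search()."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag["src"] for tag in soup.find_all("source", src=re.compile(r"/us/media"))]


def mp3_srcs(html: str) -> list[str]:
    """Only the audio/mpeg sources, by chaining two attribute selectors."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        tag["src"]
        for tag in soup.select('source[type="audio/mpeg"][src*="/us/media"]')
    ]


if __name__ == "__main__":
    print(media_srcs_css(SAMPLE_HTML))
    print(media_srcs_regex(SAMPLE_HTML))
    print(mp3_srcs(SAMPLE_HTML))
```

Both approaches return matches in document order. Since the `src` values are relative paths, `urllib.parse.urljoin(res.url, src)` turns each one into an absolute URL if you need to download the files.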