Find a tag with BeautifulSoup and extract an attribute

Time:11-05

In an HTML file, I have a tag that includes <source type="audio/mpeg" src="/us/media. How can I find that tag and extract its src attribute using bs4?

CodePudding user response:

The code below selects the matching <source> tags with a CSS attribute selector and prints the src attribute of each one:

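from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}

res = requests.get('https://dictionary.cambridge.org/us/dictionary/english/vulnerable', headers=headers)
soup = BeautifulSoup(res.content, 'html.parser')

# CSS attribute selector: every <source> tag whose src contains "us/media".
srcs = soup.select('source[src*="us/media"]')
for src in srcs:
    # The selector only matches tags that have a src attribute,
    # so this lookup cannot raise KeyError and needs no try/except.
    print(src['src'])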

Output:

/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/uk_pron/u/ukv/ukvor/ukvorte027.mp3
/us/media/english/uk_pron_ogg/u/ukv/ukvor/ukvorte027.ogg
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/us_pron/e/eus/eus74/eus74904.mp3
/us/media/english/us_pron_ogg/e/eus/eus74/eus74904.ogg
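The output above contains repeated entries, and the paths are relative to the site root. As a minimal sketch, the same selector can be tried offline on a small HTML string, with duplicates dropped and each path joined to a base URL. The sample markup, the BASE_URL constant, and the extract_audio_urls helper are assumptions made for illustration and are not part of the original answer; the real page markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup for illustration; the real page may differ.
SAMPLE_HTML = """
<audio>
  <source type="audio/mpeg" src="/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3">
  <source type="audio/ogg" src="/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg">
</audio>
<audio>
  <source type="audio/mpeg" src="/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3">
</audio>
<source type="audio/mpeg">
"""

# Assumed base URL, taken from the host used in the request above.
BASE_URL = "https://dictionary.cambridge.org"


def extract_audio_urls(html, base_url=BASE_URL):
    """Return absolute URLs of <source> tags whose src contains 'us/media'."""
    soup = BeautifulSoup(html, "html.parser")
    # Tags without a src attribute are not matched by the selector.
    paths = [tag["src"] for tag in soup.select('source[src*="us/media"]')]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    unique_paths = list(dict.fromkeys(paths))
    return [urljoin(base_url, path) for path in unique_paths]


audio_urls = extract_audio_urls(SAMPLE_HTML)
for url in audio_urls:
    print(url)
```

If only one tag is needed, soup.select_one() with the same selector returns the first match, or None when nothing matches.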