Get the resources under a specific div (wget, xmllint, etc.)

I have already managed to extract the section of the website I want, but without its resources (the audio).

wget -q -O - "https://dictionary.cambridge.org/dictionary/english/admirable" | xmllint --html --xpath '//div[@class = "pos-header dpos-h"]' - 2>/dev/null > admirable-wget

This is the section of the website:

How can I get the path to the audio so I can play it with mpv later in the script I'm building?

Answer:

Get the path to the media file with this XPath expression:

string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)

Full command

wget -q -O - "https://dictionary.cambridge.org/dictionary/english/admirable" | xmllint --recover --html --xpath 'string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)' - 2>/dev/null

Result

/media/english/uk_pron_ogg/u/uka/ukadj/ukadjus011.ogg
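The XPath result is a path relative to the site root, not a full URL. Below is a minimal sketch in POSIX sh of turning it into one. The function name to_absolute_url is hypothetical, and it assumes the media is served from the same https://dictionary.cambridge.org origin used in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget or xmllint): turn the path printed by
# the XPath query into a full URL that wget or mpv can fetch.
to_absolute_url() {
    base="https://dictionary.cambridge.org"
    case "$1" in
        http://*|https://*)
            # Already a full URL: print it unchanged.
            printf '%s\n' "$1" ;;
        //*)
            # Protocol-relative URL: assume HTTPS.
            printf 'https:%s\n' "$1" ;;
        /*)
            # Site-relative path such as /media/english/...: prepend the origin.
            printf '%s%s\n' "$base" "$1" ;;
        *)
            # Empty (the XPath matched nothing) or an unexpected form: fail.
            printf 'unsupported or empty path: %s\n' "$1" >&2
            return 1 ;;
    esac
}
```

To use it, store the output of the full command in a variable with command substitution (path=$(...)), then run url=$(to_absolute_url "$path") && wget -q "$url". Rejecting an empty path means a page without the expected amp-audio element fails loudly instead of requesting the bare domain.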

Then prepend the domain and download it:

wget -q "https://dictionary.cambridge.org/media/english/uk_pron_ogg/u/uka/ukadj/ukadjus011.ogg"
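Since the goal is to play the audio with mpv later in a script, downloading is optional: mpv can play an HTTP URL directly. Below is a rough sketch in POSIX sh that chains the steps above. It assumes wget, xmllint and mpv are installed, that entry pages follow the .../dictionary/english/WORD pattern from the question, and that the page still contains the amp-audio element the XPath targets. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch a dictionary entry, extract the .ogg path with the XPath from
# the answer, and stream it with mpv. Function names are hypothetical.
BASE_URL="https://dictionary.cambridge.org"
XPATH='string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)'

# Print the site-relative .ogg path found in HTML read from stdin.
# Parser warnings about non-standard tags are discarded.
extract_ogg_path() {
    xmllint --recover --html --xpath "$XPATH" - 2>/dev/null
}

# Print the full audio URL for a word, or fail if no audio path was found.
audio_url_for() {
    path=$(wget -q -O - "$BASE_URL/dictionary/english/$1" | extract_ogg_path)
    if [ -z "$path" ]; then
        printf 'no audio found for %s\n' "$1" >&2
        return 1
    fi
    printf '%s%s\n' "$BASE_URL" "$path"
}

# Stream the pronunciation with mpv; no local file is written.
play_word() {
    url=$(audio_url_for "$1") || return 1
    mpv --no-video "$url"
}

# Only do network work when a word is passed as the first argument.
if [ "$#" -gt 0 ]; then
    play_word "$1"
fi
```

For example, save it as pronounce.sh and run sh pronounce.sh admirable. Because nothing runs without an argument, the functions can also be sourced into the larger script and play_word called wherever the audio is needed.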

Note: check the site's terms of use before downloading or reusing its audio.