I already managed to extract the section of the website I wanted, but without its resources (the audio).
wget -q -O - "https://dictionary.cambridge.org/dictionary/english/admirable" | xmllint --html --xpath '//div[@class = "pos-header dpos-h"]' - 2>/dev/null > admirable-wget
This is the section of the website,
How can I get the audio as a path or something similar? I would like to play it with mpv later in the script I'm building.
CodePudding user response:
Get the path to the media file with this XPath expression:
string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)
Full command
wget -q -O - "https://dictionary.cambridge.org/dictionary/english/admirable" | xmllint --recover --html --xpath 'string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)' - 2>/dev/null
Result
/media/english/uk_pron_ogg/u/uka/ukadj/ukadjus011.ogg
Then download it
wget -q "https://dictionary.cambridge.org/media/english/uk_pron_ogg/u/uka/ukadj/ukadjus011.ogg"
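Since the goal is to play the audio with mpv later in a script, the steps above could be wrapped roughly like this. It is only a sketch: the function names (audio_url, fetch_audio_path, play_word) are made up for illustration, and it assumes the page keeps the same amp-audio markup and that mpv can stream https URLs on your system.

```shell
#!/bin/sh
# Site root, taken from the commands above.
base_url="https://dictionary.cambridge.org"

# audio_url PATH: prefix a site-relative path with the base URL.
# (hypothetical helper, not part of any tool)
audio_url() {
    printf '%s%s\n' "$base_url" "$1"
}

# fetch_audio_path WORD: print the relative .ogg path for WORD.
# Needs network access; prints nothing if the XPath matches no node.
fetch_audio_path() {
    wget -q -O - "$base_url/dictionary/english/$1" |
        xmllint --recover --html \
            --xpath 'string(//amp-audio[@id="ampaudio1"]/source[@type="audio/ogg"]/@src)' \
            - 2>/dev/null
}

# play_word WORD: look up the audio path and, if one was found,
# hand the full URL to mpv (audio only).
play_word() {
    path=$(fetch_audio_path "$1")
    [ -n "$path" ] || { echo "no audio found for $1" >&2; return 1; }
    mpv --no-video "$(audio_url "$path")"
}
```

Calling play_word admirable would then fetch the page, extract the path and play it. If you would rather keep a local copy, replace the mpv line with the wget download shown above and point mpv at the saved file.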
Note: check the site's terms of use before downloading its media.