I want to extract subtitles from this YouTube page (link).
I found timedtext when looking via 'View page source',
but not when I search from the JavaScript console. This does not find it:
document.querySelector("html").innerHTML.match("timedtext")
For this other YouTube page, however, both approaches work.
What causes the difference, and how can I fix it?
CodePudding user response:
As I commented, if you want to extract the subtitles this way, consider instead searching for the script tag that contains the ytInitialData
variable; that's the one that has the URL of the timedtext.
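I can't say for sure what causes the difference, but I assume the page's JavaScript changes the HTML once the page has loaded, so the live DOM no longer matches what 'View page source' shows.
After pasting the line you shared in your comment into the console: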
ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks
I got the timedtext URLs for the available languages. Keep in mind, though, that probably not all videos have auto-generated captions (example).
In that example I didn't get any captions, so I don't think that inspecting the page's source code works for all videos.
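Since some videos have no captions at all, it may help to read that path defensively. The following is only a rough sketch: listCaptionTracks is a hypothetical helper name, and the languageCode and baseUrl fields are what I would expect on each track entry, not something confirmed above.

```javascript
// Hypothetical helper: returns [{ language, url }] for each caption track,
// or an empty array when the video exposes no captions.
function listCaptionTracks(playerResponse) {
  // Optional chaining avoids a TypeError when any level is missing.
  const tracks =
    playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks ?? [];
  return tracks.map((track) => ({
    language: track.languageCode, // assumed field name
    url: track.baseUrl,           // assumed field name (the timedtext URL)
  }));
}

// In the console on a watch page; the guard covers pages where the global is absent.
if (typeof ytInitialPlayerResponse !== "undefined") {
  console.log(listCaptionTracks(ytInitialPlayerResponse));
}
```

An empty array here would correspond to the case in the example above, where no captions came back.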