How come the difference between 'View page source' and document.querySelector("html"…

Time:05-27

I want to extract subtitles from this YouTube page (link).
I found timedtext, when looking via 'View page source'.

But when I search from the JavaScript console, it isn't found:

document.querySelector("html").innerHTML.match("timedtext")

But for this other YouTube page, both approaches work.

Why is there a difference, and how can I fix it?

CodePudding user response:

As I commented, if you want to extract the subtitles this way, consider searching instead for the script tag that contains the ytInitialData variable: that's the one that holds the URL of the timedtext.
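To illustrate the idea of reading that variable out of the page source, here is a minimal sketch. The helper name extractAssignedJson is hypothetical, and it assumes the script contains an assignment of the form `name = {...}` whose value is valid JSON; the variable name is passed in, since this answer mentions both ytInitialData and ytInitialPlayerResponse.

```javascript
// Hypothetical helper: find `varName = {...}` in raw page HTML, locate the
// matching closing brace, and parse the object as JSON.
// Assumption: the assigned value is valid JSON (double-quoted strings only).
function extractAssignedJson(html, varName) {
  const match = new RegExp(varName + "\\s*=\\s*\\{").exec(html);
  if (!match) return null;
  const start = match.index + match[0].length - 1; // index of the opening "{"
  let depth = 0;
  let inString = false;
  for (let i = start; i < html.length; i++) {
    const ch = html[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(html.slice(start, i + 1));
        } catch {
          return null; // not valid JSON after all
        }
      }
    }
  }
  return null; // braces never balanced
}
```

Brace matching is used instead of a single regular expression because the object can contain nested braces and braces inside strings.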

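I can't say for certain what causes the difference, but I assume the page's JavaScript modifies the HTML once the page has loaded. 'View page source' shows the HTML as the server sent it, whereas document.querySelector reads the live DOM.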
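One way to check this assumption is to fetch the page's HTML again and compare it with the current DOM. This is only a sketch: compareSources is a hypothetical helper, and the console usage in the comments assumes it is run on the video page itself and that a fresh fetch returns roughly what 'View page source' shows.

```javascript
// Hypothetical helper: report whether a search term occurs in the raw
// server HTML and in the serialized live DOM.
function compareSources(rawHtml, liveHtml, needle) {
  return {
    inRawSource: rawHtml.includes(needle),
    inLiveDom: liveHtml.includes(needle),
  };
}

// Browser-console usage (assumption: run on the video page itself):
// const raw = await (await fetch(location.href)).text();
// const live = document.documentElement.outerHTML;
// console.log(compareSources(raw, live, "timedtext"));
```

If the term shows up in the raw source but not in the live DOM, the page has most likely been changed by scripts after loading.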

After pasting the line you shared in your comment into the console:

ytInitialPlayerResponse.captions.playerCaptionsTracklistRenderer.captionTracks

I got the timedtext entries for the available languages. Keep in mind, though, that probably not all videos have auto-generated captions (example).

In that example, I didn't get any captions, so I don't think that inspecting the page's source code works for all videos.
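Because the captions part may be missing for some videos, the lookup can be guarded so that it returns an empty list instead of throwing. A minimal sketch follows; getCaptionTracks is a hypothetical helper, and the property path is the one from the line pasted above.

```javascript
// Hypothetical helper: return the caption tracks, or an empty array when
// the video exposes no captions (any step of the path may be undefined).
function getCaptionTracks(playerResponse) {
  const tracks =
    playerResponse?.captions?.playerCaptionsTracklistRenderer?.captionTracks;
  return Array.isArray(tracks) ? tracks : [];
}

// Browser-console usage (assumption: the global exists on the page):
// console.log(getCaptionTracks(window.ytInitialPlayerResponse));
```

Optional chaining stops at the first missing property, so a video without captions yields an empty array rather than a TypeError.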