How can I GET youtube video title with httr request?

Time:10-10

I'm not a programmer, so I may be tripping over something really simple to solve.

I'm trying to get the titles of multiple YouTube videos for my research. I recently found the httr package, and I think its GET function retrieves this information well; the problem is that I don't know how to access it in the response.

I tried

x <- GET("https://www.youtube.com/watch?v=2lAe1cqCOXo")

content(x)

and it gave me this response

{html_document}
<html style="font-size: 10px;font-family: Roboto, Arial, sans-serif;" lang="pt-BR" system-icons="" typography="" typography-spacing="">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta http ...
[2] <body>\n<div id="watch7-content"  itemscope itemid="" itemtype="h ...

I know that every video title is in the [1] <head> part, as:

<title>TITLE OF THE VIDEO - YouTube</title>

or as

<meta name="title" content="TITLE OF THE VIDEO">

Is there a way to browse the content of the response to extract this information?

CodePudding user response:

Here is another approach that can be considered:

library(rvest)
library(stringr)

vector_URL_Title_To_Extract <- c("https://www.youtube.com/watch?v=DyX5RFSxWOY",
                                 "https://www.youtube.com/watch?v=2lAe1cqCOXo",
                                 "https://www.youtube.com/watch?v=ndTktsXlN7w")

nb_URL <- length(vector_URL_Title_To_Extract)
list_URL_Names <- list()

for(i in seq_len(nb_URL))
{
  print(i)
  # Download and parse the page, then flatten it to plain text
  webpage <- read_html(vector_URL_Title_To_Extract[i])
  webpage_Text <- html_text(webpage)
  # Keep the fragments of text that end in "- YouTube"
  title <- stringr::str_extract_all(webpage_Text, "(;[^;]*#YouTubeRewind - YouTube)|(;[^;]*- YouTube\\{)")[[1]]
  list_URL_Names[[i]] <- title
}

list_URL_Names

[[1]]
[1] ";CAQ celebrates victory in Quebec City, as opposition absorbs loss in Montreal - YouTube{"

[[2]]
[1] ";YouTube Rewind 2019: For the Record | #YouTubeRewind - YouTube"

[[3]]
[1] ";Crazy Capoeira Master Setting the UFC on Fire - Michel Pereira - YouTube{"
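
The extracted strings above still carry a leading semicolon and a trailing " - YouTube" (sometimes followed by "{"). A minimal sketch of how they could be cleaned up with base R, assuming the titles always have that shape; `clean_title` is a hypothetical helper name, not part of rvest or stringr:

```r
# Sample values copied from the output above.
raw_titles <- c(
  ";CAQ celebrates victory in Quebec City, as opposition absorbs loss in Montreal - YouTube{",
  ";YouTube Rewind 2019: For the Record | #YouTubeRewind - YouTube"
)

# Hypothetical helper: strips the debris left around each title.
clean_title <- function(x) {
  x <- sub("^;", "", x)            # drop the leading semicolon
  sub(" - YouTube[{]?$", "", x)    # drop the " - YouTube" suffix and the optional "{"
}

clean_title(raw_titles)
```

For the list built in the loop, `lapply(list_URL_Names, clean_title)` would apply the same cleanup to every element.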

CodePudding user response:

The best way to do this is probably YouTube's API, which is designed for this purpose.
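
A minimal sketch of the API route, assuming the YouTube Data API v3 `videos` endpoint with `part=snippet` and an API key that you supply yourself (neither appears in the question). The helper names `video_id_from_url`, `extract_api_title` and `get_title_from_api` are hypothetical, and the response layout (`items[[1]]$snippet$title`) is what that endpoint is expected to return:

```r
# Pull the video ID out of a watch URL (the value of the "v" query parameter).
video_id_from_url <- function(url) {
  sub(".*[?&]v=([^&]+).*", "\\1", url)
}

# Read the title from an already parsed API response (a nested list).
# Returns NA if the response contains no items.
extract_api_title <- function(parsed) {
  if (length(parsed$items) == 0) return(NA_character_)
  parsed$items[[1]]$snippet$title
}

# Hypothetical helper: asks the Data API for one video's snippet.
# Requires the httr package and a valid API key (assumption: you provide it).
get_title_from_api <- function(url, api_key) {
  resp <- httr::GET(
    "https://www.googleapis.com/youtube/v3/videos",
    query = list(part = "snippet", id = video_id_from_url(url), key = api_key)
  )
  httr::stop_for_status(resp)
  extract_api_title(httr::content(resp, as = "parsed", type = "application/json"))
}
```

With a key in hand, something like `get_title_from_api("https://www.youtube.com/watch?v=2lAe1cqCOXo", my_key)` would return the title as a single string, where `my_key` stands for your own key.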

But if you know the format of the YouTube page's HTML, you could represent the whole HTML document as a string and use a string function to find where the title tag is located, then isolate the title that way. Alternatively, you could use an HTML parser to parse the document and find the data.
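
A minimal sketch of the string approach in base R, assuming the page source contains a single-line `<title>…</title>` element as described in the question. `extract_title` is a hypothetical helper, and the `html` string below is a stand-in for the real page source:

```r
# Hypothetical helper: returns the text between <title> and </title>,
# or NA if the document has no such element.
extract_title <- function(html) {
  m <- regmatches(html, regexpr("(?<=<title>).*?(?=</title>)", html, perl = TRUE))
  if (length(m) == 0) return(NA_character_)
  # Drop the " - YouTube" suffix that the page appends to the title.
  sub(" - YouTube$", "", m)
}

# Stand-in for the page source; with httr it could come from
# content(x, as = "text", encoding = "UTF-8") instead.
html <- "<html><head><title>TITLE OF THE VIDEO - YouTube</title></head><body></body></html>"
extract_title(html)
```

For the parser route, the object returned by `content(x)` in the question is already a parsed HTML document, so `xml2::xml_text(xml2::xml_find_first(content(x), "//title"))` should reach the same element without any regular expressions.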

CodePudding user response:

Here is an approach that can be considered:

library(RSelenium)
# Pick a random port above 4444 to avoid clashing with a running Selenium server
port <- as.integer(4444L + rpois(lambda = 1000, 1))
rd <- rsDriver(chromever = "105.0.5195.52", browser = "chrome", port = port)
remDr <- rd$client

vector_URL_Title_To_Extract <- c("https://www.youtube.com/watch?v=DyX5RFSxWOY",
                                 "https://www.youtube.com/watch?v=2lAe1cqCOXo",
                                 "https://www.youtube.com/watch?v=ndTktsXlN7w")

nb_URL <- length(vector_URL_Title_To_Extract)
list_URL_Names <- list()

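for(i in seq_len(nb_URL))
{
  print(i)
  remDr$navigate(vector_URL_Title_To_Extract[i])
  # Give the page time to render before querying the DOM
  Sys.sleep(5)
  # Locate the element that holds the video title and read its text
  web_Obj <- remDr$findElement("css selector", '#title > h1 > yt-formatted-string')
  list_URL_Names[[i]] <- web_Obj$getElementText()
}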

list_URL_Names

[[1]]
[[1]][[1]]
[1] "CAQ celebrates victory in Quebec City, as opposition absorbs loss in Montreal"


[[2]]
[[2]][[1]]
[1] "YouTube Rewind 2019: For the Record | #YouTubeRewind"


[[3]]
[[3]][[1]]
[1] "Crazy Capoeira Master Setting the UFC on Fire - Michel Pereira"