Web scraping table R

Time:09-30

I'm trying to scrape the data in the rating column of this page, https://www.ratingraph.com/tv-shows/one-piece-ratings-17673/, but all I get back is "{xml_nodeset (0)}".

My attempt:

library("rvest")
`%>%` <- magrittr::`%>%`  # optional: rvest already re-exports %>%

page <- read_html("https://www.ratingraph.com/tv-shows/one-piece-ratings-17673/")
table <- page %>% 
  html_nodes("table") 
df <- table[2] %>% 
  html_table()

These are the data I want: [screenshot of the rating column]

Answer:

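The rating table appears to be filled in by a separate request after the page loads, so it is not in the HTML that read_html() downloads. If you open your browser's developer tools and look at the "Network" tab, you can see the request the page makes to build the table. The response is JSON, which httr parses into an R list. Many of the query parameters are probably unnecessary for your purpose, so you can shorten the URL. If you want more than 25 rows, increase length=25 or remove it.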
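If the full query string in the code that follows is hard to maintain, a shorter, parameterized URL may work. The sketch below keeps only the columns[i][data] field names, the sort order, start and length from the original request, and drops the per-column search/orderable fields and the `_` timestamp (which looks like a cache-buster). It is an untested assumption that the server still answers this reduced request; `build_episode_url` is a hypothetical helper, not part of any package. If the server rejects it, fall back to the full query.

```r
# Hypothetical helper (not from any package): builds a shortened version of the
# request URL. ASSUMPTION: the server accepts this reduced parameter set; this
# has not been verified against the live site.
build_episode_url <- function(show_id, start = 0, length = 25) {
  # Field names taken from columns[i][data] in the original query
  fields <- c("trend", "season", "episode", "name",
              "start", "total_votes", "average_rating")
  # columns[i][data]=<field>, with i counting from 0 as in the original query
  col_params <- sprintf("columns[%d][data]=%s", seq_along(fields) - 1L, fields)
  other_params <- c(
    "draw=1",
    "order[0][column]=1", "order[0][dir]=asc",  # sort by season ...
    "order[1][column]=2", "order[1][dir]=asc",  # ... then by episode
    sprintf("start=%d", start),                 # index of the first row
    sprintf("length=%d", length)                # number of rows to return
  )
  paste0(sprintf("https://www.ratingraph.com/show-episodes-list/%d/?", show_id),
         paste(c(other_params, col_params), collapse = "&"))
}
```

If the assumption holds, `httr::GET(build_episode_url(17673, length = 100))` would request the first 100 episodes, and the response can be parsed with `httr::content(..., as = "parsed")` as in the answer.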

library(httr)
library(magrittr)  # for %>%

# Call the JSON endpoint the page uses to fill the table (seen in the Network tab).
# start=0&length=25 selects the first 25 rows; raise or drop length for more.
page <- httr::GET(
  paste0("https://www.ratingraph.com/show-episodes-list/17673/?draw=1&columns[0][data]=trend&",
         "columns[0][name]=&columns[0][searchable]=false&columns[0][orderable]=true&columns[0][search][value]=&columns[0][search][regex]=false&columns[1][data]=season&",
         "columns[1][name]=&columns[1][searchable]=false&columns[1][orderable]=true&columns[1][search][value]=&columns[1][search][regex]=false&columns[2][data]=episode&",
         "columns[2][name]=&columns[2][searchable]=false&columns[2][orderable]=true&columns[2][search][value]=&columns[2][search][regex]=false&columns[3][data]=name&",
         "columns[3][name]=&columns[3][searchable]=false&columns[3][orderable]=true&columns[3][search][value]=&columns[3][search][regex]=false&columns[4][data]=start&",
         "columns[4][name]=&columns[4][searchable]=false&columns[4][orderable]=true&columns[4][search][value]=&columns[4][search][regex]=false&columns[5][data]=total_votes&",
         "columns[5][name]=&columns[5][searchable]=false&columns[5][orderable]=true&columns[5][search][value]=&columns[5][search][regex]=false&columns[6][data]=average_rating&",
         "columns[6][name]=&columns[6][searchable]=false&columns[6][orderable]=true&columns[6][search][value]=&columns[6][search][regex]=false&order[0][column]=1&",
         "order[0][dir]=asc&order[1][column]=2&order[1][dir]=asc&start=0&length=25&search[value]=&search[regex]=false&_=",
         # current time in whole milliseconds (appears to be a cache-buster)
         format(round(as.numeric(Sys.time()) * 1000), scientific = FALSE)))
# Parse the JSON body into a nested R list
table <- page %>% httr::content(as = 'parsed')
# Each element of table$data is one episode record; pull out its average_rating
avg_ratings <- sapply(table$data, `[[`, 'average_rating') %>% as.numeric()
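sapply() only returns a plain vector when every episode record has a non-NULL average_rating. If one is missing, it returns a list instead and the as.numeric() call can fail. A minimal sketch of a stricter variant using vapply(), which also collects season and episode (field names taken from the request's columns[i][data] values) into a data frame. The `rows` list below is a made-up stand-in for `table$data`, not real ratings, and `get_field` is a hypothetical helper name.

```r
# Made-up stand-in for `table$data` (illustration only, not real ratings):
# each element is one episode record, as returned by httr::content(as = "parsed").
rows <- list(
  list(season = 1, episode = 1, average_rating = "7.9"),
  list(season = 1, episode = 2, average_rating = "8.1"),
  list(season = 1, episode = 3, average_rating = NULL)  # missing value
)

# Hypothetical helper: extract one field from every record as a character
# vector, turning NULL or absent fields into NA so the result always has
# exactly one entry per record.
get_field <- function(rows, field) {
  vapply(rows, function(r) {
    v <- r[[field]]
    if (is.null(v)) NA_character_ else as.character(v)
  }, character(1))
}

# One row per episode; NA marks a missing rating
ratings_df <- data.frame(
  season  = as.integer(get_field(rows, "season")),
  episode = as.integer(get_field(rows, "episode")),
  rating  = as.numeric(get_field(rows, "average_rating"))
)
print(ratings_df)
```

On the real response you would pass `table$data` instead of `rows`, e.g. `get_field(table$data, "average_rating")`.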