This is probably quite easy if you know HTML (I do not). This website shows how to extract data from billboard.com for a specific week. An example:
hot100page <- 'https://www.billboard.com/charts/hot-100/2022-03-19/'
hot100 <- xml2::read_html(hot100page)
# Extract the rank of each song
rank <- hot100 %>%
rvest::html_nodes('body') %>%
xml2::xml_find_all("//span[contains(@class, 'chart-element__rank__number')]") %>%
rvest::html_text()
When I run this, rank is NULL and the hot100 list does not contain any information. Can you please assist?
CodePudding user response:
We can get the rank by selecting on class names, using only rvest:
library(rvest)
hot100page <- 'https://www.billboard.com/charts/hot-100/2022-03-19/'
hot100 <- rvest::read_html(hot100page)
hot100 %>%
  html_nodes('.o-chart-results-list-row-container') %>%
  html_nodes('.a-font-primary-bold-l') %>%
  html_text2()
[1] "1" "1" "1" "60" "1" "1" "60" "2" "3" "4" "5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
[26] "20" "21" "22" "23" "24" "25" "26" "27" "28" "29" "30" "31" "32" "33" "34" "35" "36" "37" "38" "39" "40" "41" "42" "43" "44"
[51] "45" "46" "47" "48" "49" "50" "51" "52" "53" "54" "55" "56" "57" "58" "59" "60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
[76] "70" "71" "72" "73" "74" "75" "76" "77" "78" "79" "80" "81" "82" "83" "84" "85" "86" "87" "88" "89" "90" "91" "92" "93" "94"
[101] "95" "96" "97" "98" "99" "100"
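Note that the output above has 106 values rather than 100: the first row matches the class several times before the ranks 2 to 100 follow. One possible way to keep a single value per row is html_element() (singular), which returns only the first match inside each row. The sketch below runs on made-up inline HTML, not the live page; whether the first match in each real row is the rank is an assumption you would need to check.

```r
library(rvest)

# Hypothetical, simplified markup standing in for two chart rows.
# The first row carries extra cells with the same class, as in the output above.
page <- read_html('<div>
  <div class="o-chart-results-list-row-container">
    <span class="a-font-primary-bold-l">1</span>
    <span class="a-font-primary-bold-l">1</span>
    <span class="a-font-primary-bold-l">60</span>
  </div>
  <div class="o-chart-results-list-row-container">
    <span class="a-font-primary-bold-l">2</span>
  </div>
</div>')

rows <- html_elements(page, ".o-chart-results-list-row-container")

# html_elements() returns every match in every row
all_matches <- html_text2(html_elements(rows, ".a-font-primary-bold-l"))

# html_element() returns only the first match per row
first_per_row <- html_text2(html_element(rows, ".a-font-primary-bold-l"))

print(all_matches)
print(first_per_row)
```

On this toy input, all_matches has four values while first_per_row has one value per row.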
CodePudding user response:
I would build a tibble from the rows and map title to rank as follows. Your XPath doesn't appear to match anything on the webpage, and the specified class does not appear to exist in the returned HTML. You can cleanly extract the rank from the data-detail-target attribute of each listing row; the title can come from the elements with class c-title among the multi-valued class attribute:
library(dplyr)
library(rvest)
hot100page <- "https://www.billboard.com/charts/hot-100/2022-03-19/"
hot100 <- read_html(hot100page)
rows <- hot100 %>% html_elements(".chart-results-list .o-chart-results-list-row")
result <- tibble(
  rank = rows %>% html_attr("data-detail-target") %>% as.integer(),
  title = rows %>% html_elements(".c-title") %>% html_text(trim = TRUE)
)
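To see why the original XPath returned nothing, it can help to count the matches before extracting text. The following is a minimal sketch on made-up inline HTML that only borrows the class and attribute names mentioned above; the real page markup is more complex and may differ.

```r
library(rvest)

# Hypothetical, simplified markup; not the actual billboard.com HTML
page <- read_html('<div class="chart-results-list">
  <ul class="o-chart-results-list-row" data-detail-target="1">
    <li><h3 class="c-title a-font-primary-bold-l"> Song A </h3></li>
  </ul>
  <ul class="o-chart-results-list-row" data-detail-target="2">
    <li><h3 class="c-title"> Song B </h3></li>
  </ul>
</div>')

# The class from the question is absent here, so the XPath matches zero nodes
old <- html_elements(
  page,
  xpath = "//span[contains(@class, 'chart-element__rank__number')]"
)
n_old <- length(old)

# Selecting rows first, then reading an attribute and a child element
rows  <- html_elements(page, ".chart-results-list .o-chart-results-list-row")
rank  <- as.integer(html_attr(rows, "data-detail-target"))
title <- html_text(html_elements(rows, ".c-title"), trim = TRUE)

print(n_old)
print(data.frame(rank = rank, title = title))
```

A zero-length node set usually means the selector is wrong for the HTML that was actually returned, so checking length() first narrows the problem down quickly.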