```r
# Load Packages
pacman::p_load(tidyverse, rvest)
# Set URL
url <- "https://www.worldometers.info/coronavirus/"
website <- read_html(url)
# Scrape Cases Data
cases_html <- html_nodes(website, "td.sorting_1")
cases <- html_text(cases_html)
cases_html
cases
```
I am trying to scrape web data with rvest, but when I inspect my two variables ("cases_html" and "cases"), both come back empty. The output for each, respectively, is:
```
{xml_nodeset (0)}
character(0)
```
I am not sure why no data is scraped from this website. I have also tried the RSelenium package, as recommended in another post here, but that code failed with an unrelated error. I figure the solution should be available within rvest, however, and I would like to understand what exactly is wrong here.
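A likely explanation, assuming the site's table is driven by the DataTables JavaScript plugin: DataTables adds classes such as `sorting_1` to the cells of the currently sorted column in the browser, after the page has loaded. Those classes show up in the browser's developer tools but not in the raw HTML that `read_html()` downloads, so `td.sorting_1` matches nothing. The sketch below uses a small, hypothetical stand-in for the server's HTML (the `static_html` string is illustrative, not the real page) to show the class selector returning an empty node set while a selector based on the table id and a column position still works.

```r
library(rvest)

# Hypothetical, simplified stand-in for the HTML the server sends.
# Assumption: no `sorting_1` class is present, because it would only be
# added later by JavaScript running in a browser.
static_html <- '
<table id="main_table_countries_today">
  <thead><tr><th>#</th><th>Country,Other</th><th>TotalCases</th></tr></thead>
  <tbody>
    <tr><td>1</td><td>USA</td><td>83,055,836</td></tr>
    <tr><td>2</td><td>India</td><td>43,079,157</td></tr>
  </tbody>
</table>'

page <- read_html(static_html)

# The class-based selector from the question matches nothing in static HTML
no_match <- html_elements(page, "td.sorting_1")
length(no_match)

# Selecting by the table's id plus a column position does not rely on
# JavaScript-added classes; here the third column holds TotalCases
total_cases <- html_text2(
  html_elements(page, "#main_table_countries_today tbody td:nth-child(3)")
)
total_cases
```

On the live page the same idea would use `"#main_table_countries_today tbody td:nth-child(3)"`, assuming TotalCases remains the third column; the site can reorder columns at any time, so selecting the whole table is usually more robust.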
Answer:
It's not clear what you are trying to scrape from the page, but you can get the main data table like this:
```r
library(tidyverse)  # provides %>% and purrr::pluck()
library(rvest)

# Select the main data table by its id attribute and parse it into a tibble
read_html("https://www.worldometers.info/coronavirus/") %>%
  html_nodes("#main_table_countries_today") %>%
  html_table() %>%
  pluck(1)
#> # A tibble: 244 x 22
#> `#` `Country,Other` TotalCases NewCases TotalDeaths NewDeaths
#> <int> <chr> <chr> <chr> <chr> <chr>
#> 1 NA "North America" 98,313,200 " 23,167" 1,459,752 " 147"
#> 2 NA "Asia" 147,921,193 " 130,458" 1,423,876 " 395"
#> 3 NA "South America" 56,801,380 " 17,492" 1,294,318 " 31"
#> 4 NA "Europe" 191,122,646 " 187,856" 1,817,850 " 587"
#> 5 NA "Oceania" 7,156,060 " 46,381" 10,626 " 59"
#> 6 NA "Africa" 11,902,057 " 6,666" 253,795 " 4"
#> 7 NA "" 721 "" 15 ""
#> 8 NA "World" 513,217,257 " 412,020" 6,260,232 " 1,223"
#> 9 1 "USA" 83,055,836 " 18,777" 1,020,749 " 89"
#> 10 2 "India" 43,079,157 " 3,293" 523,803 ""
#> # ... with 234 more rows, and 16 more variables: TotalRecovered <chr>,
#> # NewRecovered <chr>, ActiveCases <chr>, `Serious,Critical` <chr>,
#> # `Tot Cases/1M pop` <chr>, `Deaths/1M pop` <chr>, TotalTests <chr>,
#> # `Tests/1M pop` <chr>, Population <chr>, Continent <chr>,
#> # `1 Caseevery X ppl` <chr>, `1 Deathevery X ppl` <chr>,
#> # `1 Testevery X ppl` <int>, `New Cases/1M pop` <chr>,
#> # `New Deaths/1M pop` <dbl>, `Active Cases/1M pop` <chr>
```
Created on 2022-04-30 by the reprex package (v2.0.1)
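Most columns in the resulting tibble are character vectors with thousands separators and leading spaces (for example `"98,313,200"` and `" 23,167"`), and the first rows are continent and world aggregates with an `NA` rank. A minimal cleaning sketch, assuming the column names shown in the output above stay the same; `to_number()` and `clean_covid_table()` are hypothetical helper names, not part of rvest or the tidyverse:

```r
library(dplyr)
library(readr)

# Hypothetical helper: strip surrounding spaces, then parse numbers that use
# "," as a thousands separator; empty strings become NA
to_number <- function(x) {
  parse_number(trimws(as.character(x)), na = c("", "NA"))
}

# Hypothetical helper: keep country rows and convert the count columns
clean_covid_table <- function(tbl) {
  tbl %>%
    # In the output above, aggregate rows (continents, "World") have an NA rank
    filter(!is.na(`#`)) %>%
    mutate(across(c(TotalCases, NewCases, TotalDeaths, NewDeaths), to_number))
}
```

Applied to the scraped table, `clean_covid_table(covid_tbl)` (where `covid_tbl` is the tibble returned by the pipeline above) would give numeric columns suitable for sorting or plotting. Other count columns can be added to the `across()` selection in the same way.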