R, scraping html table with rvest only produces NA?

Time:12-14

I'm trying to scrape a single table, but the result contains only NA values. The table itself seems to be found, but its values are not.

library(rvest)
library(xml2)
library(dplyr)


file<-read_html("https://coinmarketcap.com/historical/20141026/")
tables<-html_nodes(file, "table")

html_table(tables[1], fill = TRUE)
html_table(tables[2], fill = TRUE)
html_table(tables[3], fill = TRUE)

Here is another approach, but it also gives NA:

content = read_html("https://coinmarketcap.com/historical/20141026/")
content %>% html_table(fill=TRUE)

Edit: the best I could do is get the content as a flat vector:

content = read_html("https://coinmarketcap.com/historical/20141026/")
data = content %>% html_nodes("table") %>% html_nodes("tr") %>% html_nodes("td") %>% html_nodes("div") %>% html_text() 

Answer:

Your object tables is a nodeset, i.e. a list of several table nodes.

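Using [[ instead of [ extracts a single node, so html_table() returns that one table rather than a list containing it. Also note that the first two tables on this page contain only headers and no rows; the data is in the third table: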

library(rvest)

html <- read_html("https://coinmarketcap.com/historical/20141026/")
tables <- html_nodes(html, "table")

html_table(tables[[3]], fill = TRUE)
#> # A tibble: 200 × 1,001
#>     Rank Name   Symbol `Market Cap` Price `Circulating Su… `Volume (24h)` `% 1h`
#>    <int> <chr>  <chr>  <chr>        <chr> <chr>            <chr>          <chr> 
#>  1     1 BTCBi… BTC    $4,763,038,… $354… 13,428,200 BTC   $11,272,499.00 0.17% 
#>  2     2 XRPXRP XRP    $131,839,53… $0.0… 28,989,252,282 … $440,678.69    -0.08%
#>  3     3 LTCLi… LTC    $123,956,85… $3.73 33,257,506 LTC   $1,986,589.00  0.44% 
#>  4     4 BTSBi… BTS    $45,404,814… $0.0… 1,999,883,512 B… $148,776.44    -0.28%
#>  5     5 DOGED… DOGE   $23,574,314… $0.0… 94,718,507,527 … $348,436.34    -0.08%
#>  6     6 NXTNxt NXT    $21,746,605… $0.0… 999,997,096 NXT… $22,667.90     0.94% 
#>  7     7 PPCPe… PPC    $18,989,637… $0.8… 21,831,976 PPC   $42,349.30     0.35% 
#>  8     8 MAIDM… MAID   $9,612,290.… $0.0… 452,552,412 MAI… $8,843.16      -0.03%
#>  9     9 XCPCo… XCP    $9,327,497.… $3.52 2,647,329 XCP *  $3,048.65      -0.71%
#> 10    10 NMCNa… NMC    $9,283,079.… $0.9… 10,137,800 NMC   $21,431.33     0.03% 
#> # … with 190 more rows, and 993 more variables: % 24h <chr>, % 7d <chr>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>, …
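To see the [ versus [[ difference without hitting the live site, here is a minimal sketch on a small inline HTML string. The HTML is made up to mimic the page (a header-only table followed by one with rows), and it assumes rvest 1.0 or later. fill is left out because, as far as I know, html_table() always fills ragged rows from rvest 1.0 on.

```r
library(rvest)

# Made-up stand-in for the real page: a header-only table, then a table with rows
page <- read_html("
  <table><tr><th>Rank</th><th>Name</th></tr></table>
  <table>
    <tr><th>Rank</th><th>Name</th></tr>
    <tr><td>1</td><td>Bitcoin</td></tr>
    <tr><td>2</td><td>XRP</td></tr>
  </table>")

tables <- html_nodes(page, "table")

# `[` keeps a nodeset, so html_table() returns a list with one table inside
as_list <- html_table(tables[2])

# `[[` pulls out a single node, so html_table() returns the table itself
one_table <- html_table(tables[[2]])

str(as_list, max.level = 1)
print(one_table)
```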

Or you could use, e.g., lapply() to get all tables:

lapply(tables, html_table, fill = TRUE)
#> [[1]]
#> # A tibble: 0 × 11
#> # … with 11 variables: Rank <lgl>, Name <lgl>, Symbol <lgl>, Market Cap <lgl>,
#> #   Price <lgl>, Circulating Supply <lgl>, Volume (24h) <lgl>, % 1h <lgl>,
#> #   % 24h <lgl>, % 7d <lgl>,  <lgl>
#> 
#> [[2]]
#> # A tibble: 0 × 2
#> # … with 2 variables: Rank <lgl>, Name <lgl>
#> 
#> [[3]]
#> # A tibble: 200 × 1,001
#>     Rank Name   Symbol `Market Cap` Price `Circulating Su… `Volume (24h)` `% 1h`
#>    <int> <chr>  <chr>  <chr>        <chr> <chr>            <chr>          <chr> 
#>  1     1 BTCBi… BTC    $4,763,038,… $354… 13,428,200 BTC   $11,272,499.00 0.17% 
#>  2     2 XRPXRP XRP    $131,839,53… $0.0… 28,989,252,282 … $440,678.69    -0.08%
#>  3     3 LTCLi… LTC    $123,956,85… $3.73 33,257,506 LTC   $1,986,589.00  0.44% 
#>  4     4 BTSBi… BTS    $45,404,814… $0.0… 1,999,883,512 B… $148,776.44    -0.28%
#>  5     5 DOGED… DOGE   $23,574,314… $0.0… 94,718,507,527 … $348,436.34    -0.08%
#>  6     6 NXTNxt NXT    $21,746,605… $0.0… 999,997,096 NXT… $22,667.90     0.94% 
#>  7     7 PPCPe… PPC    $18,989,637… $0.8… 21,831,976 PPC   $42,349.30     0.35% 
#>  8     8 MAIDM… MAID   $9,612,290.… $0.0… 452,552,412 MAI… $8,843.16      -0.03%
#>  9     9 XCPCo… XCP    $9,327,497.… $3.52 2,647,329 XCP *  $3,048.65      -0.71%
#> 10    10 NMCNa… NMC    $9,283,079.… $0.9… 10,137,800 NMC   $21,431.33     0.03% 
#> # … with 190 more rows, and 993 more variables: % 24h <chr>, % 7d <chr>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,
#> #    <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>,  <lgl>, …
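The remaining NAs appear to come from the many unnamed <lgl> columns in the third table: they hold empty cells, which html_table() converts to logical NA. A minimal sketch for cleaning this up, assuming you only want tables that have rows and only columns with at least one non-missing value. The inline HTML and the drop_na_cols() helper are hypothetical, made up for illustration.

```r
library(rvest)

# Made-up stand-in HTML: the "Extra" column has only empty cells,
# mimicking the blank padding columns on the real page
page <- read_html("
  <table><tr><th>Rank</th><th>Name</th></tr></table>
  <table>
    <tr><th>Rank</th><th>Name</th><th>Extra</th></tr>
    <tr><td>1</td><td>Bitcoin</td><td></td></tr>
    <tr><td>2</td><td>XRP</td><td></td></tr>
  </table>")

all_tables <- lapply(html_nodes(page, "table"), html_table)

# Keep only tables that actually contain data rows
non_empty <- Filter(function(tbl) nrow(tbl) > 0, all_tables)

# Hypothetical helper: drop columns in which every value is NA
drop_na_cols <- function(tbl) tbl[, colSums(!is.na(tbl)) > 0, drop = FALSE]

clean <- lapply(non_empty, drop_na_cols)
print(clean[[1]])
```

Subsetting by a logical index avoids selecting columns by name, which matters here because the padding columns on the real page have empty, duplicated names.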