I'm trying to scrape a single table, but the results contain only NA values. The table itself seems to be found, but its values are not.
library(rvest)
library(xml2)
library(dplyr)
file<-read_html("https://coinmarketcap.com/historical/20141026/")
tables<-html_nodes(file, "table")
html_table(tables[1], fill = TRUE)
html_table(tables[2], fill = TRUE)
html_table(tables[3], fill = TRUE)
Here is another approach, but it also gives only NA:
content = read_html("https://coinmarketcap.com/historical/20141026/")
content %>% html_table(fill=TRUE)
Edit: the best I could do is get the content as a flat vector:
content = read_html("https://coinmarketcap.com/historical/20141026/")
data = content %>% html_nodes("table") %>% html_nodes("tr") %>% html_nodes("td") %>% html_nodes("div") %>% html_text()
CodePudding user response:
Your object tables is a nodeset, i.e. a list of multiple table nodes. Using [[ instead of [ extracts a single node, so you can get the results from one table:
library(rvest)
html <- read_html("https://coinmarketcap.com/historical/20141026/")
tables <- html_nodes(html, "table")
html_table(tables[[3]], fill = TRUE)
#> # A tibble: 200 × 1,001
#> Rank Name Symbol `Market Cap` Price `Circulating Su… `Volume (24h)` `% 1h`
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 BTCBi… BTC $4,763,038,… $354… 13,428,200 BTC $11,272,499.00 0.17%
#> 2 2 XRPXRP XRP $131,839,53… $0.0… 28,989,252,282 … $440,678.69 -0.08%
#> 3 3 LTCLi… LTC $123,956,85… $3.73 33,257,506 LTC $1,986,589.00 0.44%
#> 4 4 BTSBi… BTS $45,404,814… $0.0… 1,999,883,512 B… $148,776.44 -0.28%
#> 5 5 DOGED… DOGE $23,574,314… $0.0… 94,718,507,527 … $348,436.34 -0.08%
#> 6 6 NXTNxt NXT $21,746,605… $0.0… 999,997,096 NXT… $22,667.90 0.94%
#> 7 7 PPCPe… PPC $18,989,637… $0.8… 21,831,976 PPC $42,349.30 0.35%
#> 8 8 MAIDM… MAID $9,612,290.… $0.0… 452,552,412 MAI… $8,843.16 -0.03%
#> 9 9 XCPCo… XCP $9,327,497.… $3.52 2,647,329 XCP * $3,048.65 -0.71%
#> 10 10 NMCNa… NMC $9,283,079.… $0.9… 10,137,800 NMC $21,431.33 0.03%
#> # … with 190 more rows, and 993 more variables: % 24h <chr>, % 7d <chr>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, …
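To see the difference between [ and [[ without hitting the site, here is a minimal sketch on a small inline page. The two-table HTML string is made up for illustration and is not taken from coinmarketcap; it assumes read_html() is given a literal HTML string rather than a URL.

```r
library(rvest)

# Hypothetical two-table page, used only to illustrate the subsetting behaviour
page <- read_html("<html><body>
  <table><tr><th>a</th></tr><tr><td>1</td></tr></table>
  <table><tr><th>b</th><th>c</th></tr><tr><td>2</td><td>3</td></tr></table>
</body></html>")
tables <- html_nodes(page, "table")

one_as_set  <- tables[2]    # single bracket: still a nodeset, of length 1
one_as_node <- tables[[2]]  # double bracket: the table node itself

class(one_as_set)
class(one_as_node)

res_set  <- html_table(one_as_set)   # a list holding one data frame
res_node <- html_table(one_as_node)  # the data frame directly
```

So with [ you get a list back and still have to index into it (res_set[[1]]), while [[ hands html_table() a single node and returns the table itself.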
Alternatively, you could use e.g. lapply to get all tables:
lapply(tables, html_table, fill = TRUE)
#> [[1]]
#> # A tibble: 0 × 11
#> # … with 11 variables: Rank <lgl>, Name <lgl>, Symbol <lgl>, Market Cap <lgl>,
#> # Price <lgl>, Circulating Supply <lgl>, Volume (24h) <lgl>, % 1h <lgl>,
#> # % 24h <lgl>, % 7d <lgl>, <lgl>
#>
#> [[2]]
#> # A tibble: 0 × 2
#> # … with 2 variables: Rank <lgl>, Name <lgl>
#>
#> [[3]]
#> # A tibble: 200 × 1,001
#> Rank Name Symbol `Market Cap` Price `Circulating Su… `Volume (24h)` `% 1h`
#> <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 BTCBi… BTC $4,763,038,… $354… 13,428,200 BTC $11,272,499.00 0.17%
#> 2 2 XRPXRP XRP $131,839,53… $0.0… 28,989,252,282 … $440,678.69 -0.08%
#> 3 3 LTCLi… LTC $123,956,85… $3.73 33,257,506 LTC $1,986,589.00 0.44%
#> 4 4 BTSBi… BTS $45,404,814… $0.0… 1,999,883,512 B… $148,776.44 -0.28%
#> 5 5 DOGED… DOGE $23,574,314… $0.0… 94,718,507,527 … $348,436.34 -0.08%
#> 6 6 NXTNxt NXT $21,746,605… $0.0… 999,997,096 NXT… $22,667.90 0.94%
#> 7 7 PPCPe… PPC $18,989,637… $0.8… 21,831,976 PPC $42,349.30 0.35%
#> 8 8 MAIDM… MAID $9,612,290.… $0.0… 452,552,412 MAI… $8,843.16 -0.03%
#> 9 9 XCPCo… XCP $9,327,497.… $3.52 2,647,329 XCP * $3,048.65 -0.71%
#> 10 10 NMCNa… NMC $9,283,079.… $0.9… 10,137,800 NMC $21,431.33 0.03%
#> # … with 190 more rows, and 993 more variables: % 24h <chr>, % 7d <chr>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>,
#> # <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, <lgl>, …
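In the output above, the first two tables have zero rows and the third carries a long tail of <lgl> columns that appear to be empty padding. A possible cleanup, sketched in base R on made-up stand-in data (the all_tables list below is hypothetical and only mimics the shape of the lapply result), is to keep the tables that have rows and then drop columns that are entirely NA:

```r
# Made-up stand-in for lapply(tables, html_table, fill = TRUE): two empty
# header-only tables and one populated table with empty padding columns
all_tables <- list(
  data.frame(Rank = logical(0), Name = logical(0)),
  data.frame(Rank = logical(0)),
  data.frame(Rank = 1:3, Symbol = c("BTC", "XRP", "LTC"), pad1 = NA, pad2 = NA)
)

# Keep only tables that actually contain rows
non_empty <- Filter(function(tbl) nrow(tbl) > 0, all_tables)
scraped <- non_empty[[1]]

# TRUE for every column holding at least one non-missing value
has_data <- colSums(!is.na(scraped)) > 0

# Keep only those columns; drop = FALSE keeps a data frame even if one column remains
cleaned <- scraped[, has_data, drop = FALSE]

names(cleaned)
```

Indexing by a logical vector rather than by name is a deliberate choice here, since the padding columns in the scraped table are shown without names.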