I have a bunch of html files that were generated from a computationally intensive parametrized RMarkdown template that took hours per run. When browsing the rendered html files I realized that I need the data presented as DT tables for subsequent analyses. One thing I could do is to add export statements in the template to save the data being displayed as DT tables and rerun the templates. But doing so will need a lot of compute resources and time!
Alternatively, what I am thinking is if I could extract DT tables as tibbles from the html files that would save me unnecessary runs. I tried to do this, but haven't been successful yet. Here's how you can recreate what I have done so far:
- Create a
demo.Rmd
file in the root folder with the following contents:
---
title: "Demo for Stack Overflow"
output: html_document
---
# Setup
```{r message = FALSE}
library(DT)
```
# Display data
## mtcars
```{r}
datatable(mtcars)
```
## iris
```{r}
datatable(iris)
```
- Knit
demo.Rmd
file using theKnit
button in RStudio and generate the html file - Add the following code to read DT tables in the
demo.Rmd
file:
dt_tables <- rvest::read_html("demo.html") |>
rvest::html_node("#display-data") |>
rvest::html_nodes("script")
- Print
dt_tables
> dt_tables
{xml_nodeset (2)}
[1] <script type="application/json" data-for="htmlwidget-7c1371d3ec409f692ab5">{"x":
{"filter":"none","vertical ...
[2] <script type="application/json" data-for="htmlwidget-9c0e7f3134e5a814c9d0">{"x":
{"filter":"none","vertical ...
I couldn't figure out the next steps to read the table contents as list of tibbles! So, the output would look something like:
list(
mtcars = "mtcars DT table data as tibble",
iris = "iris DT table data as tibble"
)
CodePudding user response:
I figured out a way to extract the table using Selenium and rvest
library(RSelenium)
library(rvest)
# print chrome versions
print(binman::list_versions("chromedriver"))
# Use your version
driver = RSelenium::rsDriver(chromever="106.0.5249.21")
driver$client$navigate("demo.html")
dtable<-driver$client$findElement(using="id", "DataTables_Table_0")
outerHTML <- dtable$getElementAttribute("outerHTML")[[1]]
doc<- rvest::read_html(paste0("<html><head></head><body>",outerHTML,"</body></html>"))
table <- rvest::html_element(doc, "table")
driver$server$stop()
html_table(table)
# A tibble: 10 × 12
#
# `` mpg cyl disp hp drat wt qsec vs am gear carb
# <chr> <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
# 1 Mazda RX4 21 6 160 110 3.9 2.62 16.5 0 1 4 4
# 2 Mazda RX4 Wag 21 6 160 110 3.9 2.88 17.0 0 1 4 4
# 3 Datsun 710 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
# 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1
# 5 Hornet Sportabout 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
# 6 Valiant 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1
# 7 Duster 360 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
# 8 Merc 240D 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
# 9 Merc 230 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
# 10 Merc 280 19.2 6 168. 123 3.92 3.44 18.3 1 0 4 4