Home > database >  How to read tables created with the DT R package in an html file as tibbles?
How to read tables created with the DT R package in an html file as tibbles?

Time:10-15

I have a bunch of html files that were generated from a computationally intensive parametrized RMarkdown template that took hours per run. When browsing the rendered html files I realized that I need the data presented as DT tables for subsequent analyses. One thing I could do is to add export statements in the template to save the data being displayed as DT tables and rerun the templates. But doing so will need a lot of compute resources and time!

Alternatively, what I am thinking is if I could extract DT tables as tibbles from the html files that would save me unnecessary runs. I tried to do this, but haven't been successful yet. Here's how you can recreate what I have done so far:

  1. Create a demo.Rmd file in the root folder with the following contents:
---
title: "Demo for Stack Overflow"
output: html_document
---
 
# Setup
 
```{r message = FALSE}
library(DT)
```
 
# Display data
 
## mtcars
 
```{r}
datatable(mtcars)
```
 
## iris
 
```{r}
datatable(iris)
```
  1. Knit demo.Rmd file using the Knit button in RStudio and generate the html file
  2. Add the following code to read DT tables in the demo.Rmd file:
dt_tables <- rvest::read_html("demo.html") |> 
  rvest::html_node("#display-data") |> 
  rvest::html_nodes("script")
  1. Print dt_tables
> dt_tables
{xml_nodeset (2)}
[1] <script type="application/json" data-for="htmlwidget-7c1371d3ec409f692ab5">{"x": 
{"filter":"none","vertical ...
[2] <script type="application/json" data-for="htmlwidget-9c0e7f3134e5a814c9d0">{"x": 
{"filter":"none","vertical ...

I couldn't figure out the next steps to read the table contents as list of tibbles! So, the output would look something like:

list(
  mtcars = "mtcars DT table data as tibble",
  iris = "iris DT table data as tibble"
)

CodePudding user response:

I figured out a way to extract the table using Selenium and rvest

library(RSelenium)
library(rvest)


# print chrome versions
print(binman::list_versions("chromedriver"))

# Use your version
driver = RSelenium::rsDriver(chromever="106.0.5249.21")

driver$client$navigate("demo.html")
dtable<-driver$client$findElement(using="id", "DataTables_Table_0")
outerHTML <- dtable$getElementAttribute("outerHTML")[[1]]

doc<- rvest::read_html(paste0("<html><head></head><body>",outerHTML,"</body></html>"))
table <- rvest::html_element(doc, "table")

driver$server$stop()

html_table(table)

# A tibble: 10 × 12
# 
#   ``                  mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <chr>             <dbl> <int> <dbl> <int> <dbl> <dbl> <dbl> <int> <int> <int> <int>
#  1 Mazda RX4          21       6  160    110  3.9   2.62  16.5     0     1     4     4
#  2 Mazda RX4 Wag      21       6  160    110  3.9   2.88  17.0     0     1     4     4
#  3 Datsun 710         22.8     4  108     93  3.85  2.32  18.6     1     1     4     1
#  4 Hornet 4 Drive     21.4     6  258    110  3.08  3.22  19.4     1     0     3     1
#  5 Hornet Sportabout  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#  6 Valiant            18.1     6  225    105  2.76  3.46  20.2     1     0     3     1
#  7 Duster 360         14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#  8 Merc 240D          24.4     4  147.    62  3.69  3.19  20       1     0     4     2
#  9 Merc 230           22.8     4  141.    95  3.92  3.15  22.9     1     0     4     2
# 10 Merc 280           19.2     6  168.   123  3.92  3.44  18.3     1     0     4     4

  • Related