Home > Software engineering >  Reading 'raw' xlsx file from httr response into data frame
Reading 'raw' xlsx file from httr response into data frame

Time:03-26

I am using the httr package to retrieve data from our reporting system using its REST API. I am specifying the content to be a xlsx. The response contains the raw (binary?) file.

Here's what my request looks like:

request = GET("http://server/.../documents/123456", 
             add_headers(.headers = c('Accept'= 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet', 
                                      'authtoken' = paste0('', logonToken,''))) ,
             content_type("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"),
             encode = 'raw'
)

content(request)
[1] 50 4b 03 04 0a 00 08 08 08 00 19 88 79 54 00 00 00 00 00 00 00
[44] 2f 73 68 65 65 74 31 2e 78 6d 6c a5 57 5b 6f 9b 30 14 7e 9f b4
[87] 00 23 43 2f db af 9f c1 94 d8 c6 58 93 92 87 54 f5 77 f1 39 fe 
     ... etc

The result can be saved as a .xlsx and opened in Excel. However, I would like to read this data directly into a data frame. Is there a way to mimic the result into a readable input within the same script?

I am able to pass an extra parameter write_disk to save the response directly as a file. Specifying a path is required. I tried testing with tempfile() to write and read the response directly back in, but wasn't able to get it to work.

Is there any way to read a raw file from an R environment object?

CodePudding user response:

Yes, here's a fully reproducible example url:

url <- paste0('https://file-examples.com/storage/fe91183158623ded19eb446/',
              '2017/02/file_example_XLSX_100.xlsx')

Now download our file an get its raw contents:

raw_xlsx <- httr::GET(url)$content

Let's create a temporary file to store it:

tmp <- tempfile(fileext = '.xlsx')

Now write the raw data to the file:

writeBin(raw_xlsx, tmp)

Our excel file is now saved in the temporary file, which we can read however you would normally read them into R:

my_excel <- readxl::read_excel(tmp)

And the result is:

my_excel
#> # A tibble: 100 x 8
#>      `0` `First Name` `Last Name` Gender Country         Age Date          Id
#>    <dbl> <chr>        <chr>       <chr>  <chr>         <dbl> <chr>      <dbl>
#>  1     1 Dulce        Abril       Female United States    32 15/10/2017  1562
#>  2     2 Mara         Hashimoto   Female Great Britain    25 16/08/2016  1582
#>  3     3 Philip       Gent        Male   France           36 21/05/2015  2587
#>  4     4 Kathleen     Hanner      Female United States    25 15/10/2017  3549
#>  5     5 Nereida      Magwood     Female United States    58 16/08/2016  2468
#>  6     6 Gaston       Brumm       Male   United States    24 21/05/2015  2554
#>  7     7 Etta         Hurn        Female Great Britain    56 15/10/2017  3598
#>  8     8 Earlean      Melgar      Female United States    27 16/08/2016  2456
#>  9     9 Vincenza     Weiland     Female United States    40 21/05/2015  6548
#> 10    10 Fallon       Winward     Female Great Britain    28 16/08/2016  5486
#> # ... with 90 more rows
  • Related