R Cannot download a file from the web

Time:03-28

I can download this file in the browser without any problem: https://www.cmegroup.com/ftp/pub/settle/comex_future.csv

However, when I try the following in R:

url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"

dest <- "C:\\COMEXfut.csv"

download.file(url, dest)

I get the following error message:

Error in download.file(url, dest) : 
  cannot open URL 'https://www.cmegroup.com/ftp/pub/settle/comex_future.csv'
In addition: Warning message:
In download.file(url, dest) :
  InternetOpenUrl failed: 'The operation timed out'

This happens even if I first increase the timeout:

options(timeout = max(600, getOption("timeout")))

Any idea why this is happening? Thanks!

Answer:

The problem is that the site you are downloading from expects a couple of additional request headers (here, a browser-like User-Agent and Connection: keep-alive). The easiest way to supply them is with the httr package:

library(httr)

url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
UA <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
            'Gecko/20100101 Firefox/98.0')

res <- GET(url, add_headers(`User-Agent` = UA, Connection = 'keep-alive'))

This should download in less than a second.

If you want to save the file, you can do:

writeBin(res$content, 'myfile.csv')
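
One caveat: writing res$content directly saves whatever the server returned, including an error page. Below is a minimal sketch that checks the HTTP status before writing. The helper name fetch_to_file and the 30-second timeout are assumptions for illustration, not part of the original answer or of httr itself:

```r
# Hypothetical helper (not part of httr): download `url` with a browser-like
# User-Agent and write the body to `dest` only if the request succeeded.
fetch_to_file <- function(url, dest, ua, timeout_sec = 30) {
  res <- httr::GET(
    url,
    httr::add_headers(`User-Agent` = ua, Connection = "keep-alive"),
    httr::timeout(timeout_sec)   # fail after timeout_sec seconds instead of hanging
  )
  httr::stop_for_status(res)     # raises an R error on HTTP 4xx/5xx
  writeBin(httr::content(res, as = "raw"), dest)
  invisible(dest)
}

# Usage, reusing `url` and `UA` from above:
# fetch_to_file(url, "myfile.csv", UA)
```

Because the status check runs before writeBin(), a failed request stops with an error and leaves no file behind.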

Or if you just want to read the data straight into R without even saving it, you can do:

content(res)
#> Rows: 527 Columns: 20
#> -- Column specification ----------------------------------------------------------------
#> Delimiter: ","
#> chr (10): PRODUCT SYMBOL, CONTRACT MONTH, CONTRACT DAY, CONTRACT, PRODUCT DESCRIPTIO...
#> dbl (10): CONTRACT YEAR, OPEN, HIGH, LOW, LAST, SETTLE, EST. VOL, PRIOR SETTLE, PRIO...
#> 
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 527 x 20
#>    `PRODUCT SYMBOL` `CONTRACT MONTH` `CONTRACT YEAR` `CONTRACT DAY` CONTRACT
#>    <chr>            <chr>                      <dbl> <chr>          <chr>   
#>  1 0GC              07                          2022 NA             0GCN22  
#>  2 4GC              03                          2022 NA             4GCH22  
#>  3 4GC              05                          2022 NA             4GCK22  
#>  4 4GC              06                          2022 NA             4GCM22  
#>  5 4GC              08                          2022 NA             4GCQ22  
#>  6 4GC              10                          2022 NA             4GCV22  
#>  7 4GC              12                          2022 NA             4GCZ22  
#>  8 4GC              02                          2023 NA             4GCG23  
#>  9 4GC              04                          2023 NA             4GCJ23  
#> 10 4GC              06                          2023 NA             4GCM23  
#> # ... with 517 more rows, and 15 more variables: PRODUCT DESCRIPTION <chr>, OPEN <dbl>,
#> #   HIGH <dbl>, HIGH AB INDICATOR <chr>, LOW <dbl>, LOW AB INDICATOR <chr>, LAST <dbl>,
#> #   LAST AB INDICATOR <chr>, SETTLE <dbl>, PT CHG <chr>, EST. VOL <dbl>,
#> #   PRIOR SETTLE <dbl>, PRIOR VOL <dbl>, PRIOR INT <dbl>, TRADEDATE <chr>
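
content(res) picks a parser from the response's content type, and for CSV it relies on the readr package being installed. If you would rather control the parsing yourself, here is a sketch using base R instead. The helper name parse_csv_text is hypothetical, and the small demo string contains made-up values rather than real CME data:

```r
# Hypothetical helper: parse CSV text with base R, keeping the original
# column names (e.g. "PRODUCT SYMBOL") instead of sanitising them.
parse_csv_text <- function(txt, ...) {
  read.csv(text = txt, check.names = FALSE, stringsAsFactors = FALSE, ...)
}

# Usage with the response object `res` from above:
# txt <- httr::content(res, as = "text", encoding = "UTF-8")
# df  <- parse_csv_text(txt)

# Small offline example with made-up values:
demo <- parse_csv_text("PRODUCT SYMBOL,OPEN\n0GC,1.5\n")
```

Note that read.csv() converts zero-padded values such as 07 to numbers by default. To keep them as text, as in the tibble above, pass colClasses = "character" (or a per-column vector) through the `...` argument.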