I can download in the browser a file from this website https://www.cmegroup.com/ftp/pub/settle/comex_future.csv
However when I try the following
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
dest <- "C:\\COMEXfut.csv"
download.file(url, dest)
I get the following error message
Error in download.file(url, dest) :
cannot open URL 'https://www.cmegroup.com/ftp/pub/settle/comex_future.csv'
In addition: Warning message:
In download.file(url, dest) :
InternetOpenUrl failed: 'The operation timed out'
even if I choose:
options(timeout = max(600, getOption("timeout")))
any idea why is this happening ? thanks !
CodePudding user response:
The problem here is that the site from which you are downloading needs a couple of additional headers. The easiest way to supply them is using the httr
package
library(httr)
url <- "https://www.cmegroup.com/ftp/pub/settle/comex_future.csv"
UA <- paste('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:98.0)',
'Gecko/20100101 Firefox/98.0')
res <- GET(url, add_headers(`User-Agent` = UA, Connection = 'keep-alive'))
This should download in less than a second.
If you want to save the file you can do
writeBin(res$content, 'myfile.csv')
Or if you just want to read the data straight into R without even saving it, you can do:
content(res)
#> Rows: 527 Columns: 20
#> 0s-- Column specification ----------------------------------------------------------------
#> Delimiter: ","
#> chr (10): PRODUCT SYMBOL, CONTRACT MONTH, CONTRACT DAY, CONTRACT, PRODUCT DESCRIPTIO...
#> dbl (10): CONTRACT YEAR, OPEN, HIGH, LOW, LAST, SETTLE, EST. VOL, PRIOR SETTLE, PRIO...
#>
#> i Use `spec()` to retrieve the full column specification for this data.
#> i Specify the column types or set `show_col_types = FALSE` to quiet this message.
#> # A tibble: 527 x 20
#> `PRODUCT SYMBOL` `CONTRACT MONTH` `CONTRACT YEAR` `CONTRACT DAY` CONTRACT
#> <chr> <chr> <dbl> <chr> <chr>
#> 1 0GC 07 2022 NA 0GCN22
#> 2 4GC 03 2022 NA 4GCH22
#> 3 4GC 05 2022 NA 4GCK22
#> 4 4GC 06 2022 NA 4GCM22
#> 5 4GC 08 2022 NA 4GCQ22
#> 6 4GC 10 2022 NA 4GCV22
#> 7 4GC 12 2022 NA 4GCZ22
#> 8 4GC 02 2023 NA 4GCG23
#> 9 4GC 04 2023 NA 4GCJ23
#> 10 4GC 06 2023 NA 4GCM23
#> # ... with 517 more rows, and 15 more variables: PRODUCT DESCRIPTION <chr>, OPEN <dbl>,
#> # HIGH <dbl>, HIGH AB INDICATOR <chr>, LOW <dbl>, LOW AB INDICATOR <chr>, LAST <dbl>,
#> # LAST AB INDICATOR <chr>, SETTLE <dbl>, PT CHG <chr>, EST. VOL <dbl>,
#> # PRIOR SETTLE <dbl>, PRIOR VOL <dbl>, PRIOR INT <dbl>, TRADEDATE <chr>