I have scraped quite a few URLs to download files from the internet. Most of them work fine, for example:
url1 <- "http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02006-ALCADOZO/A.ES.SDGC.CP.02006.zip"
download.file(url1, destfile = "A.ES.SDGC.CP.02006.zip", quiet = TRUE)
This works fine, but
url2 <- "http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip"
download.file(url2, destfile = "A.ES.SDGC.CP.02007.zip", quiet = TRUE)
fails with:
Error in download.file(municipio, destfile = filename, quiet = TRUE) :
cannot open URL 'http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip'
In addition: Warning message:
In download.file(municipio, destfile = filename, quiet = TRUE) :
URL 'http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip': status was 'URL using bad/illegal format or missing URL'
I know the problem is with the white spaces and the encoding (the same happens with other characters, like Ñ), but I have been unable to solve it by forcing a Windows encoding ("windows-1252") on the URL.
curl::curl_download doesn't solve the problem either.
Curiously, if I copy and paste the URL into the browser, everything works fine and I can download the file.
Any help would be appreciated.
> sessionInfo()
R version 4.2.1 (2022-06-23 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=Spanish_Spain.utf8 LC_CTYPE=Spanish_Spain.utf8 LC_MONETARY=Spanish_Spain.utf8 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.utf8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] rvest_1.0.3 forcats_0.5.2 stringr_1.4.1 dplyr_1.0.10 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[8] tibble_3.1.8 ggplot2_3.3.6 tidyverse_1.3.2
loaded via a namespace (and not attached):
[1] pillar_1.8.1 compiler_4.2.1 cellranger_1.1.0 dbplyr_2.2.1 tools_4.2.1 lubridate_1.8.0
[7] jsonlite_1.8.0 googledrive_2.0.0 lifecycle_1.0.1 gargle_1.2.0 gtable_0.3.1 pkgconfig_2.0.3
[13] rlang_1.0.5 reprex_2.0.2 DBI_1.1.3 cli_3.3.0 rstudioapi_0.14 curl_4.3.2
[19] haven_2.5.1 xml2_1.3.3 withr_2.5.0 httr_1.4.4 hms_1.1.2 generics_0.1.3
[25] vctrs_0.4.1 fs_1.5.2 tictoc_1.0.1 googlesheets4_1.0.1 grid_4.2.1 tidyselect_1.1.2
[31] glue_1.6.2 R6_2.5.1 fansi_1.0.3 readxl_1.4.1 selectr_0.4-2 tzdb_0.3.0
[37] modelr_0.1.9 magrittr_2.0.3 ellipsis_0.3.2 backports_1.4.1 scales_1.2.1 assertthat_0.2.1
[43] colorspace_2.0-3 utf8_1.2.2 stringi_1.7.8 munsell_0.5.0 broom_1.0.1 crayon_1.5.1
Windows encoding:
[System.Text.Encoding]::Default
IsSingleByte : True
BodyName : iso-8859-1
EncodingName : Europeo occidental (Windows)
HeaderName : Windows-1252
WebName : Windows-1252
WindowsCodePage : 1252
IsBrowserDisplay : True
IsBrowserSave : True
IsMailNewsDisplay : True
IsMailNewsSave : True
EncoderFallback : System.Text.InternalEncoderBestFitFallback
DecoderFallback : System.Text.InternalDecoderBestFitFallback
IsReadOnly : True
CodePage : 1252
CodePudding user response:
Your url2 string contains spaces, which should be percent-encoded (see the Details section of ?download.file):
url2 <- "http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip"
download.file(URLencode(url2), destfile = "A.ES.SDGC.CP.02007.zip", quiet = TRUE)
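To see what the encoding step does without touching the network, here is a small sketch that only prints the encoded strings. The last example uses a made-up path segment (ESPAÑA) purely to illustrate the Ñ case from the question; whether the server expects UTF-8 percent-encoding for such characters is an assumption, not something confirmed here.

```r
# URL from the question; the directory name contains spaces.
url2 <- "http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip"

# With the default reserved = FALSE, "/" and ":" are kept and each space becomes %20.
encoded <- utils::URLencode(url2)
print(encoded)

# With the default repeated = FALSE, a URL that already contains %xx escapes
# is returned unchanged, so encoding an already-encoded URL is harmless.
twice <- utils::URLencode(encoded)
print(identical(encoded, twice))

# Hypothetical path segment with a non-ASCII character (not a real URL).
# enc2utf8() makes sure the bytes that get escaped are the UTF-8 ones.
enye <- utils::URLencode(enc2utf8("02/ESPA\u00d1A/file.zip"))
print(enye)
```

If the server turns out to expect Latin-1 rather than UTF-8 escapes for characters like Ñ, the string would need to be converted with iconv() before encoding; that has to be checked against the actual site.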
CodePudding user response:
Given the binary format of .zip files, consider the mode = "wb" argument of download.file:
url2 <- paste0(
  "http://www.catastro.minhap.es/",
  "INSPIRE/CadastralParcels/02/",
  "02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip"
)
download.file(
  URLencode(url2),  # percent-encode the spaces, otherwise the request still fails
  destfile = "A.ES.SDGC.CP.02007.zip",
  mode = "wb",      # write the .zip as binary
  quiet = TRUE
)
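Since the question involves many scraped URLs, the two suggestions (percent-encoding and mode = "wb") could be combined in a small wrapper. This is only a sketch: the names build_request and download_zip are hypothetical, and taking the destination file name from basename() of the URL is an assumption about how the files should be named.

```r
# Hypothetical helper: work out the encoded URL and the local file name.
build_request <- function(url, dest_dir = ".") {
  list(
    url      = utils::URLencode(url),               # spaces -> %20
    destfile = file.path(dest_dir, basename(url))   # e.g. "./A.ES.SDGC.CP.02007.zip"
  )
}

# Hypothetical wrapper: download one file in binary mode and report
# success as TRUE/FALSE instead of stopping on the first failure.
download_zip <- function(url, dest_dir = ".") {
  req <- build_request(url, dest_dir)
  tryCatch({
    status <- utils::download.file(req$url, destfile = req$destfile,
                                   mode = "wb", quiet = TRUE)
    status == 0   # download.file() returns 0 on success
  }, error = function(e) {
    message("Failed: ", url, " (", conditionMessage(e), ")")
    FALSE
  })
}

# Inspect what would be requested, without any network access.
req <- build_request(
  "http://www.catastro.minhap.es/INSPIRE/CadastralParcels/02/02007-ALCALA DEL JUCAR/A.ES.SDGC.CP.02007.zip"
)
print(req$url)
print(req$destfile)

# Usage over a vector of scraped URLs (commented out because it needs network access):
# results <- vapply(urls, download_zip, logical(1))
```

Returning a logical per URL makes it easy to collect the failures afterwards and retry or inspect them, rather than having one bad address abort the whole loop.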