Downloading and unzipping GitHub zipped files directly in R

Time:09-23

I am trying to download and unzip a folder of files from GitHub into R. I can manually download the file at https://github.com/dylangomes/SO/blob/main/Shape.zip and then extract all files into my working directory, but I'd like to do this directly from R.

utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip")

# Warning message:
# In utils::unzip("https://github.com/dylangomes/SO/blob/main/Shape.zip",  :
#  error 1 in extracting from zip file    

R reports this as a warning, but nothing has been downloaded or unzipped into my working directory.

I can download the file to my machine:

utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip", "Shape.zip")

But I get the same message with the unzip function:

utils::unzip("Shape.zip")

The downloaded file cannot be extracted manually either: I get an error that the compressed folder is empty. The unzip line does work on the manually downloaded .zip file, which tells me something is wrong with the download.file line.

If I append ?raw=TRUE to the URL (which can make a difference when downloading data from GitHub):

utils::download.file("https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE","Shape.zip")
utils::unzip("Shape.zip")

I get a different warning and, again, nothing is extracted:

Warning message:
In utils::unzip("Shape.zip") : internal error in 'unz' code

I have tried most of the answers at "Using R to download zipped data file, extract, and import data", but they appear to be for single zipped files and aren't helping here. I've also tried the answers at "r function unzip error 1 in extracting from zip file", which mentions the same warning message I am getting, but none of the solutions work in this case.

Any idea of what I am doing wrong?

Answer:

You need to keep ?raw=TRUE and set the mode explicitly:

download.file(
  "https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE",
  "Shape.zip",
  mode = "wb"
)
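
Once the archive has been saved in binary mode, extraction is a separate call. A minimal sketch of the whole round trip, assuming the same URL as above; the helper name `fetch_and_unzip` and the default `exdir` folder name are illustrative choices, not part of the original answer:

```r
# Hypothetical helper: download a zip archive in binary mode and extract it.
fetch_and_unzip <- function(url, exdir = "Shape") {
  zip_path <- tempfile(fileext = ".zip")
  on.exit(unlink(zip_path), add = TRUE)  # remove the temporary archive afterwards

  # mode = "wb" writes the bytes unchanged, which matters on Windows
  status <- utils::download.file(url, zip_path, mode = "wb")
  if (status != 0) stop("download failed with status ", status)

  # unzip() returns the paths of the extracted files (invisibly)
  utils::unzip(zip_path, exdir = exdir)
}

# Usage (needs network access):
# files <- fetch_and_unzip(
#   "https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE"
# )
# print(files)
```

Downloading to a temporary file keeps the working directory free of the archive itself; only the extracted files end up in `exdir`.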

Without the query string ?raw=TRUE, you are downloading the GitHub web page (HTML) for the file, not the zip archive itself.
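
One way to tell the two cases apart is to look at the first bytes of the downloaded file: a non-empty zip archive begins with the signature `PK\003\004`, whereas an HTML page does not. The sketch below is only a quick diagnostic under that assumption; `looks_like_zip` is a hypothetical helper, and the two temporary files stand in for a bad and a good download:

```r
# Hypothetical helper: TRUE if the file starts with the zip local-file signature.
# (An empty archive starts with a different signature and would return FALSE.)
looks_like_zip <- function(path) {
  signature <- as.raw(c(0x50, 0x4b, 0x03, 0x04))  # "PK\003\004"
  header <- readBin(path, what = "raw", n = 4L)
  identical(header, signature)
}

# Stand-in for a download that actually saved a web page
html_file <- tempfile(fileext = ".zip")
writeLines("<!DOCTYPE html><html></html>", html_file)
looks_like_zip(html_file)  # FALSE

# Stand-in for a real archive: only the signature bytes are written here
fake_zip <- tempfile(fileext = ".zip")
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00)), fake_zip)
looks_like_zip(fake_zip)   # TRUE
```

Running `looks_like_zip("Shape.zip")` after a download gives a quick hint whether the file is worth passing to `unzip()`.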

On Windows, R uses mode = "wb" by default when the URL ends in certain file extensions, including .zip. Here the URL ends in a query string rather than a file extension, so that check fails and you need to set the mode explicitly.
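
The effect of the query string on a suffix check of this kind can be illustrated with a regular expression. The pattern below is an approximation based on the extensions listed in the `download.file` help page, not a copy of R's internal code, and the variable names are illustrative:

```r
# Approximation of a "does the URL end in a binary file extension?" test;
# the exact pattern R uses internally may differ.
binary_suffix <- "\\.(gz|bz2|xz|tgz|zip|jar|rda|rds|RData)$"

plain_url <- "https://github.com/dylangomes/SO/blob/main/Shape.zip"
raw_url   <- "https://github.com/dylangomes/SO/blob/main/Shape.zip?raw=TRUE"

grepl(binary_suffix, plain_url)  # TRUE:  the URL ends in ".zip"
grepl(binary_suffix, raw_url)    # FALSE: the URL ends in "?raw=TRUE"
```

Because the second URL no longer ends in `.zip`, a suffix-based default cannot apply, which is why `mode = "wb"` has to be passed by hand.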
