Download and save a specific file from a zip folder in R

I want to download the file whose name starts with "s_" from the zip archive at the URL "https://genelab-data.ndc.nasa.gov/geode-py/ws/studies/GLDS-87/download?source=datamanager&file=GLDS-87_metadata_Zanello_STS135-ISA.zip". Later I will need to do the same for many datasets, and the only thing they have in common is that the target file name starts with "s_". So far I only have the code below:

temp <- tempfile()
download.file("https://genelab-data.ndc.nasa.gov/geode-py/ws/studies/GLDS-87/download?source=datamanager&file=GLDS-87_metadata_Zanello_STS135-ISA.zip", temp)

Can you guide me on how to complete my code so that it extracts only the file whose name starts with "s_"?

CodePudding user response:

Here is one way, in three steps:

1. Get the names of the files in the .zip file.
2. Search for the target filename with grep.
3. Extract the target file.

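First, get the zip filename. This assumes the archive has already been downloaded into the working directory; if you downloaded it to a tempfile as in the question, use temp in place of fl below.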

fl <- list.files(pattern = "GLDS")
fl
#[1] "GLDS-87_metadata_Zanello_STS135-ISA.zip"
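
Note that pattern is a regular expression, so "GLDS" matches any file with that text anywhere in its name. If the working directory holds other GLDS files, a tighter pattern may be safer. The sketch below builds a throwaway folder to show the difference; the folder and the file "GLDS_notes.txt" are made up for illustration.

```r
# Sketch only: demo_dir and "GLDS_notes.txt" are hypothetical, created just for this demo.
demo_dir <- tempfile("glds_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(
  demo_dir,
  c("GLDS-87_metadata_Zanello_STS135-ISA.zip", "GLDS_notes.txt")
)))

# "^GLDS" anchors the match at the start of the name, "\\.zip$" at the end,
# so only zip archives whose names begin with "GLDS" are returned.
zips <- list.files(demo_dir, pattern = "^GLDS.*\\.zip$")
zips
#[1] "GLDS-87_metadata_Zanello_STS135-ISA.zip"
```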

Now extract the target file following the steps above.

files_list <- unzip(fl, list = TRUE)
str(files_list)
#'data.frame':  3 obs. of  3 variables:
# $ Name  : chr  "s_Zanello.txt" "a_zanello_transcription_profiling_DNA_microarray.txt" "i_Investigation.txt"
# $ Length: num  7138 6535 11233
# $ Date  : POSIXct, format: "2020-09-25 11:25:00" "2020-09-25 11:25:00" ...

i <- grep("^s_", files_list$Name)
unzip(fl, files = files_list$Name[i])

list.files(pattern = "^s_")
#[1] "s_Zanello.txt"
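
Since you need to do this for many datasets, the steps above could be wrapped in a function. The following is a minimal sketch, not a tested solution for every GeneLab archive: the names pick_s_files and fetch_s_file are made up here, and it assumes each URL returns a valid zip file. It matches on basename() in case an archive stores its files inside a subfolder, and it downloads with mode = "wb" so the binary zip is not corrupted on Windows.

```r
# Hypothetical helper: keep the archive entries whose base name starts with "s_".
# basename() lets entries stored in a subfolder (e.g. "ISA/s_x.txt") match as well.
pick_s_files <- function(entry_names) {
  entry_names[grepl("^s_", basename(entry_names))]
}

# Hypothetical wrapper: download the zip at `url` and extract only the "s_" file(s).
fetch_s_file <- function(url, exdir = ".") {
  temp <- tempfile(fileext = ".zip")
  on.exit(unlink(temp), add = TRUE)            # delete the temporary zip afterwards
  download.file(url, temp, mode = "wb")        # binary mode, needed on Windows
  entries <- unzip(temp, list = TRUE)$Name     # list the contents without extracting
  target <- pick_s_files(entries)
  if (length(target) == 0) {
    stop("no entry starting with 's_' found in ", url)
  }
  unzip(temp, files = target, exdir = exdir)   # returns the extracted file paths
}
```

With a character vector of download links (called urls here as an assumption), something like lapply(urls, fetch_s_file, exdir = "metadata") would then collect all the "s_" files in one folder; unzip creates exdir if it does not exist.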