I want to download the file whose name starts with "s_" from the zip archive at the URL "https://genelab-data.ndc.nasa.gov/geode-py/ws/studies/GLDS-87/download?source=datamanager&file=GLDS-87_metadata_Zanello_STS135-ISA.zip". Later, I will need to download many datasets like this one, and the only thing they have in common is that the target file's name starts with "s_". So far I only have the code below:
temp <- tempfile()
download.file("https://genelab-data.ndc.nasa.gov/geode-py/ws/studies/GLDS-87/download?source=datamanager&file=GLDS-87_metadata_Zanello_STS135-ISA.zip", temp)
Can you guide me on how to complete my code so that it gets only the file starting with "s_"?
CodePudding user response:
Here is a way.
- Get the names of the files in the .zip file;
- Search for the target filename with grep;
- Extract the target file.
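Get the zip filename. This assumes the archive has already been downloaded into the working directory; if it was downloaded to a temporary file as in the question, use that path (temp) in place of fl.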
fl <- list.files(pattern = "GLDS")
fl
#[1] "GLDS-87_metadata_Zanello_STS135-ISA.zip"
Now extract the target file following the steps above.
files_list <- unzip(fl, list = TRUE)
str(files_list)
#'data.frame': 3 obs. of 3 variables:
# $ Name : chr "s_Zanello.txt" "a_zanello_transcription_profiling_DNA_microarray.txt" "i_Investigation.txt"
# $ Length: num 7138 6535 11233
# $ Date : POSIXct, format: "2020-09-25 11:25:00" "2020-09-25 11:25:00" ...
i <- grep("^s_", files_list$Name)
unzip(fl, files = files_list$Name[i])
list.files(pattern = "^s_")
#[1] "s_Zanello.txt"
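Since the question mentions many datasets, the steps above could be wrapped in a small function. The following is only a sketch under a few assumptions: the helper names pick_members and fetch_matching are hypothetical, the match is made on the base name so that files inside subfolders of the archive are also found, and mode = "wb" is passed to download.file so the archive is written as binary (this matters mainly on Windows).

```r
# Hypothetical helper: keep the archive members whose base name matches `pattern`.
pick_members <- function(member_names, pattern = "^s_") {
  member_names[grepl(pattern, basename(member_names))]
}

# Hypothetical helper: download a zip to a temporary file and extract only
# the matching members into `exdir`. Returns the extracted file paths.
fetch_matching <- function(url, pattern = "^s_", exdir = ".") {
  tmp <- tempfile(fileext = ".zip")
  on.exit(unlink(tmp), add = TRUE)          # remove the temporary archive afterwards
  download.file(url, tmp, mode = "wb")      # binary mode keeps the zip intact
  members <- unzip(tmp, list = TRUE)$Name   # list contents without extracting
  targets <- pick_members(members, pattern)
  if (length(targets) == 0) {
    stop("no archive member matches pattern: ", pattern)
  }
  unzip(tmp, files = targets, exdir = exdir)
}
```

With a character vector of download URLs (here called urls, also an assumption), something like lapply(urls, fetch_matching) would process them one after another.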