I have a folder with multiple *.rar and *.zip files. Each *.rar and *.zip file contains one folder, and that folder contains multiple folders.
I would like to generate a dataset with the names of these inner folders.
How can I do this using R?
I am trying:
temp <- list.files(pattern = "\\.zip$")
lapply(temp, function(x) unzip(x, list = TRUE))
But it returns:
I would like to get just the names: "Nova pasta1" and "Nova pasta2"
Thanks
CodePudding user response:
Let's create a simple set of directories/files that are representative of your own. You described having a single .zip file that contains multiple zipped directories, which may contain unzipped files and/or sub-directories.
# Example main directory
dir.create("main_dir")
# Example directory with 1 file and a subdirectory with 1 file
dir.create("main_dir/example_dir1")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_file.csv")
dir.create("main_dir/example_dir1/example_subdir")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_subdir/example_subdirfile.csv")
# Example directory with 1 file
dir.create("main_dir/example_dir2")
write.csv(data.frame(x = "foo"), file = "main_dir/example_dir2/example_file2.csv")
# NOTE: I was having issues with using `zip()` to zip each directory
# then the main (top) directory, so I manually zipped them below.
# Manually zip example_dir1 and example_dir2, then zip main_dir at this point.
Given this structure, we can get the paths to all of the directories within the highest level directory (main_dir) using unzip(list = TRUE), since we know the name of the single zipped directory containing all of these additional zipped sub-directories.
# Unzip the highest level directory available, get all of the .zip dirs within
ex_path <- "main_dir"
all_zips <- unzip(zipfile = paste0(ex_path, ".zip"), list = TRUE)
all_zips
# We can remove the main directory prefix if we want, so that we only see
# the zip files within our main directory instead of the full path.
library(dplyr)
all_zips %>%
  filter(Name != paste0(ex_path, "/")) %>%
  mutate(Name = sub(paste0(ex_path, "/"), "", Name))
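If only the folder names one level below the top folder are needed (as in the question), a base R sketch along the following lines may help. The helper name second_level_dirs and the sample paths are made up for illustration, and it assumes the Name column uses "/" as the separator, as in the unzip(list = TRUE) output.

```r
# Hypothetical helper: given the Name column from unzip(list = TRUE),
# return the unique folder names found directly under the top-level folder.
second_level_dirs <- function(paths) {
  parts <- strsplit(paths, "/", fixed = TRUE)
  n_parts <- lengths(parts)
  # Keep paths that point inside a second-level folder (3+ components),
  # or that are themselves a second-level directory entry (trailing "/").
  is_dir_entry <- grepl("/$", paths)
  keep <- n_parts > 2 | (n_parts == 2 & is_dir_entry)
  unique(vapply(parts[keep], function(p) p[2], character(1)))
}

# Illustrative paths only; real ones would come from unzip(..., list = TRUE)$Name
example_names <- c(
  "main_dir/",
  "main_dir/Nova pasta1/",
  "main_dir/Nova pasta1/a.csv",
  "main_dir/Nova pasta2/b.csv",
  "main_dir/readme.txt"
)
second_level_dirs(example_names)
```

Files sitting directly in the top folder (such as readme.txt here) are skipped because they are not directory entries.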
If you had multiple zipped directories with nested directories similar to main_dir, you could just put their paths in a list and apply the function to each element of the list. Below I reproduce this.
# Example of multiple zip directory paths in a list
ziplist <- list(ex_path, ex_path, ex_path)
lapply(ziplist, function(x) {
  temp <- unzip(zipfile = paste0(x, ".zip"), list = TRUE)
  # Record which archive each row came from
  temp <- temp %>% mutate(main_path = x)
  # Use x (the current archive), not ex_path, when stripping the prefix
  temp <- temp %>%
    filter(Name != paste0(x, "/")) %>%
    mutate(Name = sub(paste0(x, "/"), "", Name))
  temp
})
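If all of the .zip files in the current working directory are files you want to do this for, you can get ziplist above via:
# Strip the extension, since the code above appends ".zip" itself
list.files(pattern = "\\.zip$") %>% tools::file_path_sans_ext() %>% as.list()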
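Since the goal is a single dataset, the list returned by lapply() above can be stacked into one data frame. The sketch below uses mock results (the file and directory names are invented) with the same Name and main_path columns, and combines them with base R; dplyr::bind_rows() would be an alternative.

```r
# Mock per-archive results standing in for the output of the lapply() call;
# the names here are placeholders, not real files.
results <- list(
  data.frame(Name = c("example_dir1.zip", "example_dir2.zip"),
             main_path = "main_dir", stringsAsFactors = FALSE),
  data.frame(Name = "other_dir.zip",
             main_path = "other_main", stringsAsFactors = FALSE)
)

# Stack the per-archive data frames into one dataset
folder_dataset <- do.call(rbind, results)
rownames(folder_dataset) <- NULL
folder_dataset
```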
CodePudding user response:
I appreciate all the help, but I think I found a shorter way to solve my question.
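library(stringr)  # provides str_extract()
temp.zip <- list.files(pattern = "\\.zip$")
temp.rar <- list.files(pattern = "\\.rar$")
# untar() can only list .zip/.rar archives when the system tar supports
# them (e.g. bsdtar), so this may not work on every machine.
mydata <- lapply(c(temp.rar, temp.zip), function(x) {
  entries <- untar(tarfile = x, list = TRUE)
  # Keep the text between the first and last "/" of each path
  unique(c(na.omit(str_extract(entries, "(?<=/).*(?=/)"))))
})
unlist(mydata)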
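One thing to watch with the pattern above: it is greedy, so for deeper paths it returns everything between the first and the last "/". The sketch below illustrates this with base R on made-up paths, and shows an assumed alternative pattern that stops at the next "/" if only the first inner folder is wanted.

```r
# Made-up archive entries for illustration
paths <- c("top/Nova pasta1/file.txt", "top/Nova pasta1/sub/file.txt")

# Greedy pattern from the snippet above: spans from the first "/" to the last "/"
greedy <- regmatches(paths, regexpr("(?<=/).*(?=/)", paths, perl = TRUE))

# Assumed alternative: stop at the next "/", keeping only the first inner folder
first_only <- regmatches(paths, regexpr("(?<=/)[^/]+(?=/)", paths, perl = TRUE))

greedy
first_only
```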
Thanks all