Extract folder names inside of *.rar and *.zip files

Time:08-04

I have a folder with multiple *.rar and *.zip files. Each *.rar or *.zip file contains one folder, and inside that folder there are several subfolders.

I would like to generate a dataset with the names of these subfolders.

How can I do this using R?

I'm trying:

temp <- list.files(pattern = "\\.zip$")
lapply(temp, function(x) unzip(x, list = TRUE))

But it returns a data frame (columns Name, Length and Date) listing the full path of every file and folder in each archive.

I would like to get just the names "Nova pasta1" and "Nova pasta2".

Thanks

CodePudding user response:

Let's create a simple set of directories/files that is representative of yours. You described having a single .zip file that contains multiple zipped directories, which may contain unzipped files and/or subdirectories.

# Example main directory
dir.create("main_dir")

# Example directory with 1 file and a subdirectory with 1 file
dir.create("main_dir/example_dir1")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_file.csv")
dir.create("main_dir/example_dir1/example_subdir")
write.csv(data.frame(x = 5), file = "main_dir/example_dir1/example_subdir/example_subdirfile.csv")

# Example directory with 1 file
dir.create("main_dir/example_dir2")
write.csv(data.frame(x = "foo"), file = "main_dir/example_dir2/example_file2.csv")

# NOTE: I was having issues with using `zip()` to zip each directory
# then the main (top) directory, so I manually zipped them below.

# Manually zip example_dir1 and example_dir2, then zip main_dir at this point.

Given this structure, and since we know the name of the single zipped directory that contains all of the zipped subdirectories, we can list the paths of everything inside the highest-level directory (main_dir) with unzip(list = TRUE), without extracting anything.

# List (without extracting) every entry inside the highest-level archive
ex_path <- "main_dir"
all_zips <- unzip(zipfile = paste0(ex_path, ".zip"), list = TRUE)
all_zips

# We can remove the "main_dir/" prefix so that we only see the entries
# inside our main directory instead of their full paths.
library(dplyr)

all_zips %>%
  filter(Name != paste0(ex_path, "/")) %>%
  mutate(Name = sub(paste0(ex_path, "/"), "", Name))
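If, as in the question, you only want the folder names one level below the top folder, one base-R sketch is to keep only directory entries (names ending in "/") that sit exactly two path components deep. The data frame below is a hand-written stand-in for the output of `unzip(zipfile = "main_dir.zip", list = TRUE)`, laid out like the question's archives rather than with nested zips, and the approach assumes the archive stores explicit directory entries, which not every zip tool does.

```r
# Hand-written stand-in for unzip(zipfile = "main_dir.zip", list = TRUE);
# the real result also has a Date column
all_zips <- data.frame(
  Name = c("main_dir/",
           "main_dir/example_dir1/",
           "main_dir/example_dir1/example_file.csv",
           "main_dir/example_dir1/example_subdir/",
           "main_dir/example_dir2/"),
  Length = 0,
  stringsAsFactors = FALSE
)

# Directory entries end in "/"; strsplit() drops the trailing empty piece,
# so "main_dir/example_dir1/" splits into 2 components
is_dir <- endsWith(all_zips$Name, "/")
depth  <- lengths(strsplit(all_zips$Name, "/", fixed = TRUE))

# basename() removes the trailing "/" before taking the last component
first_level <- basename(all_zips$Name[is_dir & depth == 2])
first_level
# returns c("example_dir1", "example_dir2")
```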

If you had multiple zipped directories with nested directories similar to main_dir, you could just put their paths in a list and apply the function to each element of the list. Below I reproduce this.

# Example of multiple zip directory paths in a list
ziplist  <- list(ex_path, ex_path, ex_path)

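lapply(ziplist, function(x) {
  # List the archive's contents and record which archive each row came from
  temp <- unzip(zipfile = paste0(x, ".zip"), list = TRUE)
  temp <- temp %>% mutate(main_path = x)
  # Drop the top-level folder entry and strip its prefix from the paths;
  # use x (the current archive), not the fixed ex_path
  temp <- temp %>%
    filter(Name != paste0(x, "/")) %>%
    mutate(Name = sub(paste0(x, "/"), "", Name, fixed = TRUE))
  temp
})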
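Since the question asks for a single dataset, the list returned by `lapply()` can be stacked into one data frame. A minimal sketch using toy stand-ins for two list elements (the archive names and rows are made up for illustration); `do.call(rbind, ...)` is base R, and `dplyr::bind_rows()` would do the same job since dplyr is already loaded.

```r
# Toy stand-ins for two elements of the list returned by lapply() above
# (hypothetical archive names; real elements also carry Length and Date)
results <- list(
  data.frame(Name = c("example_dir1.zip", "example_dir2.zip"),
             main_path = "main_dir", stringsAsFactors = FALSE),
  data.frame(Name = "other_dir.zip",
             main_path = "other_dir", stringsAsFactors = FALSE)
)

# Stack the per-archive listings: one row per entry, tagged with its archive
dataset <- do.call(rbind, results)
dataset
```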

If all of the .zip files in the current working directory are files you want to do this for, you can get ziplist above via:

list.files(pattern = "\\.zip$") %>%
  sub("\\.zip$", "", .) %>%   # the function above appends ".zip" itself
  as.list()
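Anchoring the pattern as `"\\.zip$"` matters because `pattern` in `list.files()` is a regular expression, in which `.` matches any character; and since the function above appends `.zip` to each element itself, the extension should be stripped from the file names first. A small check on made-up file names:

```r
# Made-up file names for illustration
files <- c("main_dir.zip", "gzip_notes.txt", "backup.zip.old")

grepl(".zip", files)     # TRUE TRUE TRUE: "." also matches the "g" in "gzip"
grepl("\\.zip$", files)  # TRUE FALSE FALSE: only a literal ".zip" at the end

# Drop the extension so paste0(x, ".zip") rebuilds the real file name
ziplist <- as.list(sub("\\.zip$", "", files[grepl("\\.zip$", files)]))
ziplist
```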

CodePudding user response:

I appreciate all the help, but I think I found a shorter way to solve my problem.

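library(stringr)

# Anchor the patterns so that only names ending in .zip/.rar are matched
temp.zip <- list.files(pattern = "\\.zip$")
temp.rar <- list.files(pattern = "\\.rar$")

# untar(list = TRUE) can list .zip/.rar contents only when R calls an external
# tar that reads those formats (e.g. bsdtar/libarchive); the internal one cannot
mydata <- lapply(c(temp.rar, temp.zip), function(x) {
  entries <- untar(tarfile = x, list = TRUE)
  # Text between the first and last "/" of each path = subfolder name(s)
  unique(c(na.omit(str_extract(entries, "(?<=/).*(?=/)"))))
})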

unlist(mydata)
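To see what the lookaround pattern `'(?<=/).*(?=/)'` keeps, here is a worked example on made-up entry paths, using base R's `regexpr()` with `perl = TRUE` in place of `str_extract()`. Because `.*` is greedy, it returns everything between the first and the last `/`, so a folder nested two levels down comes back as `"Nova pasta2/sub"`. If only the first level is wanted, an anchored `sub()` is one alternative.

```r
# Made-up archive entry paths in the layout described in the question
entries <- c("Pasta/",
             "Pasta/Nova pasta1/",
             "Pasta/Nova pasta1/doc.txt",
             "Pasta/Nova pasta2/sub/img.png")

# Same pattern as above: greedy text between the first and the last "/";
# regmatches() silently drops entries with no match (here "Pasta/")
greedy <- unique(regmatches(entries, regexpr("(?<=/).*(?=/)", entries, perl = TRUE)))
greedy
# returns c("Nova pasta1", "Nova pasta2/sub")

# Anchored alternative: keep only the second path component
deep <- entries[grepl("^[^/]+/[^/]+/", entries)]
first_level <- unique(sub("^[^/]+/([^/]+)/.*$", "\\1", deep))
first_level
# returns c("Nova pasta1", "Nova pasta2")
```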

Thanks all
