Home > Back-end >  Read nested folder and file name, export to Excel file
Read nested folder and file name, export to Excel file

Time:10-29

So I am tasked with building an excel spreadsheet cataloging a drive with various nested folders and files. This enter image description here

I know that there might be a command to get file info and I can break that into these columns.

CodePudding user response:

Apart from the directories split into subdirs, the adaptation of the function in the question's link, Stibu's answer, might be of help.

rfl <- function(path) {
  folders <- list.dirs(path, recursive = FALSE, full.names = FALSE)
  if (length(folders)==0) {
    files <- list.files(path, full.names = TRUE)
    finfo <- file.info(files)
    Filename <- basename(files)
    FileType <- tools::file_ext(files)
    DateModified <- finfo$mtime
    FullFilePath <- dirname(files)
    size <- finfo$size
    data.frame(Filename, FileType, DateModified, FullFilePath, size)
  } else {
    sublist <- lapply(paste0(path,"/",folders),rfl)
    setNames(sublist,folders)
  }  
}

CodePudding user response:

If you have the full path and file names then you can loop through that and parse it into these columns. You can get more file info with file.info:

files <- c("I:/Administration/Budget/2015-BUDGET DOCUMENT.xlsx",
           "I:/Administration/Budget/2014-2015 Budget/BUDGET DOCUMENT.xlsx")

# files <- list.files("I:", recursive = T, full.names = T) # this could take a while to run

file_info <- list(length = length(files))

for (i in seq_along(files)){
  fullpath <- dirname(files[i])
  fullname <- basename(files[i])
  file_ext <- unlist(strsplit(fullname, ".", fixed = T))
  file_meta <- file.info(files[i])[c("size", "mtime")]
  path <- unlist(strsplit(fullpath, "/", fixed = T))[-1]
  file_info[[i]] <- unlist(c(file_ext, file_meta, fullpath, path))
}

l <- lapply(file_info, `length<-`, max(lengths(file_info)))
df <- data.frame(do.call(rbind, l))
names(df) <- c("filename", "extension", "size", "modified", paste0("sub", 1:(ncol(df) - 4)))

rownames(df) <- NULL
df$modified <- as.POSIXct.numeric(as.numeric(df$modified), origin = "1970-01-01")
df$size <- as.numeric(df$size)

If you do not have the files you can recursively search the drive using list.files() with recursive = T: list.files("I:", recursive = T, full.names = T)

Note:

  1. l <- lapply(file_info, `length<-`, max(lengths(file_info))) sets the vector length of each list element to be the same. This is necessary because otherwise when the vectors are stacked with unequal lengths values get recycled. A simple example of this is: rbind(1:3, 1:5)
  2. The output of unlist(c(file_ext, file_meta, fullpath, path)) is a vector and vectors in R are atomic, meaning all elements have to be the same class. That means everything gets converted to character in this case, which is why we have the lines df$modified <- ... and df$size <- ... at the end to convert them to their appropriate type.
  3. If you want to output this data frame to excel check out xlsx::write.xlsx or openxlsx::write.xlsx. If you don't have those libraries installed you'll need to use install.packages() first.

Output

Because these files/locations don't actually exist on my computer there are NA values in the size and date modified fields:

              filename extension size modified                                      sub1           sub2   sub3             sub4
1 2015-BUDGET DOCUMENT      xlsx   NA     <NA>                  I:/Administration/Budget Administration Budget             <NA>
2      BUDGET DOCUMENT      xlsx   NA     <NA> I:/Administration/Budget/2014-2015 Budget Administration Budget 2014-2015 Budget
  •  Tags:  
  • r
  • Related