filter filenames in R according to pattern

Time:01-10

I have a list of files in a directory with the format

Firstname-Lastname-DateofBirth-ID.pdf

For example: Tom-Hanks-01.01.1960-5555.pdf. The ID is currently a 4-digit number, but it may grow to 5 or 6 digits in the future.

I have a vector of IDs whose corresponding PDF files I would like to move to another directory.

I would like R to read the list of files in the directory, extract the ID from each filename, check whether that ID is in the vector of IDs to move, and, if so, move the file.

I can do the latter part with if(), %in% and system(). How can I do the filtering?

A tidyverse solution is preferred.

Apologies that I cannot provide a reproducible example. Even just reading the filenames and extracting the ID from each filename would help.

CodePudding user response:

Suppose we have the following text files in the current working directory.

$ ls *.txt

Tom-Hanks-01.01.1960-11.txt  
Tom-Hanks-01.01.1960-88.txt    
Tom-Hanks-01.01.1960-4444.txt
Tom-Hanks-01.01.1960-5555.txt
Tom-Hanks-01.01.1960-123456.txt  

Say I want to move the files with IDs 111, 222, 5555, 88 and 123456 to a folder named dest in the current working directory. We can do the following:

library(fs)
library(stringr)

# grab the text files
files <- fs::dir_ls(".", glob = "*.txt")
id_to_move <- c(111, 222, 5555, 88, 123456)

# keep the files whose trailing ID (the digits between the last "-" and ".txt") is in id_to_move
files_to_move <- files[str_extract(files, "(?<=-)\\d+(?=\\.txt$)") %in% id_to_move]

# move the file.
fs::file_move(files_to_move, "dest")

$ ls dest

Tom-Hanks-01.01.1960-123456.txt  
Tom-Hanks-01.01.1960-5555.txt  
Tom-Hanks-01.01.1960-88.txt
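
The same idea carries over to the .pdf files from the question. The following is only a minimal sketch, assuming every filename ends in `-<digits>.pdf`. The helper name `extract_id()` and the sample filenames are made up for illustration. The IDs are kept as character strings here so that `%in%` compares text with text instead of relying on how R formats numbers when it coerces them.

```r
library(stringr)

# Hypothetical helper: pull the trailing ID out of names like
# "Tom-Hanks-01.01.1960-5555.pdf". Returns NA where the pattern does not match.
extract_id <- function(paths) {
  str_extract(basename(paths), "(?<=-)\\d+(?=\\.pdf$)")
}

# Made-up filenames; in practice these would come from
# fs::dir_ls(".", glob = "*.pdf") or list.files(pattern = "\\.pdf$").
files <- c(
  "Tom-Hanks-01.01.1960-88.pdf",
  "Tom-Hanks-01.01.1960-4444.pdf",
  "Tom-Hanks-01.01.1960-123456.pdf"
)

# IDs stored as character strings so the comparison is an exact text match
ids_to_move <- c("88", "123456")

ids <- extract_id(files)
files_to_move <- files[ids %in% ids_to_move]
```

Because the pattern uses `\\d+`, it should keep working if the ID grows from 4 to 5 or 6 digits. `files_to_move` can then be passed to `fs::file_move()` as in the answer above.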

CodePudding user response:

With base R:

    # pattern is a regular expression (not a shell glob); match the .pdf files
    filelist <- list.files(pattern = "\\.pdf$")
    
    filelist
    [1] "Tom-Hanks-01.01.1960-11.pdf"     "Tom-Hanks-01.01.1960-123456.pdf"
    [3] "Tom-Hanks-01.01.1960-4444.pdf"   "Tom-Hanks-01.01.1960-5555.pdf"
    [5] "Tom-Hanks-01.01.1960-88.pdf"

    idsToMove <- c(1111, 2222, 3333, 4444, 5555)

    # split each name on "-" and the ".pdf" suffix; the 4th piece is the ID
    toMove <- filelist[sapply(filelist, FUN = \(x) strsplit(x, split = "-|\\.pdf$", perl = TRUE)[[1]][4] %in% idsToMove)]

    > toMove
    [1] "Tom-Hanks-01.01.1960-4444.pdf" "Tom-Hanks-01.01.1960-5555.pdf"
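
To round this off, here is a hedged sketch of the whole list, filter and move sequence in base R. It swaps in `file.rename()` for the `system()` call mentioned in the question, and it works inside `tempdir()` with made-up, empty stand-in files so that it can be run without touching real data. The directory names `pdf_src` and `pdf_dest` are assumptions for illustration only.

```r
# Work in a throwaway directory so the sketch does not touch real files
src  <- file.path(tempdir(), "pdf_src")
dest <- file.path(tempdir(), "pdf_dest")
dir.create(src, showWarnings = FALSE)
dir.create(dest, showWarnings = FALSE)

# Create empty stand-in files (made-up names following the question's format)
sample_names <- c("Tom-Hanks-01.01.1960-88.pdf",
                  "Tom-Hanks-01.01.1960-4444.pdf",
                  "Tom-Hanks-01.01.1960-5555.pdf")
file.create(file.path(src, sample_names))

# IDs as character strings, to compare text with text
idsToMove <- c("4444", "5555")

# List the PDFs; pattern is a regular expression, not a shell glob
filelist <- list.files(src, pattern = "\\.pdf$")

# Keep only the digits between the last "-" and ".pdf".
# Note: sub() returns the name unchanged if the pattern does not match.
ids <- sub("^.*-([0-9]+)\\.pdf$", "\\1", filelist)

toMove <- filelist[ids %in% idsToMove]

# file.rename() returns TRUE for each file it managed to move
moved <- file.rename(file.path(src, toMove), file.path(dest, toMove))
```

`file.rename()` may fail when source and destination are on different file systems; in that case `file.copy()` followed by `file.remove()` is one alternative.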