How to export folder of file names into a tibble/data frame using R

Time:01-14

My data is currently a set of image files (png), all within a single folder. Each file name contains a range of associated metadata. For example, the format of each file name is something like 'patientId_dateOfBirth_sex_modality_date_time.png'. I would like to analyze this data by importing the file names into a tibble in R and then manipulating it to make it tidy.

I have come across a few suggestions that use the Command Prompt, but I was hoping for a solution in an R script so that it is reproducible. I think that if I can get the file names into a tibble, I should be able to figure out how to make it tidy using stringr, but I'm not sure what import options are available.

Thank you!

Answer:

Some actual example filenames would have been helpful to understand the problem better.

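However, you can use list.files() to list all the files in a particular directory. Since the values are separated by _, you can then use separate() from tidyr to split them into columns. With its default separator, separate() splits on any run of non-alphanumeric characters, so the .png extension becomes an extra piece, which extra = "drop" discards.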

library(dplyr)
library(tidyr)

filenames <- list.files('/path/to/png/files')

# The output below was produced with these two example filenames:
#filenames <- c("A123_19910622_M_2_20230114_042312.png", 
#               "A128_19910828_F_4_20221214_142110.png")

tibble(filenames) %>%
  separate(filenames, c('patientId', 'dateOfBirth', 'sex', 'modality', 
                        'date', 'time'), extra = "drop")

# A tibble: 2 × 6
#  patientId dateOfBirth sex   modality date     time  
#  <chr>     <chr>       <chr> <chr>    <chr>    <chr> 
#1 A123      19910622    M     2        20230114 042312
#2 A128      19910828    F     4        20221214 142110
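
As a possible follow-up, here is a minimal sketch that keeps the original file name alongside the parsed columns and converts the two date fields to Date values. It assumes the same two made-up example filenames as above and that dates are written as YYYYMMDD; neither is confirmed by the question. The column name filename, the intermediate column stem, and the result name scans are illustrative only.

```r
library(dplyr)
library(tidyr)

# Hypothetical example names. With real data these would come from
# list.files('/path/to/png/files', pattern = "\\.png$").
filenames <- c("A123_19910622_M_2_20230114_042312.png",
               "A128_19910828_F_4_20221214_142110.png")

scans <- tibble(filename = filenames) %>%
  # Strip the extension first so it cannot end up in the last field
  mutate(stem = tools::file_path_sans_ext(filename)) %>%
  # Split only on underscores; the helper column `stem` is dropped afterwards
  separate(stem,
           into = c("patientId", "dateOfBirth", "sex",
                    "modality", "date", "time"),
           sep = "_") %>%
  # Assumption: both date fields use the YYYYMMDD layout
  mutate(dateOfBirth = as.Date(dateOfBirth, format = "%Y%m%d"),
         date = as.Date(date, format = "%Y%m%d"))

scans
```

Splitting on an explicit sep = "_" rather than the default makes it easier to notice file names that do not follow the expected pattern, because separate() warns when a row yields too few or too many pieces.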