My data is currently a set of image files (png), all within a single folder. Several pieces of metadata are encoded in each file name. For example, each file name has a format something like 'patientId_dateOfBirth_sex_modality_date_time.png'. I would like to analyze this data by importing the file names into a tibble in R and then manipulating it to make it tidy.
I have come across a few suggestions which use the Command Prompt, but I was hoping for a solution using an R script to make it reproducible. I think that if I can get the file names into a tibble then I should be able to figure out how to make it tidy using stringr, but I'm not sure what sort of import options are available.
Thank you!
CodePudding user response:
Some actual example filenames would have been helpful to understand the problem better.
However, you can use list.files() to list all the files in a particular directory, and since all the values are separated by _, you can use tidyr's separate() to split them into different columns.
library(dplyr)
library(tidyr)
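# List the file names (not full paths) of the PNG files in the folder
filenames <- list.files('/path/to/png/files', pattern = '\\.png$')
# Used these two filenames as an example:
# filenames <- c("A123_19910622_M_2_20230114_042312.png",
#                "A128_19910828_F_4_20221214_142110.png")
# By default separate() splits on any run of non-alphanumeric characters,
# so both "_" and "." act as separators; extra = "drop" discards the
# trailing "png" piece.
tibble(filenames) %>%
  separate(filenames, c('patientId', 'dateOfBirth', 'sex', 'modality',
                        'date', 'time'), extra = "drop")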
# A tibble: 2 × 6
#   patientId dateOfBirth sex   modality date     time
#   <chr>     <chr>       <chr> <chr>    <chr>    <chr>
# 1 A123      19910622    M     2        20230114 042312
# 2 A128      19910828    F     4        20221214 142110
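Every column comes back as character, so a possible next step is to convert them to proper types. The following is only a minimal sketch. It assumes the dates are written as yyyymmdd and the times as HHMMSS (as in the two example file names), that modality is an integer code, and that UTC is an acceptable time zone; the question confirms none of this. The names tidy_files, stem and acquired are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical file names, reused from the example in the answer
filenames <- c("A123_19910622_M_2_20230114_042312.png",
               "A128_19910828_F_4_20221214_142110.png")

tidy_files <- tibble(filename = filenames) %>%
  # Strip the extension first so the split can use "_" only
  mutate(stem = tools::file_path_sans_ext(filename)) %>%
  separate(stem,
           c("patientId", "dateOfBirth", "sex", "modality", "date", "time"),
           sep = "_") %>%
  mutate(
    # Assumed format: yyyymmdd
    dateOfBirth = as.Date(dateOfBirth, format = "%Y%m%d"),
    # Assumed format: yyyymmdd plus HHMMSS, combined into one date-time
    acquired = as.POSIXct(paste(date, time),
                          format = "%Y%m%d %H%M%S", tz = "UTC"),
    # Assumed to be an integer code
    modality = as.integer(modality)
  ) %>%
  select(-date, -time)

tidy_files
```

Keeping the original filename column makes it easy to join the metadata back to the image files later. With an explicit sep = "_", separate() warns if a file name has more or fewer than six pieces, which helps catch names that do not follow the expected pattern.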