I have multiple CSV files from an experiment I am running. Each participant goes through 4 conditions. For each condition, I have a file; each file has 20 rows of data (one row for each trial in the condition). The file names look like this:
2022-04-22_exp-04_p-01_1_cond-A-h-f
2022-04-22_exp-04_p-01_2_cond-A-h-n
2022-04-22_exp-04_p-01_3_cond-B-e-f
2022-04-22_exp-04_p-01_4_cond-B-e-n
Each name contains the date, the experiment number, the participant's ID, the sequence number and the condition. The condition is determined by three factors: whether the participant saw scene A or scene B; whether the task was easy (e) or hard (h); whether the participant received feedback (f) or not (n).
I would like to have an R script that goes to each file, opens it and adds four columns: one column with the participant's ID, one column with 1s for scene A and 0s otherwise, one column with 1s for a "hard task" (h) and 0s otherwise, and one column with 1s for feedback (f) and 0s otherwise.
I have been using R for a few years, but I am not very experienced with using scripts to loop through the files in a folder and work with file names.
I guess that the script should go to a file, copy the file's name, strip the string of hyphens and underscores, then add the columns conditional on the string's content. This last part, adding columns depending on the string's content, feels the most challenging to me. What's the best way to address it?
CodePudding user response:
You could do something like this, assuming that the filenames have the exact structure you indicate, that the files are comma-separated, and that you have saved the filenames in a vector called fnames:
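# load tidyverse
library(tidyverse)
# make function that extracts the information you need, using regular expressions
get_vals_from_fname <- function(n) {
  list(
    # participant ID, e.g. "p-01"; the lookbehind for "_" avoids matching the "p-04" inside "exp-04"
    id = str_extract(n, "(?<=_)p-\\d+"),
    # condition letters after "cond-", e.g. "A-h-f"; stops before any file extension
    cond = str_extract(n, "(?<=cond-)[^.]+")
  )
}
# use lapply to read each file, and add the columns of interest
lapply(fnames, function(f) {
  vals <- get_vals_from_fname(f)
  # cond is a single string such as "A-h-f", so pick characters 1, 3 and 5 with str_sub
  read_csv(f) %>%
    mutate(id = vals[["id"]],
           sceneA = as.numeric(str_sub(vals[["cond"]], 1, 1) == "A"),
           hard = as.numeric(str_sub(vals[["cond"]], 3, 3) == "h"),
           feedback = as.numeric(str_sub(vals[["cond"]], 5, 5) == "f")
    )
})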
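The answer assumes that fnames already exists. As a rough sketch of how it might be built, and of how a file name could be parsed without regular expressions, something like the following base R code may help. The folder name data_dir, the ".csv" extension and the helper parse_fname are assumptions made for illustration; they do not come from the question.

```r
# Hypothetical folder holding the files; adjust to your own layout
data_dir <- "data"

# Full paths of all files ending in .csv (the extension is an assumption)
fnames <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)

# Hypothetical helper: split one file name into the four new columns
parse_fname <- function(path) {
  # drop the folder and the extension, then split on underscores:
  # date, experiment, participant, sequence number, condition
  stem  <- tools::file_path_sans_ext(basename(path))
  parts <- strsplit(stem, "_", fixed = TRUE)[[1]]
  # the fifth field looks like "cond-A-h-f"; split it on hyphens
  cond  <- strsplit(parts[5], "-", fixed = TRUE)[[1]]
  data.frame(
    id       = parts[3],
    sceneA   = as.numeric(cond[2] == "A"),
    hard     = as.numeric(cond[3] == "h"),
    feedback = as.numeric(cond[4] == "f")
  )
}

# Example with one of the names from the question
parse_fname("2022-04-22_exp-04_p-01_1_cond-A-h-f.csv")
```

The one-row data frame returned for each file could then be attached to that file's data with cbind(read.csv(f), parse_fname(f)), which recycles the single row across all 20 trials.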