I am trying to load in a series of CSV files and then append a suffix to each column in the CSV (except for the primary key (subject_id). Each csv looks something like this
subject_id | var1 | var2 |
---|---|---|
1 | 55 | 57 |
2 | 55 | 57 |
Imagine this csv file was titled data1 and subsequent files are titled data2, data3... etc.
For each csv that I load, I would like to to convert the table into something like
subject_id | var1_data1 | var2_data1 |
---|---|---|
1 | 55 | 57 |
2 | 55 | 57 |
subject_id | var1_data2 | var2_data2 |
---|---|---|
1 | 55 | 57 |
2 | 55 | 57 |
I know how to load in the datasets;
filenames <- list.files(path= "data", full.names = TRUE)
datasets <- lapply(filenames, read_csv)
but I am struggling with figuring out how to write a loop/lapply statement to add the suffixes in the way I want.
The function below, will add a suffix but it is static.
lapply(datasets, function(df) {
names(df)[-1] <- paste0(names(df)[-1], "_data1")
df
})
The next thing I tried was to sandwhich a for loop in the middle of the function above
filenames2 <- sub('\\.csv$', '', list.files(path = "data"))
lapply(dataset3, function(df) {
for (val in filenames2){
names(df)[-1] <- paste0(names(df)[-1], val)
df
}
})
But this just changes everything to NULL/doesn't work. Does anyone have a thought on what might be the best way to proceed? I am also open to solutions in python, though R would be preferred.
Thank you!
CodePudding user response:
If every dataset has the same columns, another approach would be to make a single data.frame, where a column would be the dataset origin, here a way to do that.
datasets <-
purrr::map_df(
.x = filenames,
.f = read_csv,
.id = "dataset"
)
CodePudding user response:
Suppose we have the files generated reproducibly in the Note at the end.
Then we get the file names in fnames
and Map
a function Read
over them to read in each file and fix the names returning the fixed up data frame.
fnames <- Sys.glob("data*.csv")
Read <- function(f) {
df <- read.csv(f)
names(df)[-1] <- paste0(names(df[-1]), "_", sub(".csv$", "", basename(f)))
df
}
L <- Map(Read, fnames)
str(L)
giving this named list:
List of 3
$ data1.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data1: int [1:2] 55 55
..$ var2_data1: int [1:2] 57 57
$ data2.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data2: int [1:2] 55 55
..$ var2_data2: int [1:2] 57 57
$ data3.csv:'data.frame': 2 obs. of 3 variables:
..$ subject_id: int [1:2] 1 2
..$ var1_data3: int [1:2] 55 55
..$ var2_data3: int [1:2] 57 57
Note
Lines <- "subject_id var1 var2
1 55 57
2 55 57"
data1 <- data2 <- data3 <- read.table(text = Lines, header = TRUE)
for(f in c("data1", "data2", "data3")) write.csv(get(f), paste0(f, ".csv"), row.names = FALSE, quote = FALSE)