I would like to take some of the columns from several (lots!) of .csv files and build a new .csv file with all of the columns stacked. All the .csv files are in one folder and are almost identical except for one column of data. Most of the columns need to stay exactly the same but one new column needs to be generated from the name in the header of one of the columns (this is the only header that will be changing) and I would like to drop some columns of unneeded data. Here is an example of what I would like to have happen but simplified to only two data sets:
Data set 1
X date distance.code location.code Time1 unneeded.data
1 2021/05/18 C BSF1 0.715 999
2 2021/05/18 30 BSF1 0.896 999
3 2021/05/18 60 BSF1 0.803 999
4 2021/05/18 120 BSF1 0.705 999
5 2021/05/18 C BSG1 0.563 999
6 2021/05/18 C BSG1 0.961 999
7 2021/05/18 30 BSG1 0.833 999
8 2021/05/18 60 BSG1 0.894 999
data set 2
X date distance.code location.code Time2 unneeded data
1 2021/05/18 C BSF1 0.806 999
2 2021/05/18 30 BSF1 0.878 999
3 2021/05/18 60 BSF1 0.625 999
4 2021/05/18 120 BSF1 0.609 999
5 2021/05/18 C BSG1 0.766 999
6 2021/05/18 C BSG1 0.846 999
7 2021/05/18 30 BSG1 0.856 999
8 2021/05/18 60 BSG1 0.698 999
Combined datasets (this is what I want)
X date distance.code location.code timeperiod Values
1 2021/05/18 C BSF1 Time1 0.715
2 2021/05/18 30 BSF1 Time1 0.896
3 2021/05/18 60 BSF1 Time1 0.803
4 2021/05/18 120 BSF1 Time1 0.705
5 2021/05/18 C BSG1 Time1 0.563
6 2021/05/18 C BSG1 Time1 0.961
7 2021/05/18 30 BSG1 Time1 0.833
8 2021/05/18 60 BSG1 Time1 0.894
1 2021/05/18 C BSF1 Time2 0.806
2 2021/05/18 30 BSF1 Time2 0.878
3 2021/05/18 60 BSF1 Time2 0.625
4 2021/05/18 120 BSF1 Time2 0.609
5 2021/05/18 C BSG1 Time2 0.766
6 2021/05/18 C BSG1 Time2 0.846
7 2021/05/18 30 BSG1 Time2 0.856
8 2021/05/18 60 BSG1 Time2 0.698
As you can see the unique header for the Time columns was shifted to a new column. I have been trying to do this or some part of this process by creating a list from all the files in the folder and then trying to bind them with rbind.
temp <- list.files(pattern="*.csv")
myfiles = lapply(temp,read.csv)
df <- do.call(rbind, myfiles)
This didn't work but it seems like it should.Also, this wouldn't even do everything I want it too but I wasn't sure how to get the new column with the unique time codes. I think this may be the messy way to do this and I expect there is a more elegant solution. Thanks for any help.
CodePudding user response:
I think you should use the bind_rows
from the dplyr
package:
library(dplyr)
df <- bind_rows(myfiles)
CodePudding user response:
You can select
the required columns in the final data and get data in long format using pivot_longer
. Using map_df
you can combine the list in one single dataframe.
library(dplyr)
library(purrr)
temp <- list.files(pattern="*.csv")
myfiles = lapply(temp,read.csv)
cols_to_keep <- c('X', 'date','distance.code', 'location.code')
combine_data <- map_df(myfiles, ~.x %>%
select(all_of(cols_to_keep), starts_with('Time')) %>%
pivot_longer(cols = starts_with('Time'),
names_to = 'timeperiod',
values_to = 'Values'))