Home > Enterprise >  Can a column of data from several .csv files (stored in the same folder) be compiled into a single c
Can a column of data from several .csv files (stored in the same folder) be compiled into a single c

Time:10-05

I would like to take some of the columns from several (lots!) of .csv files and build a new .csv file with all of the columns stacked. All the .csv files are in one folder and are almost identical except for one column of data. Most of the columns need to stay exactly the same but one new column needs to be generated from the name in the header of one of the columns (this is the only header that will be changing) and I would like to drop some columns of unneeded data. Here is an example of what I would like to have happen but simplified to only two data sets:

Data set 1
   X        date distance.code location.code Time1     unneeded.data
   1 2021/05/18             C          BSF1 0.715               999
   2 2021/05/18            30          BSF1 0.896               999
   3 2021/05/18            60          BSF1 0.803               999
   4 2021/05/18           120          BSF1 0.705               999
   5 2021/05/18             C          BSG1 0.563               999
   6 2021/05/18             C          BSG1 0.961               999
   7 2021/05/18            30          BSG1 0.833               999
   8 2021/05/18            60          BSG1 0.894               999
 
data set 2                                                                        
  X       date distance.code location.code  Time2     unneeded data
  1 2021/05/18             C          BSF1 0.806               999
  2 2021/05/18            30          BSF1 0.878               999
  3 2021/05/18            60          BSF1 0.625               999
  4 2021/05/18           120          BSF1 0.609               999
  5 2021/05/18             C          BSG1 0.766               999
  6 2021/05/18             C          BSG1 0.846               999
  7 2021/05/18            30          BSG1 0.856               999
  8 2021/05/18            60          BSG1 0.698               999

Combined datasets (this is what I want)                                                                           
  X       date distance.code location.code        timeperiod   Values
  1 2021/05/18             C          BSF1             Time1 0.715
  2 2021/05/18            30          BSF1             Time1 0.896
  3 2021/05/18            60          BSF1             Time1 0.803
  4 2021/05/18           120          BSF1             Time1 0.705
  5 2021/05/18             C          BSG1             Time1 0.563
  6 2021/05/18             C          BSG1             Time1 0.961
  7 2021/05/18            30          BSG1             Time1 0.833
  8 2021/05/18            60          BSG1             Time1 0.894
  1 2021/05/18             C          BSF1             Time2 0.806
  2 2021/05/18            30          BSF1             Time2 0.878
  3 2021/05/18            60          BSF1             Time2 0.625
  4 2021/05/18           120          BSF1             Time2 0.609
  5 2021/05/18             C          BSG1             Time2 0.766
  6 2021/05/18             C          BSG1             Time2 0.846
  7 2021/05/18            30          BSG1             Time2 0.856
  8 2021/05/18            60          BSG1             Time2 0.698 

As you can see the unique header for the Time columns was shifted to a new column. I have been trying to do this or some part of this process by creating a list from all the files in the folder and then trying to bind them with rbind.

temp <- list.files(pattern="*.csv")
myfiles = lapply(temp,read.csv)

df <- do.call(rbind, myfiles)

This didn't work but it seems like it should.Also, this wouldn't even do everything I want it too but I wasn't sure how to get the new column with the unique time codes. I think this may be the messy way to do this and I expect there is a more elegant solution. Thanks for any help.

CodePudding user response:

I think you should use the bind_rows from the dplyr package:

library(dplyr)
df <- bind_rows(myfiles)

CodePudding user response:

You can select the required columns in the final data and get data in long format using pivot_longer. Using map_df you can combine the list in one single dataframe.

library(dplyr)
library(purrr)

temp <- list.files(pattern="*.csv")
myfiles = lapply(temp,read.csv)

cols_to_keep <- c('X', 'date','distance.code', 'location.code')

combine_data <- map_df(myfiles, ~.x %>% 
      select(all_of(cols_to_keep), starts_with('Time')) %>%
      pivot_longer(cols = starts_with('Time'), 
                   names_to = 'timeperiod', 
                   values_to = 'Values'))
  • Related