Import multiple CSV files, add same column headers and then cbind R

Time:12-07

I'm relatively new to R and have been trying to find a working answer here for the last three hours, but I just cannot seem to find a combination that works.

I have a folder that contains 841 CSV files, none of which have column names. The format is the same for every file (although some files might have blank columns simply because no data is available for that column in that file).

I want to be able to read in all 841 csv files, add the column names and then cbind them into a single data frame.

Bringing in a single file and adding the column names is easy enough:

col.names = c("ID", "NAMES_URI", "NAME1", "NAME1_LANG", "NAME2", "NAME2_LANG", "TYPE", "LOCAL_TYPE",
              "GEOMETRY_X", "GEOMETRY_Y", "MOST_DETAIL_VIEW_RES", "LEAST_DETAIL_VIEW_RES", "MBR_XMIN",
              "MBR_YMIN", "MBR_XMAX", "MBR_YMAX", "POSTCODE_DISTRICT", "POSTCODE_DISTRICT_URI",
              "POPULATED_PLACE", "POPULATED_PLACE_URI", "POPULATED_PLACE_TYPE", "DISTRICT_BOROUGH",
              "DISTRICT_BOROUGH_URI", "DISTRICT_BOROUGH_TYPE", "COUNTY_UNITARY", "COUNTY_UNITARY_URI",
              "COUNTY_UNITARY_TYPE", "REGION", "REGION_URI", "COUNTRY", "COUNTRY_URI", "RELATED_SPATIAL_OBJECT",
              "SAME_AS_DBPEDIA", "SAME_AS_GEONAMES")

library(data.table)

Single_File <- fread(file = "C:/Users/djr/Desktop/PostCodes/Data/HP40.csv", header = FALSE)

setnames(Single_File, col.names)

My issue comes in when I try to read the files in as a list and bind them. I've tried examples using lapply or map_dfr, but they always bring up error messages about the vector sizes not being the same, about not being able to fill, or about the column specifications not being the same.

The code I am currently trying is:

dir(pattern = ".csv") %>%
  map_dfr(read_csv, col_names = c("ID", "NAMES_URI", "NAME1", "NAME1_LANG", "NAME2", "NAME2_LANG", "TYPE", "LOCAL_TYPE",
                                  "GEOMETRY_X", "GEOMETRY_Y", "MOST_DETAIL_VIEW_RES", "LEAST_DETAIL_VIEW_RES", "MBR_XMIN",
                                  "MBR_YMIN", "MBR_XMAX", "MBR_YMAX", "POSTCODE_DISTRICT", "POSTCODE_DISTRICT_URI",
                                  "POPULATED_PLACE", "POPULATED_PLACE_URI", "POPULATED_PLACE_TYPE", "DISTRICT_BOROUGH",
                                  "DISTRICT_BOROUGH_URI", "DISTRICT_BOROUGH_TYPE", "COUNTY_UNITARY", "COUNTY_UNITARY_URI",
                                  "COUNTY_UNITARY_TYPE", "REGION", "REGION_URI", "COUNTRY", "COUNTRY_URI", "RELATED_SPATIAL_OBJECT",
                                  "SAME_AS_DBPEDIA", "SAME_AS_GEONAMES"))

But this just prints loads of output in the console that is meaningless to me; it seems to be giving a summary of each file.

Does anyone have simple code to bring in the CSVs, add the column names to each, and then cbind them all together?
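
A note on the readr attempt: the per-file console output is most likely readr's column specification message, which readr prints whenever it has to guess the column types. Guessed types can also differ between files (a column that is blank in one file may be guessed as logical), which is one plausible cause of the binding errors. The following is a minimal sketch, not a definitive fix. It assumes every file has the same number of columns as the name vector, and that reading every column as text is acceptable. `read_all_as_text` is a hypothetical helper name, not part of readr.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical helper: read every headerless CSV in `folder` as text
# and stack the rows into one tibble.
read_all_as_text <- function(folder, col_names) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  names(files) <- basename(files)

  tables <- map(files, function(f) {
    # Supplying col_types means readr does not guess types, so it prints no
    # column specification message, and every file gets identical column types.
    read_csv(f,
             col_names = col_names,
             col_types = cols(.default = col_character()))
  })

  # .id adds a column recording which file each row came from.
  bind_rows(tables, .id = "source_file")
}
```

Columns that should be numeric can be converted afterwards, for example with `readr::type_convert()`.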

CodePudding user response:

I am not 100% sure exactly what you need, but my best guess would be something like this:

library(data.table)

y_path   <- 'C:/your_path/your_folder'
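# 'pattern' is a regular expression, so escape the dot and anchor it to the end of the file name
all_csv  <- list.files(path = y_path, pattern = '\\.csv$', full.names = TRUE)
# header = FALSE because the files have no header row; pass any other fread() arguments here
open_csv <- lapply(all_csv, \(x) fread(x, header = FALSE))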

one_df <- data.table::rbindlist(open_csv) 
# or: do.call(rbind, open_csv)
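
To also attach the column names from the question, the pieces above can be combined roughly as follows. This is a sketch under stated assumptions rather than a tested solution for the 841 files. It assumes each file has exactly as many columns as there are names, and it reads everything as character so that a column that is blank in one file cannot clash in type with the same column in another. `read_and_stack` and the `source_file` column are hypothetical names introduced for illustration. Note that it stacks rows (rbind), which is probably what is wanted when every file shares the same columns; cbind would place the files side by side.

```r
library(data.table)

# Hypothetical helper: read every headerless CSV in `folder`, name the columns,
# and stack the rows of all files into one data.table.
read_and_stack <- function(folder, col_names) {
  files <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)

  tables <- lapply(files, function(f) {
    # Assumption: each file has length(col_names) columns.
    # colClasses = "character" reads every column as text.
    fread(f, header = FALSE, col.names = col_names, colClasses = "character")
  })
  names(tables) <- basename(files)

  # use.names = TRUE matches columns by name;
  # idcol records which file each row came from.
  rbindlist(tables, use.names = TRUE, idcol = "source_file")
}
```

With the vector from the question, a call would look like `one_df <- read_and_stack(y_path, col.names)`.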