How do I read multiple csv files into R and ensure that all columns are the same data?

Time:05-24

I am trying to merge several large data files into one usable data frame in R, using lapply to read in the files. That part works fine; however, one of the files stores a single column as character data where the others store it as integer data. Is there a way to read all of the files and force that column to the same data type in every file? I have found a workaround through trial and error, but a single solution would be great. For reference, this is the ICEWS event data found on the Harvard Dataverse.

library(dplyr)  # provides %>% and bind_rows()

list_file <- list.files(pattern = "\\.csv$") %>%
    lapply(read.csv, stringsAsFactors = FALSE) %>%
    bind_rows()

head(list_file)

The two code blocks posted here work independently, but ideally the as.integer step would be integrated into the lapply call so that I don't have to repeat the process of reading in and merging the files. Below is the workaround I have used.

list_file1 <- list.files(pattern = "\\.csv$") %>%
    lapply(read.csv, stringsAsFactors = FALSE) %>%
    bind_rows()
head(list_file1)
class(list_file1$CAMEO.Code)
 
list_file1$CAMEO.Code<-as.integer(list_file1$CAMEO.Code)
class(list_file1$CAMEO.Code)
head(list_file1$CAMEO.Code)

CodePudding user response:

You could do something like this:

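# Requires dplyr for %>%, mutate() and bind_rows()
bind_rows(
  lapply(list.files(pattern = "\\.csv$"), function(f) {
    read.csv(f, stringsAsFactors = FALSE) %>%
      # coerce the column in every file before the data frames are combined
      mutate(CAMEO.Code = as.integer(CAMEO.Code))
  })
)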
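
Before coercing, it can help to see which file actually differs and whether any values fail to convert, since as.integer() returns NA for text that is not a whole number and drops leading zeros. Below is a minimal base R sketch of that check. The temporary files, the Event.ID column, and the helper name read_one are made up for illustration; only CAMEO.Code comes from the question.

```r
# Self-contained sketch using base R only; file names and values are made up.
tmp <- tempfile("csv_demo_")
dir.create(tmp)

# Two small files: the second one stores CAMEO.Code as text.
write.csv(data.frame(Event.ID = 1:2, CAMEO.Code = c(10L, 20L)),
          file.path(tmp, "a.csv"), row.names = FALSE)
write.csv(data.frame(Event.ID = 3:4, CAMEO.Code = c("30", "04x")),
          file.path(tmp, "b.csv"), row.names = FALSE)

files <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)

# Report the class of the column in each file to find the odd one out.
col_class <- vapply(files, function(f) {
  class(read.csv(f, stringsAsFactors = FALSE)$CAMEO.Code)[1]
}, character(1))
names(col_class) <- basename(files)
print(col_class)

# Read and coerce one file at a time (hypothetical helper), then stack the pieces.
read_one <- function(f) {
  d <- read.csv(f, stringsAsFactors = FALSE)
  # as.integer() warns and yields NA for values such as "04x"
  d$CAMEO.Code <- suppressWarnings(as.integer(d$CAMEO.Code))
  d
}
combined <- do.call(rbind, lapply(files, read_one))

# Count values that could not be parsed as integers.
n_failed <- sum(is.na(combined$CAMEO.Code))
print(n_failed)
```

If n_failed is greater than zero, or if leading zeros in the codes matter, reading the column as character in every file may be the safer choice.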

CodePudding user response:

You could also use purrr's map_dfr() together with the col_types argument of readr::read_csv():

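list.files(pattern = "\\.csv$", full.names = TRUE) %>%
  purrr::map_dfr(~ readr::read_csv(
    .x,  # read_csv() keeps headers as written; adjust CAMEO.Code if the file's header differs
    col_types = readr::cols(.default = "?", CAMEO.Code = "i")
  ))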