New-ish to R and I feel like this has a simple solution, but I can't figure it out.
I have 59 excel files that I want to combine. However, 4 of the columns have a mix of dates and NA
's (depending on if the study animal is a migrant or not) so R won't let me combine them because some are numeric and some are character. I was hoping to read all of the excel files into R, convert those 4 columns in each file to as.character
, and then merge them all. I figured a loop could do this.
Anything I find online has me typing out the name for each read file, which I don't really want to do for 59 files. And once I do have them read into R and those columns converted, can I merge them from R easily? Sorry if this is simple, but I'm not sure what to do that would make this easier.
CodePudding user response:
You can do this quickly using lapply
. It was unclear exactly how you wanted to combine the files (a true merge by a common variable, or append the rows, or append the columns). Either way, I do not believe you need to change anything to as.character
for any of the below approaches (2A - 2C) to work:
library(readxl)
# 1. Read in all excel files in a folder given a specific filepath
filepath <- "your/file/path/"
file_list <- list.files(path = filepath, pattern='*.xlsx')
df_list <- lapply(file_list, read_excel)
# 2a. Merge data (assuming a unique identifier, i.e "studyid")
final_data <- Reduce(function(...) merge(..., by = "studyid", all = TRUE), df_list)
# 2b. If all files have the same columns, append to one long dataset
final_data <- do.call(rbind, df_list)
# 2c. If you want to make a wide dataset (append all columns)
final_data <- do.call(cbind, df_list)
CodePudding user response:
Yes, you can use a loop to read multiple Excel files into R and convert the columns you want to character. Once you have read all of the files into R, you can use the merge() function to combine them. Here is an example of how you could do this:
# create a vector of file names
filenames <- c("file1.xlsx", "file2.xlsx", "file3.xlsx", ...)
# create an empty list to store the data frames
df_list <- list()
# loop through the file names
for (f in filenames) {
# read in the Excel file
df <- read_excel(f)
# convert the columns you want to character
df[, c("col1", "col2", "col3", "col4")] <- as.character(df[, c("col1", "col2", "col3", "col4")])
# add the data frame to the list
df_list[[f]] <- df
}
# use merge() to combine the data frames in the list
final_df <- merge(df_list[[1]], df_list[[2]], ...)
This code will read each Excel file, convert the specified columns to character, and then add the resulting data frame to a list. After all of the files have been processed, the merge() function is used to combine the data frames in the list into a single data frame.