I'm combining 12 CSV files into one dataframe in R. Before doing this I want to ensure all the column names are an exact match with each other. I've made a dataframe where each column is the column names of the 12 CSV files.
jul21_cols <- data.frame(colnames(jul21))
aug21_cols <- data.frame(colnames(aug21))
sep21_cols <- data.frame(colnames(sep21))
oct21_cols <- data.frame(colnames(oct21))
nov21_cols <- data.frame(colnames(nov21))
dec21_cols <- data.frame(colnames(dec21))
jan22_cols <- data.frame(colnames(jan22))
feb22_cols <- data.frame(colnames(feb22))
mar22_cols <- data.frame(colnames(mar22))
apr22_cols <- data.frame(colnames(apr22))
may22_cols <- data.frame(colnames(may22))
jun22_cols <- data.frame(colnames(jun22))
col_df <- cbind(jul21_cols,aug21_cols,sep21_cols,oct21_cols,nov21_cols,dec21_cols,
jan22_cols,feb22_cols,mar22_cols,apr22_cols,may22_cols,jun22_cols)
I've tried using the identical function to compare 2 columns at a time.
identical(col_df[['jul21']], col_df[['aug21']])
identical(col_df[['aug21']], col_df[['sep21']])
identical(col_df[['sep21']], col_df[['oct21']])
identical(col_df[['oct21']], col_df[['nov21']])
identical(col_df[['nov21']], col_df[['dec21']])
identical(col_df[['dec21']], col_df[['jan22']])
identical(col_df[['jan22']], col_df[['feb22']])
identical(col_df[['feb22']], col_df[['mar22']])
identical(col_df[['mar22']], col_df[['apr22']])
identical(col_df[['apr22']], col_df[['may22']])
identical(col_df[['may22']], col_df[['jun22']])`
All of the identical lines return the value of TRUE
I'm just trying to verify that this code is telling me all my column names are identical in each CSV files before I move on. I'd also like to know if there is a more efficient way to solve this problem.
CodePudding user response:
First, identical()
will only return TRUE
if the two dataframes have all the same column names in the same order. If you don’t care about order, just that all the same names are in both dataframes, you can sort()
the names before comparing as shown below.
Note
Test data. BOD comes with R.
jul21 <- aug21 <- sep21 <- BOD
names(sep21) <- c("Time", "DEMAND")