In R, how do I check to see if a column name in one dataframe is present in another dataframe and th

Time:09-07

I have three single-line dataframes with different numbers and names of columns...

df1: 
   0 3 6  7 10 14 17
2 18 9 1 14  2  1  1

df2:
   0 3 7 9 10 13 14 17 21 35
2 10 4 8 1  5  2 11  2  1  1

df3:
   0 3 7  10 12
2  7 3 11  3  1

...and I have a master dataframe.

CREATION CODE
masterdf <- data.frame(matrix(ncol = 50, nrow = 0))
colnames(masterdf) <- c('0',2:50)

   0  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 
  33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 

I want to take each of the smaller dataframes and put one per row into the master dataframe with the values in the matching columns. When finished, the updated master dataframe will look like this:

   0  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 
1 18 NA  9 NA NA  1 14 NA NA  2 NA NA NA  1 NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 10 NA  4 NA NA NA  8 NA  1  5 NA NA  2 11 NA NA  2 NA NA NA  1 NA NA NA NA NA NA NA NA NA NA NA 
3  7 NA  3 NA NA NA 11 NA NA  3 NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 

  33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA  1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 

Yes, the column names do need to remain as numbers. As you can see, the number of columns varies with each of the numbered dataframes.

Other notes:

The first column name is 0 and the second column name is 2.

The 0 column will ALWAYS have a value in it in every dataframe.

The row number (2) in each numbered dataframe is superfluous for my purposes.

I've tried nested loops without success.

My use case will end up with thousands of rows in the master dataframe.

Thoughts?

CodePudding user response:

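You can use the match function from base R. It returns the position of each element of its first argument within its second argument, or NA when there is no match. A quick example:

?match                          # open the help page
match("2", c("1","2","3"))      # returns 2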
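
Applied to the question, the same idea might look like the sketch below. It is only an illustration: df1 is rebuilt by hand from the values shown in the question, and master_names is a hypothetical stand-in for names(masterdf).

```r
# Hypothetical reconstruction of df1 from the question. setNames() is used so
# the numeric column names are kept as-is rather than becoming X1, X2, ...
df1 <- setNames(
  data.frame(t(c(18, 9, 1, 14, 2, 1, 1))),
  c("0", "3", "6", "7", "10", "14", "17")
)

# Stand-in for names(masterdf): "0" followed by "2" through "50"
master_names <- c("0", 2:50)

# Position of each df1 column within the master's columns (NA if absent)
pos <- match(names(df1), master_names)

# Logical test: is each df1 column name present in the master?
present <- names(df1) %in% master_names

# A name the master does not have yields NA from match() and FALSE from %in%
missing_pos <- match("51", master_names)
missing_present <- "51" %in% master_names
```

Here pos gives the master column index for every column of df1, which is what the positional assignment into the master needs, while %in% only answers the yes/no question from the title.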

CodePudding user response:

Two attempts:

  1. basic for loop, which might be a bit slow with many rows:
df_list <- list(df1,df2,df3)
for(i in seq_along(df_list)) {
    masterdf[i, names(df_list[[i]])] <- df_list[[i]]
}
  2. vectorised approach using matrix indexing and a single assignment to all matching rows and columns:
df_list <- list(df1,df2,df3)
masterdf[seq_along(df_list),] <- NA
masterdf[cbind(
    rep(seq_along(df_list), lengths(df_list)),
    match(unlist(lapply(df_list, names)), names(masterdf))
)] <- unlist(df_list)
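
To try both attempts end to end, a self-contained sketch along these lines could be used. The helpers make_row and new_master are hypothetical names introduced only for this example, and the three data frames are rebuilt by hand from the values printed in the question.

```r
# Hypothetical helper: build a one-row data frame with the given column names
make_row <- function(cols, vals) {
  setNames(data.frame(t(vals)), cols)
}

df1 <- make_row(c("0", "3", "6", "7", "10", "14", "17"),
                c(18, 9, 1, 14, 2, 1, 1))
df2 <- make_row(c("0", "3", "7", "9", "10", "13", "14", "17", "21", "35"),
                c(10, 4, 8, 1, 5, 2, 11, 2, 1, 1))
df3 <- make_row(c("0", "3", "7", "10", "12"),
                c(7, 3, 11, 3, 1))
df_list <- list(df1, df2, df3)

# Hypothetical helper: an empty master, built as in the question
new_master <- function() {
  m <- data.frame(matrix(ncol = 50, nrow = 0))
  colnames(m) <- c("0", 2:50)
  m
}

# Attempt 1: one row per iteration, columns selected by name
by_loop <- new_master()
for (i in seq_along(df_list)) {
  by_loop[i, names(df_list[[i]])] <- df_list[[i]]
}

# Attempt 2: build (row, column) index pairs and assign all values at once
by_index <- new_master()
by_index[seq_along(df_list), ] <- NA
by_index[cbind(
  rep(seq_along(df_list), lengths(df_list)),               # row of each value
  match(unlist(lapply(df_list, names)), names(by_index))   # column of each value
)] <- unlist(df_list)

# Both attempts should fill the same cells with the same values
same <- isTRUE(all.equal(unname(as.matrix(by_loop)),
                         unname(as.matrix(by_index))))
```

With thousands of small data frames, the second form avoids repeated data frame assignment inside a loop, so it is the one more likely to scale, though that has not been benchmarked here.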