I have three single-line dataframes with different numbers and names of columns...
df1:
   0 3 6  7 10 14 17
2 18 9 1 14  2  1  1
df2:
   0 3 7 9 10 13 14 17 21 35
2 10 4 8 1  5  2 11  2  1  1
df3:
  0 3  7 10 12
2 7 3 11  3  1
...and I have a master dataframe.
CREATION CODE
masterdf <- data.frame(matrix(ncol = 50, nrow = 0))
colnames(masterdf) <- c('0',2:50)
0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
I want to take each of the smaller dataframes and put one per row into the master dataframe with the values in the matching columns. When finished, the updated master dataframe will look like this:
0 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
1 18 NA 9 NA NA 1 14 NA NA 2 NA NA NA 1 NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 10 NA 4 NA NA NA 8 NA 1 5 NA NA 2 11 NA NA 2 NA NA NA 1 NA NA NA NA NA NA NA NA NA NA NA
3 7 NA 3 NA NA NA 11 NA NA 3 NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
2 NA NA 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Yes, the column names do need to remain as numbers. As you can see, the number of columns varies with each of the numbered dataframes.
Other notes:
The first column name is 0 and the second column name is 2.
The 0 column will ALWAYS have a value in it in every dataframe.
The row number (2) in each numbered dataframe is superfluous for my purposes.
I've tried nested loops without success.
My use case will end up with thousands of rows in the master dataframe.
Thoughts?
CodePudding user response:
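I think you can try the match function, which is part of base R. It returns the position of each element of its first argument within its second, so it can map a small dataframe's column names onto the master dataframe's columns. See the quick example below: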
# open the help page
?match
# "2" is the second element of the vector, so this returns 2
match("2", c("1", "2", "3"))
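Applied to the question, the idea could be sketched roughly as follows. This is only an illustration: df1 is rebuilt by hand here (the question does not show how it is created), and master_names, idx and row1 are names made up for the example.

```r
# Hypothetical reconstruction of df1; check.names = FALSE keeps the numeric names
df1 <- data.frame(`0` = 18, `3` = 9, `6` = 1, `7` = 14, `10` = 2, `14` = 1, `17` = 1,
                  check.names = FALSE)

# Column names of the master dataframe, as in the question
master_names <- c("0", 2:50)

# Position of each df1 column within the master column names
idx <- match(names(df1), master_names)

# Start from an all-NA row, then drop the values into the matched positions
row1 <- rep(NA_real_, length(master_names))
names(row1) <- master_names
row1[idx] <- unlist(df1)
```

row1 then holds one master row; repeating this per dataframe and binding the rows would be one way to build the result, though it does the same work as the answers that index masterdf directly.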
CodePudding user response:
Two attempts:
- basic for loop, which might be a bit slow with many rows:
df_list <- list(df1, df2, df3)
for (i in seq_along(df_list)) {
  # write row i, touching only the columns this dataframe has
  masterdf[i, names(df_list[[i]])] <- df_list[[i]]
}
- vectorised approach using matrix indexing and a single assignment to all matching rows and columns:
df_list <- list(df1, df2, df3)
# create one all-NA row per dataframe
masterdf[seq_along(df_list), ] <- NA
# build a two-column (row, column) index matrix and fill every cell at once
masterdf[cbind(
  rep(seq_along(df_list), lengths(df_list)),
  match(unlist(lapply(df_list, names)), names(masterdf))
)] <- unlist(df_list)
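To try both attempts end to end, the inputs have to exist with their numeric column names intact. The following is a minimal sketch under that assumption: the three dataframes are rebuilt by hand with check.names = FALSE (the question does not show how they are created), and new_master, master_loop and master_vec are names invented for this example.

```r
# Hypothetical reconstructions of the question's inputs
df1 <- data.frame(`0` = 18, `3` = 9, `6` = 1, `7` = 14, `10` = 2, `14` = 1, `17` = 1,
                  check.names = FALSE)
df2 <- data.frame(`0` = 10, `3` = 4, `7` = 8, `9` = 1, `10` = 5, `13` = 2, `14` = 11,
                  `17` = 2, `21` = 1, `35` = 1, check.names = FALSE)
df3 <- data.frame(`0` = 7, `3` = 3, `7` = 11, `10` = 3, `12` = 1,
                  check.names = FALSE)
df_list <- list(df1, df2, df3)

# Fresh empty master, built the same way as in the question
new_master <- function() {
  m <- data.frame(matrix(ncol = 50, nrow = 0))
  colnames(m) <- c("0", 2:50)
  m
}

# Attempt 1: one row per loop iteration
master_loop <- new_master()
for (i in seq_along(df_list)) {
  master_loop[i, names(df_list[[i]])] <- df_list[[i]]
}

# Attempt 2: a single assignment through a (row, column) index matrix
master_vec <- new_master()
master_vec[seq_along(df_list), ] <- NA
master_vec[cbind(
  rep(seq_along(df_list), lengths(df_list)),
  match(unlist(lapply(df_list, names)), names(master_vec))
)] <- unlist(df_list)

# Compare cell values only, ignoring row names and other attributes
all.equal(as.matrix(master_loop), as.matrix(master_vec), check.attributes = FALSE)
```

With thousands of input dataframes, the second attempt avoids modifying the dataframe once per row, which is the likely reason it scales better; timing both on real data would confirm whether the difference matters.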