Merge a list of dataframes into a single dataframe in R

Time:04-28

I have a very large list (just one list) with 13,500 elements in it. Each element is a dataframe with 1 row and 12 columns, and every dataframe is structured the same way (same column names and similar data in each column). I want to merge all elements of this list into one dataframe, so the new dataframe will have 13,500 rows and 12 columns. I need everything in one dataframe so I can use ggplot and work with the data more easily. Can someone suggest the best way to do this? Thanks for the help.

I tried using purrr with merge() and was not successful. At least, the process did not finish after more than 10 minutes and I had to terminate RStudio.

Here is some data from the list I have:

list(structure(list(n1 = 10, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = 2.93152220266846, pvalue_pooled = 0.00891647393074033, result_pooled = 1, t_unpooled = 2.93152220266846, pvalue_unpooled = 0.00931815204271521, result_unpooled = 1), class = "data.frame", row.names = "n1"), structure(list(n1 = 30, n2 = 10, mean_1 = 0, mean_2 = 0, var_1 = 1, var_2 = 1, tpooled = -0.312649684961248, pvalue_pooled = 0.756256229272491, result_pooled = 0, t_unpooled = -0.248766791009062, pvalue_unpooled = 0.808124700588531, result_unpooled = 0)

CodePudding user response:

Your dput() output is incomplete, so I am creating an example list based on how you described it:

# Build an example list of 100 one-row, 12-column data frames
ll <- vector(mode = "list", length = 100)
for (i in seq_along(ll)) {
  ll[[i]] <- data.frame(matrix(runif(12), nrow = 1))
}

This gives a list of length 100, where each element is a data frame with 1 row and 12 columns of random numbers. To combine it into one large data frame (100 rows and 12 columns), try:

ll_df <- do.call(rbind, ll)

Output:

# > ll_df
#     X1          X2         X3          X4         X5          X6         X7         X8          X9         X10        X11         X12
# 1  0.231912927 0.270163433 0.82299350 0.025836254 0.40592551 0.596034614 0.52873965 0.68257091 0.507812908 0.554371795 0.84124010 0.312510160
# 2  0.035948120 0.815994061 0.77857679 0.859379491 0.06571936 0.008806119 0.59168088 0.86961538 0.446291886 0.037575005 0.41029058 0.365216211
# 3  0.476584831 0.133677756 0.47945626 0.264312692 0.48993294 0.906061205 0.50099734 0.70350681 0.057910028 0.689310918 0.79879528 0.018855033
# 4  0.036814572 0.577822232 0.79003586 0.735261033 0.26853772 0.805366424 0.42493288 0.16521519 0.604047569 0.825760356 0.78095093 0.081476899
# 5  0.070758368 0.958960018 0.09029276 0.212251252 0.43920359 0.777871489 0.85140796 0.62472390 0.388040910 0.143754851 0.88167280 0.873741813
# 6  0.338623692 0.513312964 0.49393542 0.793437806 0.91841512 0.586360269 0.82348039 0.80743891 0.281572984 0.508648599 0.29522944 0.867623769
#...
# continues
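
For a list shaped more like the one in the question, the same call should work unchanged, because do.call(rbind, x) is equivalent to calling rbind(x[[1]], x[[2]], ...). Below is a minimal sketch under some assumptions: the column names are copied from the question's dput() output, while the values, the helper make_row(), and the list name sims are made up for illustration. Since every element in the question carries the row name "n1", the sketch resets the row names after binding.

```r
# Column names taken from the question's dput() output.
cols <- c("n1", "n2", "mean_1", "mean_2", "var_1", "var_2",
          "tpooled", "pvalue_pooled", "result_pooled",
          "t_unpooled", "pvalue_unpooled", "result_unpooled")

# Hypothetical helper: builds one 1-row data frame with made-up values
# and the row name "n1", mimicking the structure shown in the question.
make_row <- function() {
  df <- as.data.frame(matrix(runif(length(cols)), nrow = 1))
  names(df) <- cols
  rownames(df) <- "n1"
  df
}

set.seed(1)
# A short stand-in for the real 13,500-element list.
sims <- replicate(50, make_row(), simplify = FALSE)

# Stack all one-row data frames into a single data frame.
combined <- do.call(rbind, sims)

# Replace the repeated "n1"-based row names with plain 1, 2, 3, ...
rownames(combined) <- NULL

dim(combined)  # 50 rows, 12 columns
```

The combined data frame keeps the original column names, so it can be passed straight to ggplot().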

CodePudding user response:

Another option is bind_rows() from dplyr, which creates one dataframe from the list of dataframes and is fairly efficient.

library(dplyr)

bind_rows(ll)

However, as @nicola mentioned, rbindlist from data.table will likely be the fastest.

data.table::rbindlist(ll)
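
To check which option is fastest on your own machine, you could time all three on a longer list. This is only a rough sketch: big_ll is a made-up list built the same way as the example above, the length of 2000 is arbitrary, and the dplyr and data.table parts run only if those packages are installed. Note that rbindlist() returns a data.table, so the sketch converts it with as.data.frame() in case a plain data frame is wanted.

```r
set.seed(42)

# Made-up list of 2000 one-row, 12-column data frames.
big_ll <- lapply(seq_len(2000), function(i) {
  data.frame(matrix(runif(12), nrow = 1))
})

# Base R: bind all rows with rbind().
print(system.time(base_df <- do.call(rbind, big_ll)))

# dplyr: runs only if the package is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(system.time(dplyr_df <- dplyr::bind_rows(big_ll)))
  stopifnot(identical(dim(dplyr_df), dim(base_df)))
}

# data.table: runs only if the package is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  print(system.time(dt <- data.table::rbindlist(big_ll)))
  dt_df <- as.data.frame(dt)  # convert the data.table to a plain data frame
  stopifnot(identical(dim(dt_df), dim(base_df)))
}
```

Timings will vary with hardware and package versions, so treat the printed numbers as a guide rather than a benchmark.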