I have a large list of dataframes that I would like to subset and reorder, so that they can be horizontally joined. Ideally, I would merge them, however this consistently exceeds my computer's capabiltiies, so I need to join them manually.
My large list is a list of 10 dataframes, all with 10 unique variables and one shared variable, with over 1000 rows for each dataframe:
df1:
ID, V2, V3, etc.
df2:
ID, V2_2, V3_2, etc.
and so forth.
All of these dataframes are contained within the same large list. How can I subset all of these dataframes within that large list simultaneously, such that they only include observations where ID > 100? Similarly, how can I order all of these dataframes simultaneously by the ID column, so that I can then bind them horizontally?
This seems to be the only way to have the dataframes merge, because merging the dataframes by ID directly does not appear to work, since the size of the dataframes (even though IDs are all the same for each dataframe, and each ID is unique) causes my R to freeze.
CodePudding user response:
Try the following. Untested, since there are no test data.
1. Filter and sort
Filter ID > 100
and sort by ID
.
df_list <- lapply(df_list, \(x) {
x <- x[x$ID > 100, ]
x[order(x$ID), ]
})
2. Merge the data.frames
This is the complicated part. The code below creates a result data.frame df_all
and then puts each data.frame in df_list
in its place by matching the id's. This is probably faster and no merge
and Reduce
function calls are needed.
all_cols <- unique(sapply(df_list, names))
all_rows <- unique(sapply(df_list, `[[`, 'ID'))
all_rows <- sort(all_rows)
df_all <- matrix(nrow = length(all_rows), ncol = length(all_cols))
df_all <- as.data.frame(df_all)
names(df_all) <- all_cols
df_all$ID <- all_rows
for(i in seq_along((df_list))) {
i <- match(df_list[[i]]$ID, all_rows)
df_all[i, names(df_list[[i]])] <- df_list[[i]]
}
CodePudding user response:
This should return the desired list:
lapply(df_list, \(df) df[df$ID > 100, ])
EDIT
Update after comment that you have a list of lists. Try this:
lapply(df_list, \(df) {
df <- data.frame(unlist(df, recursive=FALSE))
df[df$ID > 100, ]
}
)