I'm writing a function that subsets a data frame under different conditions, and I need to return the data frame with the most rows.
df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))
max_row_df = ifelse(nrow(df) > nrow(df2) & nrow(df) > nrow(df5), deparse(substitute(df)),
  ifelse(nrow(df2) > nrow(df3), deparse(substitute(df2)),
    ifelse(nrow(df3) > nrow(df4), deparse(substitute(df3)),
      ifelse(nrow(df4) > nrow(df5), deparse(substitute(df4)),
        deparse(substitute(df5))))))
max_row_df
This statement has a flaw in its logic, but it is the only method I have found that returns the name of the data frame, which is what I need in order to return the selected data frame from the function.
row_lengths <- c(nrow(df), nrow(df2), nrow(df3), nrow(df4), nrow(df5))
max_row <- max(row_lengths)
I can't deparse the data frame names with the approach above. Is there a better approach, given that if and for only return boolean values?
Any insight appreciated.
CodePudding user response:
Here's a solution. You put your data.frames in a list and use purrr::reduce
to compare them and keep the largest one:
library(purrr)
df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))
reduce(
  list(df2, df3, df4, df5),
  ~ if (nrow(.x) > nrow(.y)) .x else .y
)
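If you would rather not depend on purrr, the same pairwise comparison can probably be written with base R's Reduce. This is only a minimal sketch: the list dfs, its element names, and the helper keep_larger are made up for illustration and are not from the question.

```r
# Hypothetical data frames with known row counts
dfs <- list(
  a = data.frame(x = 1:3),
  b = data.frame(x = 1:7),
  c = data.frame(x = 1:5)
)

# Keep whichever of the two data frames has more rows;
# on a tie the second argument (the later one) is kept
keep_larger <- function(x, y) if (nrow(x) > nrow(y)) x else y

# Fold the list down to a single data frame
largest <- Reduce(keep_larger, dfs)
nrow(largest)
```

Note that, like the purrr version, this returns the data frame itself and not its name.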
CodePudding user response:
This worked, thanks Gregor! Don't need to pull df name in this instance.
# create a list of objects collected based on specified name
df_list = mget(paste0("df", c("", as.character(2:5))))
# find the object in the list with max rows and return
i_most = which.max(sapply(df_list, nrow))
return(df_list[[i_most]])
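For anyone who does still need the name, here is a rough sketch of how the snippet above might be wrapped in a function that returns both the name and the data frame. It assumes the data frames are passed in as a named list; largest_df and the example names small, big and mid are hypothetical.

```r
# Hypothetical helper: takes a named list of data frames and returns
# the name and contents of the one with the most rows
largest_df <- function(df_list) {
  row_counts <- vapply(df_list, nrow, integer(1))
  i_most <- which.max(row_counts)  # index of the first maximum if there are ties
  list(name = names(df_list)[i_most], data = df_list[[i_most]])
}

# Example input with known row counts
dfs <- list(
  small = data.frame(x = 1:3),
  big   = data.frame(x = 1:7),
  mid   = data.frame(x = 1:5)
)

result <- largest_df(dfs)
result$name
nrow(result$data)
```

Passing the list in explicitly, instead of calling mget inside the function, avoids relying on which variables happen to exist in the calling environment.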