How to return the dataframe with the most rows: R

Time:11-14

I'm writing a function that subsets a dataframe based on different conditions. I need to return whichever of the resulting dataframes has the most rows.

df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))

max_row_df = ifelse(nrow(df) > nrow(df2) & nrow(df) > nrow(df5), deparse(substitute(df)),
                    ifelse(nrow(df2) > nrow(df3), deparse(substitute(df2)),
                           ifelse(nrow(df3) > nrow(df4), deparse(substitute(df3)),
                                  ifelse(nrow(df4) > nrow(df5),deparse(substitute(df4)),
                                  deparse(substitute(df5))))))
max_row_df

This statement has a flaw in its logic, but it is the only method I've found that returns the name of the dataframe, which is what I need in order to return the selected dataframe from the function.

row_lengths <- c(nrow(df), nrow(df2), nrow(df3), nrow(df4), nrow(df5))
max_row <- max(row_lengths)

This gives me the maximum row count, but I can't deparse the df names from it. Is there a better approach, given that if and for only seem to return boolean values?

Any insight appreciated.

CodePudding user response:

Here's a solution. You put your data.frames in a list and use purrr::reduce to compare them and keep the largest one:

library(purrr)

df2 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=200))
df3 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=90))
df4 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=600))
df5 <- as.data.frame(matrix(runif(n=10, min=1, max=20), nrow=70))

reduce(
  list(df2, df3, df4, df5),
  ~ if (nrow(.x) > nrow(.y)) .x else .y
)
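
For what it's worth, the same pairwise comparison can be sketched in base R with Reduce, if you'd rather not load purrr. This is only a sketch: the single-column random data below is made up for illustration, and using >= (so the earlier dataframe wins a tie) is my own choice, whereas the purrr version above keeps the later one on a tie.

```r
# Base R sketch of the same pairwise comparison; no packages needed.
# The data here is illustrative only: one column of random numbers each.
df2 <- as.data.frame(matrix(runif(200), nrow = 200))
df3 <- as.data.frame(matrix(runif(90), nrow = 90))
df4 <- as.data.frame(matrix(runif(600), nrow = 600))
df5 <- as.data.frame(matrix(runif(70), nrow = 70))

# Walk the list pairwise, keeping whichever dataframe has more rows.
# On a tie (>=) the left-hand one is kept, so the earliest maximum wins.
largest <- Reduce(
  function(x, y) if (nrow(x) >= nrow(y)) x else y,
  list(df2, df3, df4, df5)
)

nrow(largest)  # 600
```

Note that this returns the dataframe itself, not its name.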

CodePudding user response:

This worked, thanks Gregor! Don't need to pull df name in this instance.

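  # (inside the function body)
  # collect the objects named df, df2, ..., df5 into a named list;
  # mget() looks only in the current environment unless inherits = TRUE
  df_list = mget(paste0("df", c("", as.character(2:5))))

  # index of the list element with the most rows (first one on ties)
  i_most = which.max(sapply(df_list, nrow))

  # return that dataframe
  return(df_list[[i_most]])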
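
If the name is needed after all, the which.max idea above extends naturally, because the list is named. Below is a minimal self-contained sketch; largest_df is a hypothetical helper name, and the small example dataframes are made up so it runs on its own.

```r
# Hypothetical helper: given a named list of dataframes, return both the
# name and the contents of the one with the most rows.
largest_df <- function(dfs) {
  stopifnot(is.list(dfs), length(dfs) > 0, !is.null(names(dfs)))
  n_rows <- vapply(dfs, nrow, integer(1))  # named vector of row counts
  i_most <- which.max(n_rows)              # first maximum wins on ties
  list(name = names(dfs)[i_most], data = dfs[[i_most]])
}

# Illustrative data only: one column each, with the row counts from the question.
dfs <- list(
  df2 = data.frame(x = seq_len(200)),
  df3 = data.frame(x = seq_len(90)),
  df4 = data.frame(x = seq_len(600)),
  df5 = data.frame(x = seq_len(70))
)

res <- largest_df(dfs)
res$name        # "df4"
nrow(res$data)  # 600
```

Passing the dataframes in as a named list, rather than looking them up with mget, keeps the helper independent of which variables happen to exist in the calling environment.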