Home > Net >  Elegant Way to Get the Minimum of More than One Column of a Data Frame and Their Corresponding Value
Elegant Way to Get the Minimum of More than One Column of a Data Frame and Their Corresponding Value

Time:04-16

Given a data frame df, I want a new data frame that will keep the minimum values of columns Y and Z in a column and their corresponding values on the X column in another column using R.

df <- read.table(text = 
                "X  Y  Z
              1  2  3  1.4
              2  4  5  1.7
              3  6  7  1.2
              4  8  9  2.1
              5 10 11  3.2",
              header = TRUE)

Trial

Here is what I have tried using R which is labour intensive.

data.frame(
  x_min = c(df[df$Y == min(df[,"Y"]), "X"], df[df$Z == min(df[,"Z"]), "X"]),
  min_Y_Z = c(min(df[,"Y"]), min(df[,"Z"]))
  )

I know that apply(df, 2, min) will only work if I am to find the minimum of each and every column in the data frame, so no need to look toward the apply() family function.

The Result

  x_min min_Y_Z
1     2     3.0
2     6     1.2

What I Want

I want an R-elegant way to write the same solution. I will not mind using packages in R

CodePudding user response:

One way to do this is using the subset() and mutate() functions from the tidyverse package:

library(tidyverse)

df_new <- df %>% 
  subset(Y == min(Y) | Z == min(Z)) %>%
  mutate(min_Y_Z = c(min(Y), min(Z)))

This gives you this output:

X Y   Z min_Y_Z
1 2 3 1.4     3.0
3 6 7 1.2     1.2

If needed, removing the old 'Y' and 'Z' columns is pretty simple to do as well.

CodePudding user response:

In base R, you could use lapply. This version is similar to your current method, but does not rely on the explicit column names as much, since "Y" and "Z" are abstracted by lapply. This version also places the original column names in the rownames.

lst <- lapply(df[c('Y', 'Z')], function(i) {
  
  min_index <- which(i == min(i))[1]
  return(data.frame(x_min = df$X[min_index], min_Y_Z = i[min_index]))
  
})

result <- do.call(rbind, lst)

  x_min min_Y_Z
Y     2     3.0
Z     6     1.2

Or another tidyverse solution:

result <- df %>% 
  summarize(across(c(Y, Z), ~which(. == min(.)))) %>% 
  pivot_longer(everything(), values_to = 'idx') %>% 
  rowwise() %>% 
  mutate(
    x_min = df[['X']][idx],
    min_Y_Z = df[[name]][idx]
  ) %>% 
  select(-name, -idx)

  x_min min_Y_Z
  <int>   <dbl>
1     2     3  
2     6     1.2
  •  Tags:  
  • r
  • Related