Given a data frame df
, I want a new data frame that will keep the minimum values of columns Y
and Z
in a column and their corresponding values on the X
column in another column using R
.
df <- read.table(text =
"X Y Z
1 2 3 1.4
2 4 5 1.7
3 6 7 1.2
4 8 9 2.1
5 10 11 3.2",
header = TRUE)
Trial
Here is what I have tried using R
which is labour intensive.
data.frame(
x_min = c(df[df$Y == min(df[,"Y"]), "X"], df[df$Z == min(df[,"Z"]), "X"]),
min_Y_Z = c(min(df[,"Y"]), min(df[,"Z"]))
)
I know that apply(df, 2, min)
will only work if I am to find the minimum of each and every column in the data frame, so no need to look toward the apply()
family function.
The Result
x_min min_Y_Z
1 2 3.0
2 6 1.2
What I Want
I want an R
-elegant way to write the same solution. I will not mind using packages in R
CodePudding user response:
One way to do this is using the subset()
and mutate()
functions from the tidyverse package:
library(tidyverse)
df_new <- df %>%
subset(Y == min(Y) | Z == min(Z)) %>%
mutate(min_Y_Z = c(min(Y), min(Z)))
This gives you this output:
X Y Z min_Y_Z
1 2 3 1.4 3.0
3 6 7 1.2 1.2
If needed, removing the old 'Y' and 'Z' columns is pretty simple to do as well.
CodePudding user response:
In base R, you could use lapply
. This version is similar to your current method, but does not rely on the explicit column names as much, since "Y" and "Z" are abstracted by lapply
. This version also places the original column names in the rownames.
lst <- lapply(df[c('Y', 'Z')], function(i) {
min_index <- which(i == min(i))[1]
return(data.frame(x_min = df$X[min_index], min_Y_Z = i[min_index]))
})
result <- do.call(rbind, lst)
x_min min_Y_Z
Y 2 3.0
Z 6 1.2
Or another tidyverse solution:
result <- df %>%
summarize(across(c(Y, Z), ~which(. == min(.)))) %>%
pivot_longer(everything(), values_to = 'idx') %>%
rowwise() %>%
mutate(
x_min = df[['X']][idx],
min_Y_Z = df[[name]][idx]
) %>%
select(-name, -idx)
x_min min_Y_Z
<int> <dbl>
1 2 3
2 6 1.2