Various R functions make it easy to use group_by and summarize to pull a value from a grouped variable. So in the resulting dataframe, I can use group_by and summarise to create, for example, a new column that contains the maximum or minimum value of a variable within each group. Meaning, with this data:
name, value
foo, 100
foo, 200
foo, 300
bar, 400
bar, 500
bar, 600
I can easily get the max or min for each value of name:
group_by(name) %>% summarize(maxValue = max(value)
But suppose I want the second ranked value for each name? Meaning suppose I want my result to be
name maxValue midValue
foo 300 200
bar 600 500
In other words, how do I fill in the blank in this:
df %>% group_by(name) %>%
summarize(maxValue = max(value),
secondValue = _________)
Thanks, from an r newbie, for any help!
CodePudding user response:
library(dplyr)
df %>%
group_by(name) %>%
arrange(value) %>%
summarise(maxValue = max(value),
midValue = value[2])
Result:
# A tibble: 2 × 3
name maxValue midValue
<chr> <int> <int>
1 Bar 600 500
2 Foo 300 200
CodePudding user response:
This should do:
df %>% group_by(name) %>% arrange(desc(value)) %>% slice(2)
Code:
a = 'name value
Foo 100
Foo 200
Foo 300
Bar 400
Bar 500
Bar 600'
df = read.table(text = a, header = T)
df %>% group_by(name) %>% arrange(desc(value)) %>% slice(2)
Output:
# A tibble: 2 × 2
# Groups: name [2]
name value
<fct> <int>
1 Bar 500
2 Foo 200
CodePudding user response:
library(dplyr)
df <- data.frame(
name = c("Foo", "Foo", "Foo", "Bar", "Bar", "Bar"),
value = c(100, 200, 300, 400, 500, 600)
)
df %>%
group_by(name) %>%
summarize(secondValue = sort(value, decreasing = TRUE)[2])
CodePudding user response:
All we need to do is pull out the 2 maximum values, and select the second largest one.
Note that the code below will handle ties (not important in your example but could be important with other data).
df %>%
group_by(name) %>%
summarise(maxvalue = sortN(value, 1),
midvalue = sortN(value, 2)[2])
# A tibble: 2 × 3
name maxvalue midvalue
<chr> <int> <int>
1 Bar 600 500
2 Foo 300 200
The function sortN(x, n, type = "max")
is defined below. It pulls out the n
max/min values of x
. This is heavily based on the post in my comment. It is not necessary to define a whole function for this problem (as the other answers show), but I find this function useful for a range of problems, so it's nice to have.
sortN <- function(x, n, type = "max") {
# GR 1 - If type is not "max" or "min", error
if (! type %in% c("max", "min")) {
stop("type must be max or min.")
}
# GR 2 - If n >= length(unique(x)), return whole vector
if (n >= length(unique(x))){
return(unique(x))
}
# Change based on whether the user wants min or max
type <- switch(type, min = FALSE, max = TRUE)
if (type) {
x <- unique(x)
partial <- length(x) - n 1
out <- x[x >= sort(x, partial = partial)[partial]]
sort(out, decreasing = TRUE)
} else {
out <- -sortN(x = -x, n = n, type = "max")
sort(out, decreasing = FALSE)
}
}