Home > database >  R: Getting Second Value After Grouping and Summarizing
R: Getting Second Value After Grouping and Summarizing

Time:11-16

Various R functions make it easy to use group_by and summarize to pull a value from a grouped variable. So in the resulting dataframe, I can use group_by and summarise to create, for example, a new column that contains the maximum or minimum value of a variable within each group. Meaning, with this data:

name, value
foo, 100
foo, 200
foo, 300
bar, 400
bar, 500
bar, 600

I can easily get the max or min for each value of name:

group_by(name) %>% summarize(maxValue = max(value)

But suppose I want the second ranked value for each name? Meaning suppose I want my result to be

name maxValue secondValue
foo 300 200
bar 600 500

In other words, how do I fill in the blank in this:

df %>% group_by(name) %>% 
summarize(maxValue = max(value), 
  secondValue = _________)

Thanks, from an r newbie, for any help!

CodePudding user response:

dplyr has the handy nth() function to extract the nth element out of the list.

df %>% group_by(name) %>% 
   summarize(maxValue = max(value), 
             secondValue =nth(value, 2, order_by = value))

CodePudding user response:

This is one approach, excluding the max value from the second calculation.

df %>% 
  group_by(name) %>% 
  summarize(maxValue = max(value), 
    secondValue = max(value[which.max(value) != row_number()]))
# A tibble: 2 × 3
  name  maxValue secondValue
  <chr>    <int>       <int>
1 bar        600         500
2 foo        300         200

Data

df <- structure(list(name = c("foo", "foo", "foo", "bar", "bar", "bar"
), value = c(100, 400, 300, 400, 500, 600)), row.names = c(NA,
-6L), class = "data.frame")
  • Related