Arrange by Column Before Grouping by Other Columns

I am trying to arrange a data frame by one column, and then group it by two other columns.

Sample data and my attempt are as follows:

library(dplyr)

df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))


df1 <- df %>%
  group_by(person, data1) %>%
  arrange(desc(data2),  person, data1)


  person data1 data2
  <chr>  <chr> <dbl>
1 p1     a         8
2 p2     a         7
3 p4     b         6
4 p3     b         5
5 p2     b         4
6 p3     a         3
7 p1     b         2
8 p4     a         1

The result should be sorted by data2 in descending order (highest to lowest), but with rows kept together by person, so that each person's other row follows directly underneath that person's highest row.

The desired result looks like this:

    person data1 data2
1     p1     a     8
2     p1     b     2
3     p2     a     7
4     p2     b     4
5     p4     b     6
6     p4     a     1
7     p3     b     5
8     p3     a     3

CodePudding user response：

group_by doesn't really do anything on its own; it just sets up other functions to operate by group. However, the second sentence of the ?arrange help page says:

Unlike other dplyr verbs, arrange() largely ignores grouping;
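
To make that concrete, here is a small sketch using the df from the question. The names grouped and ungrouped are just illustrative. It checks that a grouped arrange() without .by_group returns rows in the same order as an ungrouped one:

```r
library(dplyr)

df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))

# Same arrange() call, once on a grouped data frame and once on a plain one
grouped   <- df %>% group_by(person, data1) %>% arrange(desc(data2))
ungrouped <- df %>% arrange(desc(data2))

# The grouping has no effect on the row order
identical(grouped$person, ungrouped$person)
identical(grouped$data2, ungrouped$data2)
```

Both comparisons should come back TRUE, which is why the attempt in the question ends up sorted purely by data2.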

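All you want to do is get the rows in a certain order. You don't need group_by; you need to arrange first by person, then by descending data2 within each person:

df %>%
  arrange(person, desc(data2))
#   person data1 data2
# 1     p1     a     8
# 2     p1     b     2
# 3     p2     a     7
# 4     p2     b     4
# 5     p3     b     5
# 6     p3     a     3
# 7     p4     b     6
# 8     p4     a     1

Note that this sorts person alphabetically, so p3 comes before p4. The desired output in the question lists p4 first because its highest data2 value (6) is larger than p3's (5).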

Alternatively, you can use arrange's .by_group argument to sort within groups:

df %>% group_by(person) %>%
  arrange(desc(data2), .by_group = TRUE)

See the ?arrange help page for more details.
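
As a quick check, the sketch below (with illustrative names by_cols and by_group) compares the two forms on the question's df. With .by_group = TRUE, the grouping variable is used as the first sort key, so both forms should give the same row order, with persons in alphabetical order:

```r
library(dplyr)

df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))

# Sort by person, then by descending data2, without any grouping
by_cols <- df %>%
  arrange(person, desc(data2))

# Same ordering via grouping; ungroup() drops the grouping afterwards
by_group <- df %>%
  group_by(person) %>%
  arrange(desc(data2), .by_group = TRUE) %>%
  ungroup()

# Compare the row order column by column
identical(as.character(by_cols$person), as.character(by_group$person))
identical(by_cols$data2, by_group$data2)
```

The only practical difference is that the second form returns a grouped tibble unless you call ungroup().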

CodePudding user response：

You could compute the maximum data2 for each person, join it back onto the data, and then arrange by that maximum (and by data2 within each person):

df %>%
  # m = highest data2 per person, joined back onto every row of that person
  inner_join(df %>% group_by(person) %>% summarize(m = max(data2)), by = "person") %>%
  arrange(desc(m), desc(data2)) %>%
  select(-m)

Output:

  person data1 data2
1     p1     a     8
2     p1     b     2
3     p2     a     7
4     p2     b     4
5     p4     b     6
6     p4     a     1
7     p3     b     5
8     p3     a     3
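
The same idea can probably be written without a join, by computing the per-person maximum in place with a grouped mutate(). This is only a sketch of a variation on the approach above, not something from the original answer; the helper column m and the name result are illustrative:

```r
library(dplyr)

df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))

result <- df %>%
  group_by(person) %>%
  mutate(m = max(data2)) %>%         # per-person maximum, repeated on each row
  ungroup() %>%
  arrange(desc(m), desc(data2)) %>%  # persons by their maximum, then rows within person
  select(-m)                         # drop the helper column

result
```

If two persons could share the same maximum, their rows might interleave; adding person as a middle sort key, as in arrange(desc(m), person, desc(data2)), would keep each person's rows together.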