I am trying to arrange a data frame by one column, and then group it by two other columns.
Sample data and my attempt are as follows:
library(dplyr)

df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))
df1 <- df %>%
  group_by(person, data1) %>%
  arrange(desc(data2), person, data1)
person data1 data2
<chr> <chr> <dbl>
1 p1 a 8
2 p2 a 7
3 p4 b 6
4 p3 b 5
5 p2 b 4
6 p3 a 3
7 p1 b 2
8 p4 a 1
The rows are supposed to be sorted in descending order (highest to lowest) of data2, but grouped by person and data1, so that each person's other row follows directly underneath that person's highest row.
The desired result looks like this:
person data1 data2
1 p1 a 8
2 p1 b 2
3 p2 a 7
4 p2 b 4
5 p4 b 6
6 p4 a 1
7 p3 b 5
8 p3 a 3
CodePudding user response:
group_by doesn't really do anything on its own; it just sets up other functions to operate by group. However, the second sentence of the ?arrange help page is:
Unlike other dplyr verbs, arrange() largely ignores grouping;
All you want to do is get the rows in a certain order. You don't need group_by; you need to arrange first by person, then by descending data2 as a tie-breaker:
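df %>%
  arrange(person, desc(data2))
#   person data1 data2
# 1     p1     a     8
# 2     p1     b     2
# 3     p2     a     7
# 4     p2     b     4
# 5     p3     b     5
# 6     p3     a     3
# 7     p4     b     6
# 8     p4     a     1
# Note: person sorts alphabetically, so p3 comes before p4 here,
# whereas the desired output lists p4 first.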
Alternatively, you can use arrange's .by_group argument to sort within groups:
df %>%
  group_by(person) %>%
  arrange(desc(data2), .by_group = TRUE)
See the ?arrange help page for more details.
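As a rough check of the grouping behaviour described above, the sketch below compares a grouped arrange() with and without .by_group = TRUE on the question's sample data. The intermediate names (grouped_plain, ungrouped, grouped_by, explicit) are made up for illustration.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))

# Grouped arrange() without .by_group: the grouping does not affect row order
grouped_plain <- df %>% group_by(person) %>% arrange(desc(data2)) %>% ungroup()
ungrouped     <- df %>% arrange(desc(data2))

# Grouped arrange() with .by_group = TRUE: rows are sorted by person first
grouped_by <- df %>%
  group_by(person) %>%
  arrange(desc(data2), .by_group = TRUE) %>%
  ungroup()
explicit   <- df %>% arrange(person, desc(data2))

# Each pair should have the same row order
identical(grouped_plain$data2, ungrouped$data2)
identical(grouped_by$data2, explicit$data2)
```

If both comparisons return TRUE, then .by_group = TRUE is simply shorthand for listing the grouping variable first in arrange().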
CodePudding user response:
You could get the max data2 by person, join it back onto the data, and then arrange by that max:
df %>%
  inner_join(df %>% group_by(person) %>% summarize(m = max(data2)), by = "person") %>%
  arrange(desc(m), desc(data2)) %>%
  select(-m)
Output:
person data1 data2
1 p1 a 8
2 p1 b 2
3 p2 a 7
4 p2 b 4
5 p4 b 6
6 p4 a 1
7 p3 b 5
8 p3 a 3
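A possible variant of the same idea, sketched here as an assumption rather than as part of the answer above: compute the per-person maximum with a grouped mutate() instead of a join. The helper column m is temporary, and adding person as a secondary sort key is an extra choice so that two people with the same maximum would not have their rows interleaved.

```r
library(dplyr)

# Same sample data as in the question
df <- data.frame(person = c("p1", "p2", "p4", "p3", "p2", "p3", "p1", "p4"),
                 data1 = c("a", "a", "b", "b", "b", "a", "b", "a"),
                 data2 = c(8, 7, 6, 5, 4, 3, 2, 1))

result <- df %>%
  group_by(person) %>%
  mutate(m = max(data2)) %>%              # per-person maximum, repeated on each row
  ungroup() %>%
  arrange(desc(m), person, desc(data2)) %>%  # order groups by their max, then rows within a group
  select(-m)                              # drop the temporary helper column

result
```

On the sample data this should order the people as p1, p2, p4, p3, matching the desired result in the question.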