Why does this dplyr group function give strange results?

Time:08-15

When I run the below reproducible code I get the desired grouping results in the GroupRank column shown immediately beneath:

library(dplyr)

myData <- 
  data.frame(
    Element = c("A","A","B","A","C","C"),
    Group = c(0,0,0,0,1,1)
  )

myDataGroups <- myData %>%
  mutate(origOrder = row_number()) %>%  
  group_by(Element) %>% 
  mutate(ElementCnt = row_number()) %>%
  ungroup() %>%  
  mutate(Group = factor(Group, unique(Group))) %>% 
  arrange(Group) %>% 
  mutate(groupCt = cumsum(Group != lag(Group, 1, Group[[1]])) - 1L) %>%  
  group_by(Group) %>%  
  mutate(GroupRank = ElementCnt - max(0L,groupCt),
         GroupRank = if_else(as.character(Group) == "0", ElementCnt, min(GroupRank))
  )%>%  
  ungroup() %>%
  arrange(origOrder)
myDataGroups

> myDataGroups
# A tibble: 6 x 6
  Element Group origOrder ElementCnt groupCt GroupRank
  <chr>   <fct>     <int>      <int>   <int>     <int>
1 A       0             1          1      -1         1
2 A       0             2          2      -1         2
3 B       0             3          1      -1         1
4 A       0             4          3      -1         3
5 C       1             5          1       0         1
6 C       1             6          2       0         1

However, when I take the line GroupRank = if_else(as.character(Group) == "0", ElementCnt, min(GroupRank)) from the code above and simply wrap it in max(), like this: GroupRank = max(1L, if_else(as.character(Group) == "0", ElementCnt, min(GroupRank))), I get the strange output shown below. I tried both 1 and 1L and got the same results. GroupRank shouldn't have changed from the output above:

  Element Group origOrder ElementCnt groupCt GroupRank
  <chr>   <fct>     <int>      <int>   <int>     <int>
1 A       0             1          1      -1         3
2 A       0             2          2      -1         3
3 B       0             3          1      -1         3
4 A       0             4          3      -1         3
5 C       1             5          1       0         1
6 C       1             6          2       0         1

What am I doing wrong here? Am I using max() incorrectly?

CodePudding user response:

Note the difference between max() and pmax().

max(1:5, 5:1)
#> [1] 5
pmax(1:5, 5:1)
#> [1] 5 4 3 4 5

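max() collapses all of its arguments into a single value, and inside a grouped mutate() that one value is recycled to every row of the group. That is why you get a constant per group: in group 0 the result is max(1, 1, 2, 1, 3) = 3. pmax() does what you apparently expect: it returns the element-wise ("parallel") maximum, one value per row.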
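To see the difference inside a grouped mutate(), here is a minimal sketch. It is a stripped-down illustration, not the full pipeline from the question: the data frame df and the columns with_max and with_pmax are made up for this example, and ElementCnt is filled in by hand with the values from the question's output.

```r
library(dplyr)

# Hypothetical, simplified data: Group and ElementCnt copied from the question's output
df <- data.frame(
  Group      = c("0", "0", "0", "0", "1", "1"),
  ElementCnt = c(1L, 2L, 1L, 3L, 1L, 2L)
)

res <- df %>%
  group_by(Group) %>%
  mutate(
    # max() reduces 1L and the whole group vector to ONE number,
    # which mutate() then recycles to every row of the group
    with_max  = max(1L, ElementCnt),
    # pmax() compares 1L against each element and keeps one value per row
    with_pmax = pmax(1L, ElementCnt)
  ) %>%
  ungroup()

res$with_max
#> [1] 3 3 3 3 2 2
res$with_pmax
#> [1] 1 2 1 3 1 2
```

In the question's code, the corresponding change would presumably be to replace max(1L, if_else(...)) with pmax(1L, if_else(...)) on the second GroupRank line.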
