sum() condition in ifelse statement-CodePudding

This question is related to this question My question is about R: How to number each repetition in a table in R?

Where basically the repetitions are numbered. E.g two repetitions: 1,2 ; three repetitions: 1,2,3 etc... But if the value is unique (only one time) it should have not 1 but NA

data: (from akrun, many thanks!)

df1 <- structure(list(Fullname = c("Peter", "Peter", "Alison", "Warren", 
                                   "Jack", "Jack", "Jack", "Jack", "Susan", "Susan", "Henry", "Walison", 
                                   "Tinder", "Peter", "Henry", "Tinder")), row.names = c(NA, -16L
                                   ), class = "data.frame")

my solution would be this:

df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = seq_along(Fullname)) 

  Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         2
 3 Alison        1
 4 Warren        1
 5 Jack          1
 6 Jack          2
 7 Jack          3
 8 Jack          4
 9 Susan         1
10 Susan         2
11 Henry         1
12 Walison       1
13 Tinder        1
14 Peter         3
15 Henry         2
16 Tinder        2

Now I try to set each value that occurs once (e.g. Alison, Warren and Henry) to NAlike akrun did here My question is about R: How to number each repetition in a table in R?

My code is with a ifelse statement checking if the sum of the group is >1.

df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = seq_along(Fullname)) %>% 
  mutate(newcol = ifelse(sum(newcol)>1, newcol, NA))

but I get:

 Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         1
 3 Alison       NA
 4 Warren       NA
 5 Jack          1
 6 Jack          1
 7 Jack          1
 8 Jack          1
 9 Susan         1
10 Susan         1
11 Henry         1
12 Walison      NA
13 Tinder        1
14 Peter         1
15 Henry         1
16 Tinder        1

And I can't grasp why?

CodePudding user response：

We need if/else here instead of ifelse as ifelse requires all arguments to be same length, sum returns a single value and if it is TRUE, then all becomes TRUE

library(dplyr)
df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = row_number(), 
       newcol = if(sum(newcol)> 1) newcol else NA) %>%
  ungroup

-output

# A tibble: 16 × 2
   Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         2
 3 Alison       NA
 4 Warren       NA
 5 Jack          1
 6 Jack          2
 7 Jack          3
 8 Jack          4
 9 Susan         1
10 Susan         2
11 Henry         1
12 Walison      NA
13 Tinder        1
14 Peter         3
15 Henry         2
16 Tinder        2

Now, we look at the issue. The 'newcol2' values are recycled values of single TRUE/FALSE. In the ifelse, as all arguments need to be same length, the logical part is just of length 1.

df1 %>% 
   group_by(Fullname) %>% 
   mutate(newcol = row_number(), newcol2 = sum(newcol) > 1)
# A tibble: 16 × 3
# Groups:   Fullname [8]
   Fullname newcol newcol2
   <chr>     <int> <lgl>  
 1 Peter         1 TRUE   
 2 Peter         2 TRUE   
 3 Alison        1 FALSE  
 4 Warren        1 FALSE  
 5 Jack          1 TRUE   
 6 Jack          2 TRUE   
 7 Jack          3 TRUE   
 8 Jack          4 TRUE   
 9 Susan         1 TRUE   
10 Susan         2 TRUE   
11 Henry         1 TRUE   
12 Walison       1 FALSE  
13 Tinder        1 TRUE   
14 Peter         3 TRUE   
15 Henry         2 TRUE   
16 Tinder        2 TRUE

The way to tackle is replicate to make the lengths same

df1 %>% 
  group_by(Fullname) %>% 
  mutate(newcol = seq_along(Fullname)) %>% 
  mutate(newcol = ifelse(rep(sum(newcol)>1, n()), newcol, NA))
# A tibble: 16 × 2
# Groups:   Fullname [8]
   Fullname newcol
   <chr>     <int>
 1 Peter         1
 2 Peter         2
 3 Alison       NA
 4 Warren       NA
 5 Jack          1
 6 Jack          2
 7 Jack          3
 8 Jack          4
 9 Susan         1
10 Susan         2
11 Henry         1
12 Walison      NA
13 Tinder        1
14 Peter         3
15 Henry         2
16 Tinder        2

In order to understand it better, just take a simple vector

> v1 <- c(1:5)
> sum(v1) > 4
[1] TRUE
> ifelse(sum(v1) > 4, v1, NA)
[1] 1

The sum here is 15 and it is definitely greater than 4. As soon as the TRUE is found, it just returns the first element of the vector i.e. 1 and stops. In the %>% also, this is what is happening, but because there is recycling, the 1 gets repeated to fill the whole group