This question is related to this one: How to number each repetition in a table in R?
There, the repetitions are numbered, e.g. two repetitions: 1, 2; three repetitions: 1, 2, 3; etc. But if a value is unique (occurs only once), it should get NA instead of 1.
data: (from akrun, many thanks!)
df1 <- structure(list(Fullname = c("Peter", "Peter", "Alison", "Warren",
"Jack", "Jack", "Jack", "Jack", "Susan", "Susan", "Henry", "Walison",
"Tinder", "Peter", "Henry", "Tinder")), row.names = c(NA, -16L
), class = "data.frame")
my solution would be this:
library(dplyr)
df1 %>%
  group_by(Fullname) %>%
  mutate(newcol = seq_along(Fullname))
Fullname newcol
<chr> <int>
1 Peter 1
2 Peter 2
3 Alison 1
4 Warren 1
5 Jack 1
6 Jack 2
7 Jack 3
8 Jack 4
9 Susan 1
10 Susan 2
11 Henry 1
12 Walison 1
13 Tinder 1
14 Peter 3
15 Henry 2
16 Tinder 2
Now I try to set the value for each name that occurs only once (e.g. Alison, Warren and Walison) to NA, like akrun did in: How to number each repetition in a table in R?
My code uses an ifelse statement that checks whether the sum of the group is > 1.
df1 %>%
group_by(Fullname) %>%
mutate(newcol = seq_along(Fullname)) %>%
mutate(newcol = ifelse(sum(newcol)>1, newcol, NA))
but I get:
Fullname newcol
<chr> <int>
1 Peter 1
2 Peter 1
3 Alison NA
4 Warren NA
5 Jack 1
6 Jack 1
7 Jack 1
8 Jack 1
9 Susan 1
10 Susan 1
11 Henry 1
12 Walison NA
13 Tinder 1
14 Peter 1
15 Henry 1
16 Tinder 1
And I can't grasp why?
CodePudding user response:
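We need if/else here instead of ifelse. ifelse returns a result of the same length as its test argument, and sum(newcol) > 1 returns a single value per group. So when it is TRUE, only the first element of newcol (i.e. 1) is returned, and mutate recycles that 1 across the whole group.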
library(dplyr)
df1 %>%
group_by(Fullname) %>%
mutate(newcol = row_number(),
newcol = if(sum(newcol)> 1) newcol else NA) %>%
ungroup
-output
# A tibble: 16 × 2
Fullname newcol
<chr> <int>
1 Peter 1
2 Peter 2
3 Alison NA
4 Warren NA
5 Jack 1
6 Jack 2
7 Jack 3
8 Jack 4
9 Susan 1
10 Susan 2
11 Henry 1
12 Walison NA
13 Tinder 1
14 Peter 3
15 Henry 2
16 Tinder 2
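For comparison, the same if/else idea can be sketched in base R with ave(), in case dplyr is not available. This is only a minimal sketch: the helper name number_repeats and the short fullname vector are hypothetical, and the group size (length(i) > 1) is used as the condition instead of sum(newcol) > 1.

```r
# Hypothetical helper: number repeated values, give NA to values occurring once
number_repeats <- function(x) {
  ave(seq_along(x), x, FUN = function(i) {
    # if/else checks one condition per group and returns a whole vector
    if (length(i) > 1) seq_along(i) else NA_integer_
  })
}

# Small illustrative vector (not the full df1 from the question)
fullname <- c("Peter", "Peter", "Alison", "Jack", "Jack", "Peter")
res <- number_repeats(fullname)
print(res)  # expected: 1 2 NA 1 2 3
```

On the question's data this could be called as number_repeats(df1$Fullname).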
Now, let's look at the issue. In the code below, the 'newcol2' values are a single TRUE/FALSE per group that mutate recycles to the group size. Inside ifelse, however, the test is still of length 1, so the result is also of length 1.
df1 %>%
group_by(Fullname) %>%
mutate(newcol = row_number(), newcol2 = sum(newcol) > 1)
# A tibble: 16 × 3
# Groups: Fullname [8]
Fullname newcol newcol2
<chr> <int> <lgl>
1 Peter 1 TRUE
2 Peter 2 TRUE
3 Alison 1 FALSE
4 Warren 1 FALSE
5 Jack 1 TRUE
6 Jack 2 TRUE
7 Jack 3 TRUE
8 Jack 4 TRUE
9 Susan 1 TRUE
10 Susan 2 TRUE
11 Henry 1 TRUE
12 Walison 1 FALSE
13 Tinder 1 TRUE
14 Peter 3 TRUE
15 Henry 2 TRUE
16 Tinder 2 TRUE
One way to tackle this with ifelse is to use rep to replicate the test so that the lengths are the same
df1 %>%
group_by(Fullname) %>%
mutate(newcol = seq_along(Fullname)) %>%
mutate(newcol = ifelse(rep(sum(newcol)>1, n()), newcol, NA))
# A tibble: 16 × 2
# Groups: Fullname [8]
Fullname newcol
<chr> <int>
1 Peter 1
2 Peter 2
3 Alison NA
4 Warren NA
5 Jack 1
6 Jack 2
7 Jack 3
8 Jack 4
9 Susan 1
10 Susan 2
11 Henry 1
12 Walison NA
13 Tinder 1
14 Peter 3
15 Henry 2
16 Tinder 2
In order to understand it better, just take a simple vector
> v1 <- c(1:5)
> sum(v1) > 4
[1] TRUE
> ifelse(sum(v1) > 4, v1, NA)
[1] 1
The sum here is 15, which is definitely greater than 4. But because the test is a single TRUE, ifelse returns a result of length 1, i.e. just the first element of the vector (1). The same thing happens inside mutate, but because of recycling, the 1 gets repeated to fill the whole group.
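The three behaviours discussed above can be compared side by side on the same simple vector. This is just a base R sketch; the variable names short, full and whole are made up for illustration.

```r
v1 <- 1:5

# The test has length 1, so ifelse() returns a length-1 result
short <- ifelse(sum(v1) > 4, v1, NA)

# Replicating the test to the length of v1 gives an element-wise result
full <- ifelse(rep(sum(v1) > 4, length(v1)), v1, NA)

# if/else evaluates a single condition and returns the whole vector
whole <- if (sum(v1) > 4) v1 else NA

print(length(short))  # 1
print(length(full))   # 5
print(length(whole))  # 5
```

In a grouped mutate, the length-1 result of the first form is what gets recycled across the group.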