I am trying to figure out a dplyr-specific way of continuing a sequence of numbers when there are NAs in that column.
For example, I have this data frame:
library(tibble)
dat <- tribble(
  ~x, ~group,
  1, "A",
  2, "A",
  NA_real_, "A",
  NA_real_, "A",
  1, "B",
  NA_real_, "B",
  3, "B"
)
dat
#> # A tibble: 7 × 2
#> x group
#> <dbl> <chr>
#> 1 1 A
#> 2 2 A
#> 3 NA A
#> 4 NA A
#> 5 1 B
#> 6 NA B
#> 7 3 B
I would like this one:
#> # A tibble: 7 × 2
#> x group
#> <dbl> <chr>
#> 1 1 A
#> 2 2 A
#> 3 3 A
#> 4 4 A
#> 5 1 B
#> 6 2 B
#> 7 3 B
When I try this I get a warning which makes me think I am probably approaching this incorrectly:
library(dplyr)
dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(n))
#> Warning in seq_len(n): first element used of 'length.out' argument
#> Warning in seq_len(n): first element used of 'length.out' argument
#> # A tibble: 7 × 4
#> # Groups: group [2]
#> x group n new_seq
#> <dbl> <chr> <int> <int>
#> 1 1 A 4 1
#> 2 2 A 4 2
#> 3 NA A 4 3
#> 4 NA A 4 4
#> 5 1 B 3 1
#> 6 NA B 3 2
#> 7 3 B 3 3
CodePudding user response:
It's easier if you do it in one step. Your approach is not wrong; the issue is that seq_len needs a single integer, and you are passing it a vector (the column n), so seq_len falls back to using only the first element and warns about it.
dat %>%
  group_by(group) %>%
  mutate(x = seq_len(n()))
Note that row_number() might be even easier here:
dat %>%
  group_by(group) %>%
  mutate(x = row_number())
CodePudding user response:
We could use rowid from data.table directly if the intention is only to create a sequence and the group size is just an intermediate column:
library(data.table)
library(dplyr)
dat %>%
  mutate(new_seq = rowid(group))
The issue with using a column after it has been created is that it is no longer a single value but a vector with one entry per row, as shown in @Maël's post. If we need to do that, use first to extract a single value, since seq_len is not vectorized. The intermediate column is not actually needed here, though.
dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(first(n)))
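The answers above either overwrite x or add a new column next to it. If the existing values of x should be kept and only the NAs filled in, one possible sketch is to combine row_number() with coalesce(). This assumes, as in the example data, that every non-missing x already equals its row position within the group; the name filled is only illustrative.

```r
library(dplyr)
library(tibble)

# Example data from the question
dat <- tribble(
  ~x, ~group,
  1, "A",
  2, "A",
  NA_real_, "A",
  NA_real_, "A",
  1, "B",
  NA_real_, "B",
  3, "B"
)

filled <- dat %>%
  group_by(group) %>%
  # Keep x where it is present; otherwise use the row position within the group.
  # as.numeric() keeps both arguments of coalesce() the same type (double).
  mutate(x = coalesce(x, as.numeric(row_number()))) %>%
  ungroup()

filled$x
#> [1] 1 2 3 4 1 2 3
```

If the non-missing values do not line up with the row positions, this would produce a mixed sequence, so the plain row_number() approach may be the safer choice in that case.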