Continuing a sequence into NAs using dplyr

Time:10-13

I am trying to find a dplyr-specific way of continuing a sequence of numbers into the NAs in a column.

For example, I have this data frame:

library(tibble)

dat <- tribble(
  ~x, ~group,
  1, "A",
  2, "A",
  NA_real_, "A",
  NA_real_, "A",
  1, "B",
  NA_real_, "B",
  3, "B"
)

dat
#> # A tibble: 7 × 2
#>       x group
#>   <dbl> <chr>
#> 1     1 A    
#> 2     2 A    
#> 3    NA A    
#> 4    NA A    
#> 5     1 B    
#> 6    NA B    
#> 7     3 B

I would like to get this:

#> # A tibble: 7 × 2
#>       x group
#>   <dbl> <chr>
#> 1     1 A    
#> 2     2 A    
#> 3     3 A    
#> 4     4 A    
#> 5     1 B    
#> 6     2 B    
#> 7     3 B

When I try the following, I get a warning, which makes me think I am approaching this incorrectly:

library(dplyr)

dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(n))
#> Warning in seq_len(n): first element used of 'length.out' argument

#> Warning in seq_len(n): first element used of 'length.out' argument
#> # A tibble: 7 × 4
#> # Groups:   group [2]
#>       x group     n new_seq
#>   <dbl> <chr> <int>   <int>
#> 1     1 A         4       1
#> 2     2 A         4       2
#> 3    NA A         4       3
#> 4    NA A         4       4
#> 5     1 B         3       1
#> 6    NA B         3       2
#> 7     3 B         3       3

CodePudding user response:

It's easier if you do it in one go. Your approach is not wrong; the issue is that seq_len() expects a single integer, and you are passing it the whole column n, which holds one value per row of the group. seq_len() then uses only the first value and warns about it.

dat %>% 
  group_by(group) %>% 
  mutate(x = seq_len(n()))

Note that row_number() might be even easier here:

dat %>% 
  group_by(group) %>% 
  mutate(x = row_number())
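
As a quick sanity check, the sketch below rebuilds the example data and compares both versions against the desired column. The names expected, res_seq and res_row are made up for illustration. Keep in mind that both versions overwrite x with a fresh 1..n count per group; that matches the desired output here only because every group's sequence starts at 1 and runs in row order.

```r
library(dplyr)
library(tibble)

# Example data from the question
dat <- tribble(
  ~x, ~group,
  1, "A",
  2, "A",
  NA_real_, "A",
  NA_real_, "A",
  1, "B",
  NA_real_, "B",
  3, "B"
)

# Desired values of x, copied from the question
expected <- c(1, 2, 3, 4, 1, 2, 3)

# Version 1: seq_len() on the group size
res_seq <- dat %>%
  group_by(group) %>%
  mutate(x = seq_len(n())) %>%
  ungroup()

# Version 2: row_number() within each group
res_row <- dat %>%
  group_by(group) %>%
  mutate(x = row_number()) %>%
  ungroup()

# Both comparisons should be TRUE for this data
all(res_seq$x == expected)
all(res_row$x == expected)
```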

CodePudding user response:

We could use rowid() from data.table directly if the intention is only to create a sequence and the group size is just an intermediate column:

library(data.table)
library(dplyr)
dat %>% 
   mutate(new_seq = rowid(group))

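The issue with using the column after it is created is that it is no longer a single value per group, as shown in @Maël's post: n is repeated on every row. If we do need to go that route, wrap it in first(), because seq_len() is not vectorized over its argument. Here, though, the intermediate column is not needed at all.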

dat %>%
  group_by(group) %>%
  mutate(n = n()) %>%
  mutate(new_seq = seq_len(first(n)))
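
To see why first() helps, it may be useful to look at what n holds inside one group. This is a minimal sketch in base R, where n_in_group is a hypothetical stand-in for the n column of group A:

```r
# Hypothetical stand-in for the n column within group A:
# the group size repeated once per row
n_in_group <- c(4L, 4L, 4L, 4L)

length(n_in_group)       # 4: one value per row, not a single number

# Taking the first element gives seq_len() the single integer it expects;
# inside mutate() this corresponds to seq_len(first(n))
seq_len(n_in_group[1])   # 1 2 3 4
```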