Group_by id and count the consecutive NA's, then restart counting when a new series of NA's begins

Time:12-15

I have a dataframe like this:

df <- tibble::tibble(id = c(rep('A', 10), rep('B', 10)),
                     value = c(1:3, rep(NA, 2), 1:2, rep(NA, 3), 1, rep(NA, 4), 1:3, rep(NA, 2)))

I need to count the number of consecutive NA's in the value column. The count needs to be grouped by id, and it needs to restart at 1 every time a new NA or new series of NA's is encountered. The expected output should look like this:

df$expected_output <- c(rep(NA, 3), 1:2, rep(NA, 2), 1:3, NA, 1:4, rep(NA, 3), 1:2)

If anyone can give me a dplyr solution that would also be great :)

I've tried a few things, but nothing gives a sensible result. Thanks in advance!

CodePudding user response:

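A solution using dplyr and data.table: rleid() assigns an ID to each run of consecutive identical values (consecutive NA's form one run), row_number() counts within each run, and the final mutate() blanks out the rows where value is not NA.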

library(dplyr)
library(data.table)

df2 <- df %>%
  group_by(id) %>%
  mutate(info = rleid(value)) %>%
  group_by(id, info) %>%
  mutate(expected_output = row_number()) %>%
  ungroup() %>%
  mutate(expected_output = ifelse(!is.na(value), NA, expected_output)) %>%
  select(-info)
df2
# # A tibble: 20 x 3  
#     id    value expected_output
#     <chr> <dbl>           <int>
#  1 A         1              NA
#  2 A         2              NA
#  3 A         3              NA
#  4 A        NA               1
#  5 A        NA               2
#  6 A         1              NA
#  7 A         2              NA
#  8 A        NA               1
#  9 A        NA               2
# 10 A        NA               3
# 11 B         1              NA
# 12 B        NA               1
# 13 B        NA               2
# 14 B        NA               3
# 15 B        NA               4
# 16 B         1              NA
# 17 B         2              NA
# 18 B         3              NA
# 19 B        NA               1
# 20 B        NA               2
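
For a dplyr-only variant of the same idea, here is a minimal sketch. It assumes dplyr 1.1.0 or later, where consecutive_id() plays the role of data.table's rleid(). The names run and df3 are only illustrative.

```r
library(dplyr)

# Example data from the question
df <- tibble(
  id = c(rep("A", 10), rep("B", 10)),
  value = c(1:3, rep(NA, 2), 1:2, rep(NA, 3), 1, rep(NA, 4), 1:3, rep(NA, 2))
)

df3 <- df %>%
  # run changes whenever is.na(value) flips; grouping by id as well
  # splits any run that would otherwise cross an id boundary
  group_by(id, run = consecutive_id(is.na(value))) %>%
  # count rows inside NA runs, leave NA everywhere else
  mutate(expected_output = if_else(is.na(value), row_number(), NA_integer_)) %>%
  ungroup() %>%
  select(-run)

df3
```

Because the run ID is built from is.na(value) rather than from value itself, each stretch of non-missing values is one run as well, which keeps the number of groups small.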

CodePudding user response:

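Here is a solution using rle. Note that it ignores id, so it only gives the right answer when no run of NA's spans the boundary between two ids (which holds for the example data):

x <- rle(is.na(df$value))
df$new <- NA_integer_  # initialise the column before the subset assignment
df$new[is.na(df$value)] <- sequence(x$lengths[x$values])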

# A tibble: 20 x 3
   id    value   new
   <chr> <dbl> <int>
 1 A         1    NA
 2 A         2    NA
 3 A         3    NA
 4 A        NA     1
 5 A        NA     2
 6 A         1    NA
 7 A         2    NA
 8 A        NA     1
 9 A        NA     2
10 A        NA     3
11 B         1    NA
12 B        NA     1
13 B        NA     2
14 B        NA     3
15 B        NA     4
16 B         1    NA
17 B         2    NA
18 B         3    NA
19 B        NA     1
20 B        NA     2
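
If a run of NA's could cross from one id into the next, the same rle logic can be applied per id. The following is a sketch under that assumption, using base R's ave(); the helper count_na_runs and the small edge data frame are hypothetical names introduced for illustration.

```r
# Same example data as in the question, as a plain data.frame
df <- data.frame(
  id = c(rep("A", 10), rep("B", 10)),
  value = c(1:3, rep(NA, 2), 1:2, rep(NA, 3), 1, rep(NA, 4), 1:3, rep(NA, 2))
)

# Hypothetical helper: running count inside each run of NAs, NA elsewhere
count_na_runs <- function(v) {
  out <- rep(NA_integer_, length(v))
  r <- rle(is.na(v))
  out[is.na(v)] <- sequence(r$lengths[r$values])
  out
}

# ave() applies the helper separately to the values of each id
df$new <- as.integer(ave(df$value, df$id, FUN = count_na_runs))

# A run of NAs that crosses an id boundary now restarts at 1
edge <- data.frame(id = c("A", "A", "B", "B"), value = c(1, NA, NA, 2))
edge$new <- as.integer(ave(edge$value, edge$id, FUN = count_na_runs))
edge$new  # NA 1 1 NA
```

Without the grouping, the two NA's in edge would be treated as one run and counted 1, 2.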

CodePudding user response:

We can use rle to get the lengths of the runs that are or are not NA, then use purrr::map2 to apply seq to the NA runs (giving the growing count) and rep to fill the non-NA runs with NA values.

library(tidyverse)

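count_na <- function(x) {
  r <- rle(is.na(x))
  # For each run: count 1..n if it is a run of NAs, otherwise fill with NA
  consec <- map2(r$lengths, r$values, ~ if (.y) seq_len(.x) else rep(NA_integer_, .x))
  unlist(consec)
}

df %>%
  group_by(id) %>%  # restart the count within each id
  mutate(expected_output = count_na(value)) %>%
  ungroup()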
#> # A tibble: 20 × 3
#>    id    value expected_output
#>    <chr> <dbl>           <int>
#>  1 A         1              NA
#>  2 A         2              NA
#>  3 A         3              NA
#>  4 A        NA               1
#>  5 A        NA               2
#>  6 A         1              NA
#>  7 A         2              NA
#>  8 A        NA               1
#>  9 A        NA               2
#> 10 A        NA               3
#> 11 B         1              NA
#> 12 B        NA               1
#> 13 B        NA               2
#> 14 B        NA               3
#> 15 B        NA               4
#> 16 B         1              NA
#> 17 B         2              NA
#> 18 B         3              NA
#> 19 B        NA               1
#> 20 B        NA               2
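
To make the intermediate step concrete, this small sketch shows what rle() returns for the values of id A from the question. The names x and r are only illustrative.

```r
# Values of id "A" from the question's example data
x <- c(1:3, rep(NA, 2), 1:2, rep(NA, 3))

r <- rle(is.na(x))
r$lengths  # 3 2 2 3
r$values   # FALSE TRUE FALSE TRUE

# sequence() turns the lengths of the NA runs into 1..n counters
sequence(r$lengths[r$values])  # 1 2 1 2 3
```

The runs flagged TRUE are the NA runs, so their lengths (2 and 3) are exactly the values the counter should reach before it restarts.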