How can I group_by NAs as well?

Time:11-03

With this code:

datanew <- df_bsp %>% 
  group_by(id_mother) %>%
  dplyr::mutate(Family = cur_group_id()) 

I got this output:

datanew <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6),
                       id_mother=c(11, 11, 11, 12, 12, 12),
                       FAMILY=c(1,1,1,2,2,2))

Now the problem:

There are also some NAs in the id_mother variable.

It looks like this:

datanew_1 <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6, 7, 8, 9,10),
                           id_mother=c(11, 11, 11, 12, 12, 12, NA, NA, NA, NA))

How can I get this result, where each NA gets its own family number?

datanew <- data.frame(id_pers=c(1, 2, 3, 4, 5, 6, 7, 8, 9,10),
                           id_mother=c(11, 11, 11, 12, 12, 12, NA, NA, NA, NA),
                           FAMILY=c(1,1,1,2,2,2,3,4,5,6))

Thanks

CodePudding user response:

If you want each NA value treated as its own group, give each one a unique value:

datanew_1 %>%
  mutate(
    id_mother_na = ifelse(
      is.na(id_mother), 
      paste("g", "na", cumsum(is.na(id_mother))),
      paste("g", id_mother)
    )
  ) %>%
  group_by(id_mother_na) %>%
  mutate(Family = cur_group_id()) %>%
  ungroup()
# # A tibble: 10 × 4
#    id_pers id_mother id_mother_na Family
#      <dbl>     <dbl> <chr>         <int>
#  1       1        11 g 11              1
#  2       2        11 g 11              1
#  3       3        11 g 11              1
#  4       4        12 g 12              2
#  5       5        12 g 12              2
#  6       6        12 g 12              2
#  7       7        NA g na 1            3
#  8       8        NA g na 2            4
#  9       9        NA g na 3            5
# 10      10        NA g na 4            6
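
To see what the helper key is doing, here is a minimal base R sketch of the same idea without dplyr. The names na_counter, key and family are just illustrative, and match() on the unique keys is used here as a stand-in for cur_group_id(), so it numbers groups by first appearance rather than by sorted key.

```r
# Same example vector as in the question
id_mother <- c(11, 11, 11, 12, 12, 12, NA, NA, NA, NA)

# Running count of NAs: stays flat on real ids, increases by 1 on every NA
na_counter <- cumsum(is.na(id_mother))
# 0 0 0 0 0 0 1 2 3 4

# Build a character key: each NA gets its own label, real ids keep theirs
key <- ifelse(
  is.na(id_mother),
  paste("g", "na", na_counter),
  paste("g", id_mother)
)
# "g 11" "g 11" "g 11" "g 12" "g 12" "g 12" "g na 1" "g na 2" "g na 3" "g na 4"

# Number the groups by first appearance (stand-in for cur_group_id())
family <- match(key, unique(key))
# 1 1 1 2 2 2 3 4 5 6
```

One thing to keep in mind: since the key is a character column, group_by() orders the groups as text, so with ids of different lengths (for example 2 and 11) the Family numbers may not follow the numeric order of id_mother.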

CodePudding user response:

Along the same lines as the other answer, you need to make a unique group for each NA:

library(tidyverse)

make_grp <- function(x){
  # keep real ids as they are; give the k-th NA the value max(id) + k
  coalesce(x, cumsum(is.na(x))) + (max(x, na.rm = TRUE)*is.na(x))
}

datanew_1 |>
  group_by(grp = make_grp(id_mother)) |>
  mutate(Family = cur_group_id())  |>
  ungroup() |>
  select(-grp)
#> # A tibble: 10 x 3
#>    id_pers id_mother Family
#>      <dbl>     <dbl>  <int>
#>  1       1        11      1
#>  2       2        11      1
#>  3       3        11      1
#>  4       4        12      2
#>  5       5        12      2
#>  6       6        12      2
#>  7       7        NA      3
#>  8       8        NA      4
#>  9       9        NA      5
#> 10      10        NA      6
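
For anyone wondering what the numeric grouping key looks like before it is dropped, here is a rough base R sketch of the same trick: every NA is replaced by a number above the largest real id, so it cannot collide with an existing group. make_grp_sketch is a hypothetical name, and the sketch assumes id_mother is numeric and has at least one non-NA value (otherwise max() returns -Inf with a warning).

```r
# Hypothetical helper: replace the k-th NA with max(x) + k
make_grp_sketch <- function(x) {
  na_pos <- is.na(x)
  offset <- max(x, na.rm = TRUE)          # largest real id, 12 in the example
  out <- x
  out[na_pos] <- offset + cumsum(na_pos)[na_pos]
  out
}

id_mother <- c(11, 11, 11, 12, 12, 12, NA, NA, NA, NA)

grp <- make_grp_sketch(id_mother)
# 11 11 11 12 12 12 13 14 15 16

# Number the groups in sorted key order
family <- match(grp, sort(unique(grp)))
# 1 1 1 2 2 2 3 4 5 6
```

Because the key stays numeric, the family numbers follow the numeric order of id_mother, with the NA rows numbered after all real ids.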