Home > database >  R: How to simply count last NA
R: How to simply count last NA

Time:10-01

I would like to know an easy way, with dplyr, to count the number of the last NA values for each Id (only if the last value is NA)

 my_data <- tibble(Id =c(1,1,1,2,2,2),date = rep(seq.Date(ymd(20210101),ymd(20210103),by =1 ),2), value = c(5, NA, 4, 4, NA, NA))

the expected result :

 result <- tibble(Id = c(1,2), Result = c(0, 2))

I tried

 my_data %>% 
     group_by(Id) %>% 
     mutate(test= ifelse(is.na(value), 1, 0)) %>% 
     filter(rle(test)$values[length(rle(test)$lengths)]!=0 ) %>% 
     summarise(result = rle(test)$lengths[length(rle(test)$lengths)])

CodePudding user response:

Try this:

library(dplyr)
my_data %>% 
  group_by(Id) %>% 
  summarise(lastNAcount = sum(is.na(last(value))))
     Id lastNAcount
* <dbl>       <int>
1     1           0
2     2           1

CodePudding user response:

A data table solution (probably not relevant for the OP but might be for someone else):

library(data.table)
setDT(my_data)
my_data[order(-date), .(Result = ifelse(is.na(value[1]), sum(rleid(value) == 1), 0L)), by = Id]

   Id Result
1:  1      0
2:  2      2

CodePudding user response:

One dplyr option could be:

my_data %>%
    group_by(Id) %>%
    summarise(Result = with(rle(is.na(value)), tail(values, 1) * tail(lengths, 1)))

     Id Result
* <dbl>  <int>
1     1      0
2     2      2

CodePudding user response:

Here's a solution with slice_tail:

library(dplyr)
my_data %>%
  group_by(Id) %>%
  slice_tail() %>%
  summarise(n =  is.na(value))
# A tibble: 2 x 2
     Id     n
  <dbl> <int>
1     1     0
2     2     1

CodePudding user response:

I am not sure I get your question totally right, because I would expect the result to be Result=c(0,1) if,in my understanding, you want the information about the number of NA as last value for each ID. Which could only be 0 or 1 in my understanding. Otherwise, if you want all the NA per Id, the expected result would be Result=c(1,2). I cannot figure out which one would provide Result=c(0,2)

So there is this dplyr option, that counts the number of NA per id:

my_data <- my_data %>% group_by(Id) %>% mutate(Results=cumsum(is.na(value)))
my_data
# A tibble: 6 x 4
# Groups:   Id [2]
     Id date       value Results
  <dbl> <date>     <dbl>   <int>
1     1 2021-01-01     5       0
2     1 2021-01-02    NA       1
3     1 2021-01-03     4       1
4     2 2021-01-01     4       0
5     2 2021-01-02    NA       1
6     2 2021-01-03    NA       2

If you only want the information on the last value :

my_data <- my_data %>% group_by(Id) %>% slice(n()) %>% mutate(Results=cumsum(is.na(value)))
    my_data
# A tibble: 2 x 4
# Groups:   Id [2]
     Id date       value Results
  <dbl> <date>     <dbl>   <int>
1     1 2021-01-03     4       0
2     2 2021-01-03    NA       1

If you need Result=c(0,2), please clarify what you need and I will update my post.

CodePudding user response:

Grouped by "Id", get the cumulative sum of reverse logical vector and take the sum of the 0 values

library(dplyr)
my_data %>% 
  group_by(Id) %>%
  summarise(ind = sum(cumsum(rev(!is.na(value))) == 0))
# A tibble: 2 × 2
     Id   ind
  <dbl> <int>
1     1     0
2     2     2
  • Related