I would like to know an easy way, with dplyr, to count the number of the last NA values for each Id (only if the last value is NA)
my_data <- tibble(Id =c(1,1,1,2,2,2),date = rep(seq.Date(ymd(20210101),ymd(20210103),by =1 ),2), value = c(5, NA, 4, 4, NA, NA))
the expected result :
result <- tibble(Id = c(1,2), Result = c(0, 2))
I tried
my_data %>%
group_by(Id) %>%
mutate(test= ifelse(is.na(value), 1, 0)) %>%
filter(rle(test)$values[length(rle(test)$lengths)]!=0 ) %>%
summarise(result = rle(test)$lengths[length(rle(test)$lengths)])
CodePudding user response:
Try this:
library(dplyr)
my_data %>%
group_by(Id) %>%
summarise(lastNAcount = sum(is.na(last(value))))
Id lastNAcount
* <dbl> <int>
1 1 0
2 2 1
CodePudding user response:
A data table solution (probably not relevant for the OP but might be for someone else):
library(data.table)
setDT(my_data)
my_data[order(-date), .(Result = ifelse(is.na(value[1]), sum(rleid(value) == 1), 0L)), by = Id]
Id Result
1: 1 0
2: 2 2
CodePudding user response:
One dplyr
option could be:
my_data %>%
group_by(Id) %>%
summarise(Result = with(rle(is.na(value)), tail(values, 1) * tail(lengths, 1)))
Id Result
* <dbl> <int>
1 1 0
2 2 2
CodePudding user response:
Here's a solution with slice_tail
:
library(dplyr)
my_data %>%
group_by(Id) %>%
slice_tail() %>%
summarise(n = is.na(value))
# A tibble: 2 x 2
Id n
<dbl> <int>
1 1 0
2 2 1
CodePudding user response:
I am not sure I get your question totally right, because I would expect the result to be Result=c(0,1)
if,in my understanding, you want the information about the number of NA as last value for each ID. Which could only be 0 or 1 in my understanding. Otherwise, if you want all the NA per Id, the expected result would be Result=c(1,2)
. I cannot figure out which one would provide Result=c(0,2)
So there is this dplyr
option, that counts the number of NA per id:
my_data <- my_data %>% group_by(Id) %>% mutate(Results=cumsum(is.na(value)))
my_data
# A tibble: 6 x 4
# Groups: Id [2]
Id date value Results
<dbl> <date> <dbl> <int>
1 1 2021-01-01 5 0
2 1 2021-01-02 NA 1
3 1 2021-01-03 4 1
4 2 2021-01-01 4 0
5 2 2021-01-02 NA 1
6 2 2021-01-03 NA 2
If you only want the information on the last value :
my_data <- my_data %>% group_by(Id) %>% slice(n()) %>% mutate(Results=cumsum(is.na(value)))
my_data
# A tibble: 2 x 4
# Groups: Id [2]
Id date value Results
<dbl> <date> <dbl> <int>
1 1 2021-01-03 4 0
2 2 2021-01-03 NA 1
If you need Result=c(0,2)
, please clarify what you need and I will update my post.
CodePudding user response:
Grouped by "Id", get the cumulative sum of rev
erse logical vector and take the sum
of the 0 values
library(dplyr)
my_data %>%
group_by(Id) %>%
summarise(ind = sum(cumsum(rev(!is.na(value))) == 0))
# A tibble: 2 × 2
Id ind
<dbl> <int>
1 1 0
2 2 2