Imagine we have data from a task made up of different episodes (trials), with different events happening inside each one. I want to build a vector or data frame with the following logic: the information should be organized per trial t, using only the rows where a value is available.
t <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)
p <- c(NA,NA,NA,8,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA)
p_1 <- c(NA,NA,NA,NA,NA,NA,NA,1,NA,13,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,8,NA,NA,NA,NA)
b <- c(NA,NA,NA,NA,NA,7,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,10,NA,NA,NA,NA,NA,NA)
rw <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1)
a <- c(NA,NA,1,NA,NA,1,NA,1,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,0,NA,NA,NA,NA)
is <- c(NA,NA,NA,FALSE,NA,NA,NA,TRUE,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,TRUE,NA,NA,NA,NA,NA,NA,NA,NA)
v <- data.frame(t,p,p_1,b,rw,a,is)
#which returns the following data frame:
   t  p p_1  b rw  a    is
1  1 NA  NA NA NA NA    NA
2  1 NA  NA NA NA NA    NA
3  1 NA  NA NA NA  1    NA
4  1  8  NA NA NA NA FALSE
5  1 NA  NA NA NA NA    NA
6  1 NA  NA  7 NA  1    NA
7  1 NA  NA NA NA NA    NA
8  1 NA   1 NA NA  1  TRUE
9  1 NA  NA NA NA NA    NA
10 1 NA  13 NA NA  1    NA
11 1 NA  NA NA NA NA    NA
12 1 NA  NA NA NA NA    NA
13 1 NA  NA NA NA NA    NA
14 1 NA  NA NA NA NA    NA
15 1 NA  NA NA -1 NA    NA
16 2 NA  NA NA NA NA    NA
17 2 NA  NA NA NA NA    NA
18 2 NA  NA NA NA NA    NA
19 2  1  NA NA NA NA  TRUE
20 2 NA  NA NA NA NA    NA
21 2 NA  NA 10 NA  1    NA
22 2 NA  NA NA NA NA    NA
23 2 NA   8 NA NA  0    NA
24 2 NA  NA NA NA NA    NA
25 2 NA  NA NA NA NA    NA
26 2 NA  NA NA NA NA    NA
27 2 NA  NA NA -1 NA    NA
My end goal is to read the table and produce a data frame or a vector with the updated data at every step until the end of the episode, something that would read like this:
t p p_1  b rw a    is
1 8  NA NA NA 1 FALSE
1 8  NA  7 NA 1 FALSE
1 8   1  7 NA 1  TRUE
1 8  13  7 NA 1  TRUE
1 8  13  7 -1 1  TRUE
2 1  NA NA NA 1  TRUE
2 1  NA 10 NA 1  TRUE
2 1   8 10 NA 0  TRUE
2 1   8 10 -1 0  TRUE
I tried building a vector without the NAs for each column and then joining them, but because the resulting vectors are not the same length it does not work.
I also tried a dplyr approach based on this solution, but I couldn't make it work:
library(dplyr)
test1 <- v %>% filter(if_all(contains('t','p','p_1','b','rw','a','is'), Negate(is.na)))
> test1 <- v %>% filter(if_all(contains('t','p','p_1','b','rw','a','is'), Negate(is.na)))
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `if_all(contains("t", "p", "p_1", "b", "rw", "a", "is"), Negate(is.na))`.
x unused arguments ("b", "rw", "a", "is")
Run `rlang::last_error()` to see where the error occurred.
Any ideas?
Thanks in advance!
CodePudding user response:
Try this, using dplyr and tidyr::fill.
library(dplyr)
library(tidyr)
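v |>
  # Keep rows with at least one non-NA value besides t (t is never NA)
  filter(rowSums(!is.na(v)) > 1) |>
  group_by(t) |>
  # Within each trial, carry the last observed value downward
  fill(everything())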
# A tibble: 10 x 7
# Groups:   t [2]
       t     p   p_1     b    rw     a is
   <int> <int> <int> <int> <int> <int> <lgl>
 1     1    NA    NA    NA    NA     1 NA
 2     1     8    NA    NA    NA     1 FALSE
 3     1     8    NA     7    NA     1 FALSE
 4     1     8     1     7    NA     1 TRUE
 5     1     8    13     7    NA     1 TRUE
 6     1     8    13     7    -1     1 TRUE
 7     2     1    NA    NA    NA    NA TRUE
 8     2     1    NA    10    NA     1 TRUE
 9     2     1     8    10    NA     0 TRUE
10     2     1     8    10    -1     0 TRUE
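The result above keeps one leading row in which p is still missing, so it has 10 rows rather than the 9 in the question's desired output. A minimal, self-contained sketch of a variant that drops that row is given below. It assumes the values shown in the question's printed data frame, and it uses `if_any()` so the filter does not need to refer to `v` by name inside the pipe. As a side note on the error in the question: `contains()` takes its patterns as a single character vector in its first argument, which is why the extra strings are reported as unused arguments, and `if_all(..., Negate(is.na))` would keep only rows with no NA at all, of which there are none here.

```r
library(dplyr)
library(tidyr)

# Rebuild the data, assuming the values from the printed data frame.
v <- data.frame(t = rep(c(1, 2), times = c(15, 12)),
                p = NA_real_, p_1 = NA_real_, b = NA_real_,
                rw = NA_real_, a = NA_real_, is = NA)
v$p[c(4, 19)]                 <- c(8, 1)
v$p_1[c(8, 10, 23)]           <- c(1, 13, 8)
v$b[c(6, 21)]                 <- c(7, 10)
v$rw[c(15, 27)]               <- -1
v$a[c(3, 6, 8, 10, 21, 23)]   <- c(1, 1, 1, 1, 1, 0)
v$is[c(4, 8, 19)]             <- c(FALSE, TRUE, TRUE)

res <- v %>%
  # Keep rows where at least one column other than t has a value
  filter(if_any(-t, ~ !is.na(.x))) %>%
  group_by(t) %>%
  # Carry the last observed value downward within each trial
  fill(p, p_1, b, rw, a, is, .direction = "down") %>%
  ungroup() %>%
  # Drop rows recorded before the first p of a trial
  filter(!is.na(p))

nrow(res)
# 9
```

Whether the final `filter(!is.na(p))` is wanted depends on whether events recorded before the first p of a trial should count as a step; that is a judgment call about the task, not something the data settles.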
CodePudding user response:
This is effectively the same approach as George Savva's, but he got to it just before I did. It uses zoo::na.locf instead of tidyr. I will provide it as an alternative, but I wouldn't claim it is any better.
library(dplyr)
v %>%
  # Remove any rows where all of the values are NA
  filter(!(is.na(p) & is.na(p_1) &
           is.na(b) & is.na(rw) &
           is.na(a) & is.na(is))) %>%
  # Group by ID
  group_by(t) %>%
  # Perform a "last one carried forward", which will replace any missing
  # values in a column with the last non-missing value, if available.
  mutate_at(c("p", "p_1", "b", "rw", "a", "is"),
            zoo::na.locf,
            na.rm = FALSE) %>%
  ungroup() %>%
  # Remove rows that have a missing `p`
  filter(!is.na(p))
#> # A tibble: 9 x 7
#>       t     p   p_1     b    rw     a is
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1     1     8    NA    NA    NA     1 FALSE
#> 2     1     8    NA     7    NA     1 FALSE
#> 3     1     8     1     7    NA     1 TRUE
#> 4     1     8    13     7    NA     1 TRUE
#> 5     1     8    13     7    -1     1 TRUE
#> 6     2     1    NA    NA    NA    NA TRUE
#> 7     2     1    NA    10    NA     1 TRUE
#> 8     2     1     8    10    NA     0 TRUE
#> 9     2     1     8    10    -1     0 TRUE
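To make the "last one carried forward" step less of a black box, here is a minimal base R sketch of the idea. `locf()` is a hypothetical helper written only for this illustration (it is not part of zoo or any other package), and the small vectors `x` and `g` are made-up toy data. `ave()` is used to apply the helper within each group, which plays the role of `group_by(t)` above.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value.
# Leading NAs stay NA, as with zoo::na.locf(na.rm = FALSE).
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 is the NA used before any value
}

x <- c(NA, 8, NA, NA, 1, NA)       # toy values
g <- c(1, 1, 1, 2, 2, 2)           # toy group ids

locf(x)
# NA  8  8  8  1  1

# Applied per group, the fill does not leak across group boundaries
ave(x, g, FUN = locf)
# NA  8  8 NA  1  1
```

The difference between the two results at position 4 is the reason for grouping by t before filling: without it, values from the end of one trial would be carried into the start of the next.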