Create a new data frame cleaning NA and updating by column in R

Time:05-31

Imagine we have data from a task made up of different episodes, with different events happening inside each of them. I would like to organize the information per trial t, using only the rows where a value is actually available.

t <- c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,2,2)
p <- c(NA,NA,NA,8,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,NA,NA,NA,NA,NA,NA,NA)
p_1 <- c(NA,NA,NA,NA,NA,NA,NA,1,NA,13,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,8,NA,NA,NA,NA)
b <- c(NA,NA,NA,NA,NA,7,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,10,NA,NA,NA,NA,NA,NA)
rw <- c(NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,-1)
a <- c(NA,NA,1,NA,NA,1,NA,1,NA,1,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,1,NA,0,NA,NA,NA,NA)
is <- c(NA,NA,NA,FALSE,NA,NA,NA,TRUE,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,TRUE,NA,NA,NA,NA,NA,NA,NA,NA)
v <- data.frame(t,p,p_1,b,rw,a,is)

#which returns the following data frame:

   t  p p_1  b rw  a    is
1  1 NA  NA NA NA NA    NA
2  1 NA  NA NA NA NA    NA
3  1 NA  NA NA NA  1    NA
4  1  8  NA NA NA NA FALSE
5  1 NA  NA NA NA NA    NA
6  1 NA  NA  7 NA  1    NA
7  1 NA  NA NA NA NA    NA
8  1 NA   1 NA NA  1  TRUE
9  1 NA  NA NA NA NA    NA
10 1 NA  13 NA NA  1    NA
11 1 NA  NA NA NA NA    NA
12 1 NA  NA NA NA NA    NA
13 1 NA  NA NA NA NA    NA
14 1 NA  NA NA NA NA    NA
15 1 NA  NA NA -1 NA    NA
16 2 NA  NA NA NA NA    NA
17 2 NA  NA NA NA NA    NA
18 2 NA  NA NA NA NA    NA
19 2  1  NA NA NA NA  TRUE
20 2 NA  NA NA NA NA    NA
21 2 NA  NA 10 NA  1    NA
22 2 NA  NA NA NA NA    NA
23 2 NA   8 NA NA  0    NA
24 2 NA  NA NA NA NA    NA
25 2 NA  NA NA NA NA    NA
26 2 NA  NA NA NA NA    NA
27 2 NA  NA NA -1 NA    NA

My end goal is to read the table and produce a data frame (or a vector) that carries the updated data forward at every step until the end of the episode, something that would read like this:

1 8 NA NA NA 1 FALSE
1 8 NA 7  NA 1 FALSE
1 8  1 7  NA 1 TRUE
1 8 13 7  NA 1 TRUE
1 8 13 7  -1 1 TRUE
2 1 NA NA NA 1 TRUE
2 1 NA 10 NA 1 TRUE
2 1  8 10 NA 0 TRUE
2 1  8 10 -1 0 TRUE

I tried to build a vector without the NAs for each column and then join them, but since the resulting vectors are not the same length, this does not work.

I also tried an approach based on this solution using dplyr, but I couldn't make it work:


library(dplyr)

test1 <- v %>% filter(if_all(contains('t','p','p_1','b','rw','a','is'), Negate(is.na)))


> test1 <- v %>% filter(if_all(contains('t','p','p_1','b','rw','a','is'), Negate(is.na)))
Error: Problem with `filter()` input `..1`.
ℹ Input `..1` is `if_all(contains("t", "p", "p_1", "b", "rw", "a", "is"), Negate(is.na))`.
x unused arguments ("b", "rw", "a", "is")
Run `rlang::last_error()` to see where the error occurred.

Any ideas?

Thanks in advance!

CodePudding user response:

Try this using dplyr and tidyr::fill.

library(dplyr)
library(tidyr)

v |> 
  filter(rowSums(!is.na(v))>1) |>
  group_by(t) |>
  fill(everything())

# A tibble: 10 x 7
# Groups:   t [2]
       t     p   p_1     b    rw     a is   
   <int> <int> <int> <int> <int> <int> <lgl>
 1     1    NA    NA    NA    NA     1 NA   
 2     1     8    NA    NA    NA     1 FALSE
 3     1     8    NA     7    NA     1 FALSE
 4     1     8     1     7    NA     1 TRUE 
 5     1     8    13     7    NA     1 TRUE 
 6     1     8    13     7    -1     1 TRUE 
 7     2     1    NA    NA    NA    NA TRUE 
 8     2     1    NA    10    NA     1 TRUE 
 9     2     1     8    10    NA     0 TRUE 
10     2     1     8    10    -1     0 TRUE 
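
A possible variation on the pipeline above, offered as a sketch rather than a definitive answer. Two notes motivate it. First, the error in the question comes from `contains()` taking its patterns as a single character vector: given seven separate strings, the first three fill its `match`, `ignore.case` and `vars` parameters and the remaining four are reported as unused. Even with that corrected, `if_all(..., Negate(is.na))` keeps only rows where every column is observed, and there are none in this data. Second, the desired output starts each trial at the first row where p is known, so an extra filter on p is assumed here. The data frame is rebuilt from the printed table in the question; the name `result` is just illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the question's data frame from its printed form (27 rows, mostly NA).
v <- data.frame(t = rep(c(1, 2), times = c(15, 12)),
                p = NA_real_, p_1 = NA_real_, b = NA_real_,
                rw = NA_real_, a = NA_real_, is = NA)
v$p[c(4, 19)]               <- c(8, 1)
v$p_1[c(8, 10, 23)]         <- c(1, 13, 8)
v$b[c(6, 21)]               <- c(7, 10)
v$rw[c(15, 27)]             <- -1
v$a[c(3, 6, 8, 10, 21, 23)] <- c(1, 1, 1, 1, 1, 0)
v$is[c(4, 8, 19)]           <- c(FALSE, TRUE, TRUE)

result <- v |>
  # Keep rows where at least one column other than t is observed.
  filter(if_any(-t, ~ !is.na(.x))) |>
  group_by(t) |>
  # Carry the last observed value forward within each trial.
  fill(everything(), .direction = "down") |>
  ungroup() |>
  # Assumption: a trial only "starts" once p has been observed.
  filter(!is.na(p))

print(result)
```

On this data the final filter drops the single row recorded before p is first observed, leaving 9 rows. Using `if_any(-t, ...)` instead of `rowSums(...) > 1` avoids relying on t itself always being non-missing.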

CodePudding user response:

This is effectively the same approach as George Savva's, but he got to it just before I did. It uses zoo::na.locf instead of tidyr. I will provide it as an alternative, but I wouldn't claim it is any better.

library(dplyr)

v %>% 
  # Remove any rows where all of the values are NA
  filter(!(is.na(p) & is.na(p_1) & 
             is.na(b) & is.na(rw) & 
             is.na(a) & is.na(is))) %>%
  # Group by trial (t)
  group_by(t) %>%
  # Perform a "last one carried forward", which will replace any missing
  # values in a column with the last non-missing value, if available.
  mutate_at(c("p", "p_1", "b", "rw", "a", "is"), 
            zoo::na.locf, 
            na.rm = FALSE) %>% 
  ungroup() %>% 
  # Remove rows that have a missing `p`
  filter(!is.na(p))
#> # A tibble: 9 x 7
#>       t     p   p_1     b    rw     a is   
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
#> 1     1     8    NA    NA    NA     1 FALSE
#> 2     1     8    NA     7    NA     1 FALSE
#> 3     1     8     1     7    NA     1 TRUE 
#> 4     1     8    13     7    NA     1 TRUE 
#> 5     1     8    13     7    -1     1 TRUE 
#> 6     2     1    NA    NA    NA    NA TRUE 
#> 7     2     1    NA    10    NA     1 TRUE 
#> 8     2     1     8    10    NA     0 TRUE 
#> 9     2     1     8    10    -1     0 TRUE
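
To make the "last one carried forward" step less of a black box, here is a minimal base R sketch of the idea. The helper `locf` is hypothetical (it is not the zoo implementation) and is only meant to illustrate the behaviour of `zoo::na.locf(x, na.rm = FALSE)` on a plain vector: gaps are filled with the most recent observed value, and leading gaps stay NA.

```r
# Hypothetical helper (not from zoo): carry the last non-missing value forward.
locf <- function(x) {
  observed <- !is.na(x)
  # seen[i] = how many non-missing values occur at or before position i
  seen <- cumsum(observed)
  # Prepend NA so that positions with seen == 0 (leading gaps) stay NA
  c(NA, x[observed])[seen + 1]
}

locf(c(NA, 8, NA, NA, 2, NA))
#> [1] NA  8  8  8  2  2
locf(c(NA, FALSE, NA, TRUE))
#> [1]    NA FALSE FALSE  TRUE
```

Applied per trial (for example inside `group_by(t)`), this is what makes each row show the latest known value of every column.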