Is there a way to fill in missing values of a column in between specific values?

Time:07-07

Let's say I create a data frame in R like this:

df1 <- data.frame(time = 1:20, trial = c(NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, NA, NA, NA, 2, NA, NA))

which outputs:

   time trial
1     1    NA
2     2    NA
3     3    NA
4     4     1
5     5    NA
6     6    NA
7     7    NA
8     8     1
9     9    NA
10   10    NA
11   11    NA
12   12     2
13   13    NA
14   14    NA
15   15    NA
16   16    NA
17   17    NA
18   18     2
19   19    NA
20   20    NA

Is there a way I can make the NA's between the 1's also 1, the NA's between the 2's also 2, and the remaining NA's be 0? Essentially, I want the final data frame to look like this:

   time trial
1     1     0
2     2     0
3     3     0
4     4     1
5     5     1
6     6     1
7     7     1
8     8     1
9     9     0
10   10     0
11   11     0
12   12     2
13   13     2
14   14     2
15   15     2
16   16     2
17   17     2
18   18     2
19   19     0
20   20     0

CodePudding user response:

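You can loop over the trial values, fill every row between the first and last occurrence of each value, and then set the remaining NAs to 0:

for (i in 1:2) {
  idx <- which(df1$trial == i)        # rows where trial i is marked
  df1$trial[min(idx):max(idx)] <- i   # fill from the first marker to the last
}
df1$trial[is.na(df1$trial)] <- 0      # all remaining NAs become 0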
Output:

   time trial
1     1     0
2     2     0
3     3     0
4     4     1
5     5     1
6     6     1
7     7     1
8     8     1
9     9     0
10   10     0
11   11     0
12   12     2
13   13     2
14   14     2
15   15     2
16   16     2
17   17     2
18   18     2
19   19     0
20   20     0
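
The loop above hard-codes the trial values 1 and 2. As a rough sketch of how the same idea could be generalised, the helper below (the name `fill_between` is made up for this example) loops over whatever non-NA values are present and fills from the first to the last occurrence of each. It assumes each value marks a single contiguous block, and it uses the values shown in the question's printed data frame (2 in rows 12 and 18):

```r
# Hypothetical helper: fill each value from its first to its last occurrence,
# and use `fill` (0 by default) everywhere else.
fill_between <- function(x, fill = 0) {
  out <- rep(fill, length(x))
  for (v in unique(x[!is.na(x)])) {
    idx <- which(x == v)              # which() drops the NA comparisons
    out[min(idx):max(idx)] <- v       # first marker through last marker
  }
  out
}

df1 <- data.frame(
  time  = 1:20,
  trial = c(NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2,
            NA, NA, NA, NA, NA, 2, NA, NA)
)
df1$trial <- fill_between(df1$trial)
df1$trial
#> [1] 0 0 0 1 1 1 1 1 0 0 0 2 2 2 2 2 2 2 0 0
```

If the blocks of two different values overlap, the value processed later overwrites the earlier one, so this is only suitable when the markers do not interleave.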

CodePudding user response:

Here is a potential solution using dplyr and the runner package:

library(dplyr)
#install.packages("runner")
library(runner)

df1 <- data.frame(time = 1:26, trial = c(NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, NA, 2, NA, NA, NA, NA, 3, NA, NA, NA, 3, NA))

# To fill between every non-NA 'run'
df1 %>%
  mutate(trial = runner::fill_run(trial, only_within = TRUE))
#>    time trial
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4     1
#> 5     5     1
#> 6     6     1
#> 7     7     1
#> 8     8     1
#> 9     9    NA
#> 10   10    NA
#> 11   11    NA
#> 12   12     2
#> 13   13     2
#> 14   14     2
#> 15   15     2
#> 16   16     2
#> 17   17    NA
#> 18   18    NA
#> 19   19    NA
#> 20   20    NA
#> 21   21     3
#> 22   22     3
#> 23   23     3
#> 24   24     3
#> 25   25     3
#> 26   26    NA

# To fill between 1's and 2's and leave everything else as is
df1 %>%
  mutate(tmp = runner::fill_run(trial, only_within = TRUE)) %>%
  transmute(time, trial = ifelse(tmp >= 3 & is.na(trial), NA, tmp))
#>    time trial
#> 1     1    NA
#> 2     2    NA
#> 3     3    NA
#> 4     4     1
#> 5     5     1
#> 6     6     1
#> 7     7     1
#> 8     8     1
#> 9     9    NA
#> 10   10    NA
#> 11   11    NA
#> 12   12     2
#> 13   13     2
#> 14   14     2
#> 15   15     2
#> 16   16     2
#> 17   17    NA
#> 18   18    NA
#> 19   19    NA
#> 20   20    NA
#> 21   21     3
#> 22   22    NA
#> 23   23    NA
#> 24   24    NA
#> 25   25     3
#> 26   26    NA

Created on 2022-07-07 by the reprex package (v2.0.1)
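
If installing runner is not an option, the "fill only within a run" result from the first output above can be approximated in base R. This is a sketch, not runner's actual implementation: the helper names `locf` and `fill_within` are invented here, and the rule assumed is that an NA is filled only when the nearest non-NA value before it equals the nearest non-NA value after it. The last line also replaces the leftover NAs with 0, as the question asked:

```r
# Hypothetical helper: carry the last non-NA value forward (leading NAs stay NA).
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))  # index of latest non-NA, 0 if none yet
  c(NA, x)[idx + 1L]
}

# Hypothetical helper: fill an NA only when the values on both sides agree.
fill_within <- function(x) {
  fwd <- locf(x)             # nearest non-NA value at or before each position
  bwd <- rev(locf(rev(x)))   # nearest non-NA value at or after each position
  ifelse(!is.na(fwd) & !is.na(bwd) & fwd == bwd, fwd, NA)
}

df1 <- data.frame(
  time  = 1:26,
  trial = c(NA, NA, NA, 1, NA, NA, NA, 1, NA, NA, NA, 2, NA, NA, NA, 2,
            NA, NA, NA, NA, 3, NA, NA, NA, 3, NA)
)

filled <- fill_within(df1$trial)
filled[is.na(filled)] <- 0   # remaining gaps become 0
df1$trial <- filled
df1$trial
#> [1] 0 0 0 1 1 1 1 1 0 0 0 2 2 2 2 2 0 0 0 0 3 3 3 3 3 0
```

Unlike a first-to-last fill, this leaves the gap between a 1 and a 2 untouched, because the values on either side of that gap differ.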
