R: Fill NA values with ascending numbers-CodePudding

With R I would like to fill a column of mostly NA values with ascending/descending numbers (integers), but relative to a row with value 0. The numbering should be grouped by Subject (1st Column). This sounds confusing, so please look at the example. The data looks like this (Column_3 is current and Column_4 is the goal):

Subject	Visit	Column_3	Column_4
001	Visit 1	NA	-2
001	Visit 2	NA	-1
001	Visit 3	0	0
001	Visit 4	NA	1
001	Visit 5	NA	2
002	Visit 1	NA	-1
002	Visit 2	0	0
002	Visit 3	NA	1
002	Visit 4	NA	2
002	Visit 5	NA	3
003	Visit 1	NA	-3
003	Visit 2	NA	-2
003	Visit 3	NA	-1
003	Visit 4	0	0
003	Visit 5	NA	1

So all subjects have a certain number of visits (1-5) and a certain visit indicated by the 0 (different for each subject) in Column 3. In Column 4 I would like to have the numbers in ascending order in reference to the visit indicated by the 0.

Currently I try to make a pipe with dplyr:

df <- df %>% group_by(Subject) %>%
     mutate(
    column 4 = lag(Column_3, 1),
    code here
    ) %>%
    ungroup()

I'm trying to make use of fill() and lag() and lead(), but I don't know how to use these efficiently in this case (unfortunately I have not found an "inefficient" approach either).

Suggestions are welcome!

CodePudding user response：

library(dplyr)

df |>
  group_by(Subject) |>
  mutate(Column_4 = row_number() - which(Column_3 == 0))

  # A tibble: 15 × 4
# Groups:   Subject [3]
   Subject Visit   Column_3 Column_4
   <chr>   <chr>   <chr>       <int>
 1 001     Visit 1 NA             -2
 2 001     Visit 2 NA             -1
 3 001     Visit 3 0               0
 4 001     Visit 4 NA              1
 5 001     Visit 5 NA              2
 6 002     Visit 1 NA             -1
 7 002     Visit 2 0               0
 8 002     Visit 3 NA              1
 9 002     Visit 4 NA              2
10 002     Visit 5 NA              3
11 003     Visit 1 NA             -3
12 003     Visit 2 NA             -2
13 003     Visit 3 NA             -1
14 003     Visit 4 0               0
15 003     Visit 5 NA              1

CodePudding user response：

Slightly different option:

library(tidyverse)
df %>%
  group_by(Subject) %>%
  mutate(column_4 = row_number() - row_number()[!is.na(column_3 == 0)]) %>%
  ungroup()

CodePudding user response：

Another possible solution:

library(dplyr) 

df <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 3L, 3L, 3L, 3L, 3L), Visit = c("Visit 1", "Visit 2", "Visit 3", 
"Visit 4", "Visit 5", "Visit 1", "Visit 2", "Visit 3", "Visit 4", 
"Visit 5", "Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5"
), Column_3 = c(NA, NA, 0L, NA, NA, NA, 0L, NA, NA, NA, NA, NA, 
NA, 0L, NA), Column_4 = c(-2L, -1L, 0L, 1L, 2L, -1L, 0L, 1L, 
2L, 3L, -3L, -2L, -1L, 0L, 1L)), class = "data.frame", row.names = c(NA, 
-15L))

df %>% 
  group_by(Subject) %>% 
  mutate(Column_5 = row_number() * (Column_3 == 0), 
         Column_5 = row_number() - max(Column_5, na.rm = T)) %>% 
  ungroup

#> # A tibble: 15 × 5
#>    Subject Visit   Column_3 Column_4 Column_5
#>      <int> <chr>      <int>    <int>    <int>
#>  1       1 Visit 1       NA       -2       -2
#>  2       1 Visit 2       NA       -1       -1
#>  3       1 Visit 3        0        0        0
#>  4       1 Visit 4       NA        1        1
#>  5       1 Visit 5       NA        2        2
#>  6       2 Visit 1       NA       -1       -1
#>  7       2 Visit 2        0        0        0
#>  8       2 Visit 3       NA        1        1
#>  9       2 Visit 4       NA        2        2
#> 10       2 Visit 5       NA        3        3
#> 11       3 Visit 1       NA       -3       -3
#> 12       3 Visit 2       NA       -2       -2
#> 13       3 Visit 3       NA       -1       -1
#> 14       3 Visit 4        0        0        0
#> 15       3 Visit 5       NA        1        1