With R
I would like to fill a column of mostly NA values with ascending/descending numbers (integers), but relative to a row with value 0. The numbering should be grouped by Subject (1st Column). This sounds confusing, so please look at the example.
The data looks like this (Column_3 is current and Column_4 is the goal):
Subject | Visit | Column_3 | Column_4 |
---|---|---|---|
001 | Visit 1 | NA | -2 |
001 | Visit 2 | NA | -1 |
001 | Visit 3 | 0 | 0 |
001 | Visit 4 | NA | 1 |
001 | Visit 5 | NA | 2 |
002 | Visit 1 | NA | -1 |
002 | Visit 2 | 0 | 0 |
002 | Visit 3 | NA | 1 |
002 | Visit 4 | NA | 2 |
002 | Visit 5 | NA | 3 |
003 | Visit 1 | NA | -3 |
003 | Visit 2 | NA | -2 |
003 | Visit 3 | NA | -1 |
003 | Visit 4 | 0 | 0 |
003 | Visit 5 | NA | 1 |
So all subjects have a certain number of visits (1-5) and a certain visit indicated by the 0 (different for each subject) in Column 3. In Column 4 I would like to have the numbers in ascending order in reference to the visit indicated by the 0.
Currently I try to make a pipe with dplyr
:
df <- df %>% group_by(Subject) %>%
mutate(
column 4 = lag(Column_3, 1),
code here
) %>%
ungroup()
I'm trying to make use of fill()
and lag()
and lead()
, but I don't know how to use these efficiently in this case (unfortunately I have not found an "inefficient" approach either).
Suggestions are welcome!
CodePudding user response:
library(dplyr)
df |>
group_by(Subject) |>
mutate(Column_4 = row_number() - which(Column_3 == 0))
# A tibble: 15 × 4
# Groups: Subject [3]
Subject Visit Column_3 Column_4
<chr> <chr> <chr> <int>
1 001 Visit 1 NA -2
2 001 Visit 2 NA -1
3 001 Visit 3 0 0
4 001 Visit 4 NA 1
5 001 Visit 5 NA 2
6 002 Visit 1 NA -1
7 002 Visit 2 0 0
8 002 Visit 3 NA 1
9 002 Visit 4 NA 2
10 002 Visit 5 NA 3
11 003 Visit 1 NA -3
12 003 Visit 2 NA -2
13 003 Visit 3 NA -1
14 003 Visit 4 0 0
15 003 Visit 5 NA 1
CodePudding user response:
Slightly different option:
library(tidyverse)
df %>%
group_by(Subject) %>%
mutate(column_4 = row_number() - row_number()[!is.na(column_3 == 0)]) %>%
ungroup()
CodePudding user response:
Another possible solution:
library(dplyr)
df <- structure(list(Subject = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L), Visit = c("Visit 1", "Visit 2", "Visit 3",
"Visit 4", "Visit 5", "Visit 1", "Visit 2", "Visit 3", "Visit 4",
"Visit 5", "Visit 1", "Visit 2", "Visit 3", "Visit 4", "Visit 5"
), Column_3 = c(NA, NA, 0L, NA, NA, NA, 0L, NA, NA, NA, NA, NA,
NA, 0L, NA), Column_4 = c(-2L, -1L, 0L, 1L, 2L, -1L, 0L, 1L,
2L, 3L, -3L, -2L, -1L, 0L, 1L)), class = "data.frame", row.names = c(NA,
-15L))
df %>%
group_by(Subject) %>%
mutate(Column_5 = row_number() * (Column_3 == 0),
Column_5 = row_number() - max(Column_5, na.rm = T)) %>%
ungroup
#> # A tibble: 15 × 5
#> Subject Visit Column_3 Column_4 Column_5
#> <int> <chr> <int> <int> <int>
#> 1 1 Visit 1 NA -2 -2
#> 2 1 Visit 2 NA -1 -1
#> 3 1 Visit 3 0 0 0
#> 4 1 Visit 4 NA 1 1
#> 5 1 Visit 5 NA 2 2
#> 6 2 Visit 1 NA -1 -1
#> 7 2 Visit 2 0 0 0
#> 8 2 Visit 3 NA 1 1
#> 9 2 Visit 4 NA 2 2
#> 10 2 Visit 5 NA 3 3
#> 11 3 Visit 1 NA -3 -3
#> 12 3 Visit 2 NA -2 -2
#> 13 3 Visit 3 NA -1 -1
#> 14 3 Visit 4 0 0 0
#> 15 3 Visit 5 NA 1 1