I have a wide dataset with 1 row per person. Each person has a set of 12 variables representing a visit for each month of the year. If a person did not have a visit for a month, it currently says NA. If a person had multiple visits within a month, the number is the number of visits. For example, a patient was seen for the first time in February, and they had 2 visits. The January column is NA because they were not seen then and the February column is 2 because they had 2 visits in that month. The rest of the months for that person would be filled in accordingly if they had visits or not.
My goal is to find the first non-missing value and then any months after their first visit with no visits needs to be changed to 0. I still want to keep months before their first visit as NA. I have 'have' and 'want' data sets provided.
I have made attempts at loops, if statements, and functions, but am not having any success. Does anybody have experience with this type of code?
HAVE:
|ID | Jan | Feb | Mar | Apr | May | June | Jul |
|:--|:---:|:---:|:---:|:---:|:---:|:----:|:---:|
| 1 | NA | 1 | 1 | 2 | NA | NA | NA |
| 2 | NA | NA | NA | 1 | NA | NA | NA |
| 3 | 1 | NA | 1 | 3 | 1 | NA | NA |
| 4 | NA | NA | NA | NA | NA | 1 | 1 |
WANT:
|ID | Jan | Feb | Mar | Apr | May | June | Jul |
|:--|:---:|:---:|:---:|:---:|:---:|:----:|:---:|
| 1 | NA | 1 | 1 | 2 | 0 | 0 | 0 |
| 2 | NA | NA | NA | 1 | 0 | 0 | 0 |
| 3 | 1 | 0 | 1 | 3 | 1 | 0 | 0 |
| 4 | NA | NA | NA | NA | NA | 1 | 1 |
CodePudding user response:
We may use a vectorized option with max.col
to find the column index of the first non-NA element excluding the first column for each row. Then, create a logical matrix by checking whether this rep
licated index is less than the col
um index and (&
) if the values are NA
to return a logical matrix
which we use to subset the data and assign (<-
) to 0
df1[-1][col(df1[-1]) > max.col(!is.na(df1[-1]), 'first')[
row(df1[-1])] & is.na(df1[-1])] <- 0
-output
> df1
ID Jan Feb Mar Apr May June Jul
1 1 NA 1 1 2 0 0 0
2 2 NA NA NA 1 0 0 0
3 3 1 0 1 3 1 0 0
4 4 NA NA NA NA NA 1 1
Or using apply
df1[] <- t(apply(df1[-1], 1, FUN = function(x)
replace(x, seq_along(x) > which(!is.na(x))[1] & is.na(x), 0)))
data
df1 <- structure(list(ID = 1:4, Jan = c(NA, NA, 1L, NA), Feb = c(1L,
NA, NA, NA), Mar = c(1L, NA, 1L, NA), Apr = c(2L, 1L, 3L, NA),
May = c(NA, NA, 1L, NA), June = c(NA, NA, NA, 1L), Jul = c(NA,
NA, NA, 1L)), class = "data.frame", row.names = c(NA, -4L
))
CodePudding user response:
A tidyverse
solution using pivoting, grouping and filling:
library(tidyverse)
df1 <- df1 %>%
pivot_longer(-ID) %>%
group_by(ID) %>%
mutate(temp = 1 * !cumall(is.na(value))) %>%
fill(temp) %>%
mutate(value = ifelse(temp == 1 & is.na(value), 0, value)) %>%
pivot_wider(-temp, names_from = name, values_from = value)