Find the first non-missing value in a set of variables for each unique person and then change all NA-CodePudding

I have a wide dataset with 1 row per person. Each person has a set of 12 variables representing a visit for each month of the year. If a person did not have a visit for a month, it currently says NA. If a person had multiple visits within a month, the number is the number of visits. For example, a patient was seen for the first time in February, and they had 2 visits. The January column is NA because they were not seen then and the February column is 2 because they had 2 visits in that month. The rest of the months for that person would be filled in accordingly if they had visits or not.

My goal is to find the first non-missing value and then any months after their first visit with no visits needs to be changed to 0. I still want to keep months before their first visit as NA. I have 'have' and 'want' data sets provided.

I have made attempts at loops, if statements, and functions, but am not having any success. Does anybody have experience with this type of code?

HAVE:

|ID | Jan | Feb | Mar | Apr | May | June | Jul |
|:--|:---:|:---:|:---:|:---:|:---:|:----:|:---:|
| 1 |  NA |  1  |  1  | 2   | NA  | NA   | NA  |
| 2 |  NA | NA  | NA  | 1   | NA  |  NA  |  NA |
| 3 |  1  | NA  | 1   | 3   | 1   | NA   | NA  |  
| 4 |  NA | NA  | NA  | NA  | NA  | 1    | 1   |

WANT:

|ID | Jan | Feb | Mar | Apr | May | June | Jul |
|:--|:---:|:---:|:---:|:---:|:---:|:----:|:---:|
| 1 |  NA |  1  |  1  | 2   |  0  | 0    | 0   |
| 2 |  NA | NA  | NA  | 1   | 0  |  0  |  0 |
| 3 |  1  | 0  | 1   | 3   | 1   | 0   | 0  |  
| 4 |  NA | NA  | NA  | NA  | NA  | 1    | 1   |

CodePudding user response：

We may use a vectorized option with max.col to find the column index of the first non-NA element excluding the first column for each row. Then, create a logical matrix by checking whether this replicated index is less than the colum index and (&) if the values are NA to return a logical matrix which we use to subset the data and assign (<-) to 0

df1[-1][col(df1[-1]) > max.col(!is.na(df1[-1]), 'first')[
         row(df1[-1])] & is.na(df1[-1])] <- 0

-output

> df1
  ID Jan Feb Mar Apr May June Jul
1  1  NA   1   1   2   0    0   0
2  2  NA  NA  NA   1   0    0   0
3  3   1   0   1   3   1    0   0
4  4  NA  NA  NA  NA  NA    1   1

Or using apply

df1[] <-  t(apply(df1[-1], 1, FUN = function(x) 
    replace(x, seq_along(x) > which(!is.na(x))[1] & is.na(x), 0)))

data

df1 <- structure(list(ID = 1:4, Jan = c(NA, NA, 1L, NA), Feb = c(1L, 
NA, NA, NA), Mar = c(1L, NA, 1L, NA), Apr = c(2L, 1L, 3L, NA), 
    May = c(NA, NA, 1L, NA), June = c(NA, NA, NA, 1L), Jul = c(NA, 
    NA, NA, 1L)), class = "data.frame", row.names = c(NA, -4L
))

CodePudding user response：

A tidyverse solution using pivoting, grouping and filling:

library(tidyverse)

df1 <- df1 %>% 
  pivot_longer(-ID) %>% 
  group_by(ID) %>% 
  mutate(temp = 1 * !cumall(is.na(value))) %>%
  fill(temp) %>% 
  mutate(value = ifelse(temp == 1 & is.na(value), 0, value))  %>% 
  pivot_wider(-temp, names_from = name, values_from = value)