How to change values across 1 row based on values in a column in R?

I have a lot of columns in one data frame that record the same variable at different time points. Within my data, if there is no response at time point X-1, there is no response at time point X or beyond (once an NA appears in a row, every later value is NA too). I currently have a column that holds the last response and another that shows which column it came from. The data frame currently looks like this:

  id X1 X2 X3 X4 X_final X_final_location
1  1  5  5  6 NA       6               X3
2  2  4 NA NA NA       4               X1
3  3  7  1  3  5       5               X4
4  4  8  2  4  2       2               X4
5  5  1  5 NA NA       5               X2
6  6  5  7  7  7       7               X4

My goal is to run a regression using the last response of each row as the outcome variable. However, I don't want that value to appear twice, once in the "X_final" column and again in the column the response actually came from. Therefore, I am hoping to find a way to put a "." in the cell where that value originally came from so it looks like this:

  id X1   X2 X3 X4 X_final X_final_location
1  1  5    5  6 NA       6               X3
2  2  . <NA> NA NA       4               X1
3  3  7    1  3  5       5               X4
4  4  8    2  4  2       2               X4
5  5  1    . NA NA       5               X2
6  6  5    7  7  7       7               X4

Any suggestions would be appreciated - thank you!

CodePudding user response：

One way to do this (using NA instead of . to preserve the data type): match finds the position of the first NA, and replace sets the value one position before it to NA. apply(data, 1, \(x) ...) applies that function to each row. Rows without any NA are left unchanged, because match returns NA for them. Finally, t transposes the result (since apply returns each row's result as a column).

data = data.frame(
  id = 1:6,
  X1 = c(5L, 4L, 7L, 8L, 1L, 5L),
  X2 = c(5L, NA, 1L, 2L, 5L, 7L),
  X3 = c(6L, NA, 3L, 4L, NA, 7L),
  X4 = c(NA, NA, 5L, 2L, NA, 7L),
  X_final = c(6L, 4L, 5L, 2L, 5L, 7L),
  X_final_location = c("X3", "X1", "X4", "X4", "X2", "X4")
)

data[,2:5] <- t(apply(data[,2:5], 1 , function(x) replace(x, match(NA, x) - 1, NA)))

data
#>   id X1 X2 X3 X4 X_final X_final_location
#> 1  1  5  5 NA NA       6               X3
#> 2  2 NA NA NA NA       4               X1
#> 3  3  7  1  3  5       5               X4
#> 4  4  8  2  4  2       2               X4
#> 5  5  1 NA NA NA       5               X2
#> 6  6  5  7  7  7       7               X4
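
To see why only the rows that contain an NA change, it may help to run the same match/replace step on single vectors. This is a small sketch; the vectors with_na and without_na and the helper name blank_last are made up for illustration and are not part of the answer above.

```r
# Hypothetical single-row vectors, for illustration only
with_na    <- c(X1 = 1L, X2 = 5L, X3 = NA, X4 = NA)
without_na <- c(X1 = 7L, X2 = 1L, X3 = 3L, X4 = 5L)

match(NA, with_na)     # position of the first NA in the row
match(NA, without_na)  # NA, because no missing value was found

# Same function as in the apply() call, given a name here
blank_last <- function(x) replace(x, match(NA, x) - 1, NA)

blank_last(with_na)     # the value just before the first NA becomes NA
blank_last(without_na)  # returned unchanged: the index passed to replace() is NA
```

So a row with complete data keeps its X4 value under this approach, which matters if the last response should be blanked in every row.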

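Another way using split (grouping by row). This one writes "." into the column named in X_final_location, which turns the affected columns into character: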

split(data, row.names(data)) <- 
   lapply(split(data, row.names(data)), \(x) replace(x, x$X_final_location, "."))
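
Before writing "." into a numeric column, it is worth checking what that does to the column type. A quick sketch on a made-up vector (x and back are illustrative names only):

```r
# Hypothetical numeric column, for illustration only
x <- c(5L, 4L, 7L)

x[2] <- "."   # assigning a string coerces the whole vector
class(x)      # now "character"
x

# Converting back turns "." into NA (with a coercion warning, suppressed here)
back <- suppressWarnings(as.numeric(x))
back
```

Since the "." ends up as NA once the column is made numeric again, storing NA directly avoids the round trip.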

CodePudding user response：

Another method, since you already have the locations in $X_final_location: loop over the rows and set the cell named there to NA. As mentioned in the question comments, NA is preferable to "." if the goal is regression analysis, because it keeps the columns numeric.

data_orig <- data.frame(
    id = c(1, 2, 3, 4, 5, 6),
    X1 = c(5, 4, 7, 8, 1, 5),
    X2 = c(5, NA, 1, 2, 5, 7),
    X3 = c(6, NA, 3, 4, NA, 7),
    X4 = c(NA, NA, 5, 2, NA, 7),
    X_final = c(6, 4, 5, 2, 5, 7),
    X_final_location = c("X3", "X1", "X4", "X4", "X2", "X4")
)

data_new <- data_orig
for (i in seq_len(nrow(data_new))) {
    data_new[i, data_new$X_final_location[i]] <- NA
}

data_new

#   id X1 X2 X3 X4 X_final X_final_location
# 1  1  5  5 NA NA       6               X3
# 2  2 NA NA NA NA       4               X1
# 3  3  7  1  3 NA       5               X4
# 4  4  8  2  4 NA       2               X4
# 5  5  1 NA NA NA       5               X2
# 6  6  5  7  7 NA       7               X4
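
If the loop gets slow on a large data frame, the same idea can probably be written without a loop using matrix indexing. This is only a sketch under the assumption that every value in X_final_location names one of the X columns; x_cols, m, idx and data_vec are names introduced here for illustration.

```r
data_orig <- data.frame(
    id = c(1, 2, 3, 4, 5, 6),
    X1 = c(5, 4, 7, 8, 1, 5),
    X2 = c(5, NA, 1, 2, 5, 7),
    X3 = c(6, NA, 3, 4, NA, 7),
    X4 = c(NA, NA, 5, 2, NA, 7),
    X_final = c(6, 4, 5, 2, 5, 7),
    X_final_location = c("X3", "X1", "X4", "X4", "X2", "X4")
)

x_cols <- c("X1", "X2", "X3", "X4")

# Work on the time-point columns as a numeric matrix
m <- as.matrix(data_orig[, x_cols])

# Two-column index: row number, and the column position named in X_final_location
idx <- cbind(seq_len(nrow(m)), match(data_orig$X_final_location, x_cols))

# Blank out one cell per row in a single assignment
m[idx] <- NA

data_vec <- data_orig
data_vec[, x_cols] <- m
data_vec
```

The result should match data_new from the loop above, and the X columns stay numeric.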