Home > Back-end >  replace with missing columns according to the substring of the colname using dplyr
replace with missing columns according to the substring of the colname using dplyr

Time:02-04

I'd like to replace with NA certain columns depending on the start_value and the end_value. For example:

  • the first row will stay like this because start_value and end_value are missing;
  • the second row will be replaced with missing, starting from the first column (value.1) to the third (value.3);
  • the third row will be replaced with missing, starting from the second column (value.2) to the third (value.3), etc.

Example dataframe:

df <- data.frame(id=c(1:4),value.1=c(0,1,1,0), value.2=c(rep(0,3),1), value.3=c(1,1,1,0), start_value=c(NA,1,2,NA),end_value=c(NA,3,3,NA))
id value.1  value.2  value.3  start_value  end_value
1     0        0       1         NA           NA
2     1        0       1         1            3
3     1        0       1         2            3
4     0        1       0         NA           NA

I'd like to obtain a final df like this:

id value.1  value.2  value.3  
    1     0        0       1      
    2     NA       NA      NA       
    3     1        NA      NA       
    4     0        1       0

CodePudding user response:

Here's a way using apply:

as.data.frame(
  t(apply(df, 1, function(x) {
  start <- x[5]
  end  <- x[6]
  if (anyNA(c(start, end))){
    x[1:4]
  } else {
    x[2:4][start:end] <- NA
    x[1:4]
  }
})))

  id value.1 value.2 value.3
1  1       0       0       1
2  2      NA      NA      NA
3  3       1      NA      NA
4  4       0       1       0

CodePudding user response:

A base R approach with row/column indexing

i1 <- complete.cases(df[5:6])
lst1 <- do.call(Map, c(f = `:`, unname(df[i1,5:6])))
df[i1, 2:4][cbind(rep(seq_along(lst1), lengths(lst1)), unlist(lst1))] <- NA

-output

> df[1:4]
  id value.1 value.2 value.3
1  1       0       0       1
2  2      NA      NA      NA
3  3       1      NA      NA
4  4       0       1       0

Or using tidyverse

library(dplyr)
library(stringr)
nm1 <- str_subset(names(df), "value\\.")
 df %>%
   transmute(id, across(starts_with('value'), 
   ~ {
   ind <- match(cur_column(), nm1)
   replace(.x, ind >= start_value & ind <=end_value, NA)
}))

-output

  id value.1 value.2 value.3
1  1       0       0       1
2  2      NA      NA      NA
3  3       1      NA      NA
4  4       0       1       0
  • Related