Home > Back-end >  Removing values prior to some specific index rowwise
Removing values prior to some specific index rowwise

Time:04-16

I currently have data which looks like the following:

ID Var_1 Var_2 Var_3 Var_4 Var_5 RemovePrior
1 20 30 25 35 40 3
2 40 50 45 55 60 2
3 60 70 65 75 80 4
4 80 90 85 95 85 5
df <- data.frame(ID = c(1, 2, 3, 4),
                 Var_1 = c(20, 40, 60, 80),
                 Var_2 = c(30, 50, 70, 90),
                 Var_3 = c(25, 45, 65, 85),
                 Var_4 = c(35, 55, 75, 95),
                 Var_5 = c(40, 60, 80, 85),
                 RemovePrior = c(3, 2, 4, 5))

My goal is to—for each row—fill in the variables to the left of the variable with the suffix indicated by "RemovePrior" with NA. For example, for ID = 1, RemovePrior takes on the value 3, hence I wish to have Var_1 and Var_2 set to NA. The following is the intended result:

ID Var_1 Var_2 Var_3 Var_4 Var_5 RemovePrior
1 NA NA 25 35 40 3
2 NA 50 45 55 60 2
3 NA NA NA 75 80 4
4 NA NA NA NA 85 5

CodePudding user response:

We can use base R

nm1 <- startsWith(names(df), "Var")
df[nm1] <- NA^(col(df[nm1]) < df$RemovePrior) * df[nm1]

-output

> df
  ID Var_1 Var_2 Var_3 Var_4 Var_5 RemovePrior
1  1    NA    NA    25    35    40           3
2  2    NA    50    45    55    60           2
3  3    NA    NA    NA    75    80           4
4  4    NA    NA    NA    NA    85           5

Or with dplyr

library(dplyr)
df %>% 
   mutate(across(starts_with("Var_"), 
   ~ case_when(readr::parse_number(cur_column()) >= RemovePrior ~ .x )))
  ID Var_1 Var_2 Var_3 Var_4 Var_5 RemovePrior
1  1    NA    NA    25    35    40           3
2  2    NA    50    45    55    60           2
3  3    NA    NA    NA    75    80           4
4  4    NA    NA    NA    NA    85           5

CodePudding user response:

A possible solution:

library(tidyverse)

df %>% 
  mutate(across(Var_1:Var_5,
    ~ if_else(which(cur_column() == names(df[-1])) < RemovePrior, NA_real_, .x)))

#>   ID Var_1 Var_2 Var_3 Var_4 Var_5 RemovePrior
#> 1  1    NA    NA    25    35    40           3
#> 2  2    NA    50    45    55    60           2
#> 3  3    NA    NA    NA    75    80           4
#> 4  4    NA    NA    NA    NA    85           5

CodePudding user response:

Here is another way how we could do it: Using pivot_longer and pivot_wider

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(
    -c(ID, RemovePrior)
  ) %>% 
  group_by(ID) %>% 
  mutate(value = ifelse(value == lag(value, unique(RemovePrior)-1), NA, value)) %>% 
  pivot_wider(
    names_from = name,
    values_from = value
  )

output:

     ID RemovePrior Var_1 Var_2 Var_3 Var_4 Var_5
  <dbl>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1           3    NA    NA    25    35    40
2     2           2    NA    50    45    55    60
3     3           4    NA    NA    NA    75    80
4     4           5    NA    NA    NA    NA    85
  • Related