Home > Software engineering >  Replacing values with conditional matching from different columns
Replacing values with conditional matching from different columns

Time:09-22

I have a data structure like the one below.

df <- structure(list(id = c(1, 1, 1, 2, 2, 3, 3, 4, 4), HT = c("L", 
"L", "L", "W", "W", "B", "B", "WA", "WA"), HS = c(4, 4, 4, 0, 
0, 3, 3, 0, 0), AT = c("N", "N", "N", "M", "M", "S", "S", "BR", 
"BR"), AS = c(1, 1, 1, 5, 5, 1, 1, 3, 3), T = c("L", "L", "N", 
"W", "M", "B", "S", "WA", "BR"), H_A = c("H", "H", "A", "H", 
"A", "H", "A", "H", "A"), S_incorrect_value = c(3, 3, 1, 0, 5, 
3, 0, 0, 2), S_corrected_value = c(4, 4, 1, 0, 5, 3, 1, 0, 3)), row.names = c(NA, 
-9L), class = c("tbl_df", "tbl", "data.frame"))

However, it contains some inconsistent values that need to be replaced with a new column to be created (S_corrected_value).

The column with the desired values will be in long format, and correspond to values from two other columns. There are also different numbers of rows that make up each case (id). I have left marked in color to be noticed where the data is scraped from.

enter image description here

CodePudding user response:

You can use dplyr::case_when() to evaluate the value of H_A and determine what value to replace with.

library(tidyverse)

df %>% 
  mutate(S_corrected_value_new = case_when(
    H_A == "H" ~ HS,
    H_A == "A" ~ AS
  ))
#> # A tibble: 9 × 10
#>      id HT       HS AT       AS T     H_A   S_incorrect_value S_correc…¹ S_cor…²
#>   <dbl> <chr> <dbl> <chr> <dbl> <chr> <chr>             <dbl>      <dbl>   <dbl>
#> 1     1 L         4 N         1 L     H                     3          4       4
#> 2     1 L         4 N         1 L     H                     3          4       4
#> 3     1 L         4 N         1 N     A                     1          1       1
#> 4     2 W         0 M         5 W     H                     0          0       0
#> 5     2 W         0 M         5 M     A                     5          5       5
#> 6     3 B         3 S         1 B     H                     3          3       3
#> 7     3 B         3 S         1 S     A                     0          1       1
#> 8     4 WA        0 BR        3 WA    H                     0          0       0
#> 9     4 WA        0 BR        3 BR    A                     2          3       3
#> # … with abbreviated variable names ¹​S_corrected_value, ²​S_corrected_value_new

Created on 2022-09-21 by the reprex package (v2.0.1)

  • Related