Create a new variable that prints the first value in a series of columns only if a condition is met

Time:03-27

I am trying to create a new variable that prints the first value in a series of columns, but only if a certain condition is met.

To clarify, my database looks something like this:

var1 var2 var3 var4
C7931 C3490 R0781 I10
R079 R0600 I10 C3490
S270XXA S225XXA C3490 C7931

I want to create a variable (main) that prints the value in the first var column only if that value does not start with a code from C00 to C99. If the value does start with one of those codes, I would like to test the next column, and so on, until a value that does not start with C00 to C99 is found and printed.

Therefore, the newly created variable (main) should look something like this for the table above:

var1 var2 var3 var4 main
C7931 C3490 R0781 I10 R0781
R079 R0600 I10 C3490 R079
S270XXA S225XXA C3490 C7931 S270XXA

I am not sure where to start, but I suspect this might involve mutate() and ifelse().

CodePudding user response:

We could loop over each row and use grepl to create a logical vector for subsetting. The pattern matches C followed by one or more digits (\\d+); we negate (!) the logical vector to keep the elements that do not match, and return the first of them ([1]):

df1$main <- apply(df1[startsWith(names(df1), "var")], 1, 
       function(x) x[!grepl("^C\\d+", x)][1])
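To make the steps inside the anonymous function easier to follow, here is a minimal sketch that applies the same logic to a single row taken from the example data. The names x, is_c_code, keepers, main and all_c are only illustrative and do not appear in the answer.

```r
# One row of the example data as a named character vector (illustrative)
x <- c(var1 = "C7931", var2 = "C3490", var3 = "R0781", var4 = "I10")

# TRUE where the value starts with C followed by one or more digits
is_c_code <- grepl("^C\\d+", x)

# Keep only the values that do NOT match, preserving column order
keepers <- x[!is_c_code]

# Take the first remaining value; unname() drops the "var3" label
main <- unname(keepers[1])
print(main)

# If every value in a row matches, the subset is empty and [1] yields NA
all_c <- c("C7931", "C3490")
print(all_c[!grepl("^C\\d+", all_c)][1])
```

Note that "^C\\d+" matches C followed by any run of digits; if the codes must be exactly C00 to C99, a stricter pattern such as "^C\\d{2}" may be closer to the intent.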

With tidyverse, we can use rowwise with str_subset:

library(dplyr)
library(stringr)
df1 %>% 
 rowwise %>% 
 mutate(main = first(str_subset(c_across(starts_with("var")), 
       regex("^C\\d+"), negate = TRUE))) %>%
 ungroup
# A tibble: 3 × 5
  var1    var2    var3  var4  main   
  <chr>   <chr>   <chr> <chr> <chr>  
1 C7931   C3490   R0781 I10   R0781  
2 R079    R0600   I10   C3490 R079   
3 S270XXA S225XXA C3490 C7931 S270XXA

data

df1 <- structure(list(var1 = c("C7931", "R079", "S270XXA"), var2 = c("C3490", 
"R0600", "S225XXA"), var3 = c("R0781", "I10", "C3490"), var4 = c("I10", 
"C3490", "C7931")), class = "data.frame", row.names = c(NA, -3L
))

CodePudding user response:

This creates a column that stores all values that do not meet the condition (using the data from akrun's answer):

library(tidyverse)

df1 %>% 
  mutate(across(var1:var4, ~case_when(str_detect(., "^C\\d+") ~ "",
                                      TRUE ~ .), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')
    var1    var2  var3  var4           New_Col
1   C7931   C3490 R0781   I10         R0781 I10
2    R079   R0600   I10 C3490   R079 R0600 I10 
3 S270XXA S225XXA C3490 C7931 S270XXA S225XXA  
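If only the first non-matching value is wanted, as in the question, the same masking idea can be adapted. The following is a sketch rather than part of the answer above: it assumes that replacing matches with NA instead of "" and then taking the first non-missing value per row with dplyr's coalesce() is acceptable. The names masked and result are illustrative.

```r
library(dplyr)
library(stringr)

# Example data, rebuilt here so the sketch is self-contained
df1 <- data.frame(
  var1 = c("C7931", "R079", "S270XXA"),
  var2 = c("C3490", "R0600", "S225XXA"),
  var3 = c("R0781", "I10", "C3490"),
  var4 = c("I10", "C3490", "C7931")
)

# Replace every value starting with C + digits by NA
masked <- df1 %>%
  mutate(across(var1:var4,
                ~ if_else(str_detect(.x, "^C\\d+"), NA_character_, .x)))

# coalesce() returns, row by row, the first non-missing value
# across the masked columns (spliced in with !!!)
result <- df1 %>%
  mutate(main = coalesce(!!!masked))

print(result)
```

This avoids rowwise(), which may matter on larger tables; a row in which every value matches would get NA in main.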