I am trying to create a new variable that holds the first value from a series of columns, but only if that value meets a certain condition.
To clarify, my database looks something like this:
var1 | var2 | var3 | var4 |
---|---|---|---|
C7931 | C3490 | R0781 | I10 |
R079 | R0600 | I10 | C3490 |
S270XXA | S225XXA | C3490 | C7931 |
I want to create a variable (main) that takes the value in the first var column only if that value does not start with C00 to C99. If it does start with C00 to C99, I would like to test the next column, and so on, until a value that does not match is found; that value goes into main.
Therefore, the newly created variable (main) should look something like this for the table above:
var1 | var2 | var3 | var4 | main |
---|---|---|---|---|
C7931 | C3490 | R0781 | I10 | R0781 |
R079 | R0600 | I10 | C3490 | R079 |
C0258 | S225XXA | C3490 | C7931 | S225XXA |
I am not sure where to start, but I suspect this might involve mutate() and ifelse().
Answer:
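We could use `grepl` to create a logical vector for subsetting, looping over each row with `apply`. The pattern matches `C` at the start of the value followed by one or more digits (`\\d+`). We negate (`!`) the logical vector to keep only the elements that do not match, and return the first of them (`[1]`), which is `NA` if every value in the row matches.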
# for each row, drop values that start with C + digits and keep the first one left
df1$main <- apply(df1[startsWith(names(df1), "var")], 1,
                  function(x) x[!grepl("^C\\d+", x)][1])
With the tidyverse, we can use `rowwise()` with `str_subset()`:
library(dplyr)
library(stringr)
df1 %>%
  rowwise %>%
  # negate = TRUE keeps the values that do NOT start with C + digits
  mutate(main = first(str_subset(c_across(starts_with("var")),
                                 regex("^C\\d+"), negate = TRUE))) %>%
  ungroup
# A tibble: 3 × 5
var1 var2 var3 var4 main
<chr> <chr> <chr> <chr> <chr>
1 C7931 C3490 R0781 I10 R0781
2 R079 R0600 I10 C3490 R079
3 S270XXA S225XXA C3490 C7931 S270XXA
Data:
df1 <- structure(list(var1 = c("C7931", "R079", "S270XXA"), var2 = c("C3490",
"R0600", "S225XXA"), var3 = c("R0781", "I10", "C3490"), var4 = c("I10",
"C3490", "C7931")), class = "data.frame", row.names = c(NA, -3L
))
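A column-wise alternative, offered here as a sketch rather than part of the answer above: instead of looping over rows, walk the var columns from left to right and fill each row the first time it sees a value that does not match. The helper name `first_non_c` is hypothetical, and the pattern `^C[0-9]{2}` is an assumption chosen to match exactly "C followed by two digits" (C00 to C99), which is slightly stricter than `^C\\d+`.

```r
# Example data (same values as df1 above)
df1 <- data.frame(
  var1 = c("C7931", "R079", "S270XXA"),
  var2 = c("C3490", "R0600", "S225XXA"),
  var3 = c("R0781", "I10", "C3490"),
  var4 = c("I10", "C3490", "C7931"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, return the first value (scanning the
# columns in `cols` from left to right) that does not match `pattern`.
first_non_c <- function(df, cols, pattern = "^C[0-9]{2}") {
  main <- rep(NA_character_, nrow(df))
  for (col in cols) {
    x <- as.character(df[[col]])
    # fill only rows that are still unresolved and whose value does not match
    fill <- is.na(main) & !is.na(x) & !grepl(pattern, x)
    main[fill] <- x[fill]
  }
  main
}

df1$main <- first_non_c(df1, grep("^var", names(df1), value = TRUE))
print(df1$main)
```

Each pass works on a whole column at once, so nothing is converted to a matrix as `apply` does, and rows where every value is a C00 to C99 code are left as `NA`.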
Answer:
This creates a column holding all the values that do not meet the condition, separated by spaces (using akrun's data):
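library(tidyverse)
df1 %>%
  # turn C + digits codes into NA so that unite(na.rm = TRUE) drops them;
  # empty strings ("") would not be dropped and would leave stray spaces
  mutate(across(var1:var4, ~case_when(str_detect(., "^C\\d+") ~ NA_character_,
                                      TRUE ~ .), .names = 'new_{col}')) %>%
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ')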
var1 var2 var3 var4 New_Col
1 C7931 C3490 R0781 I10 R0781 I10
2 R079 R0600 I10 C3490 R079 R0600 I10
3 S270XXA S225XXA C3490 C7931 S270XXA S225XXA
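To get the single main value the question asks for from this approach, one option is to keep only the first space-separated token of New_Col. This is a minimal base R sketch: the helper `first_token` is hypothetical, and the New_Col values are copied from the output above so the example runs without the tidyverse.

```r
# Hypothetical helper: take the first space-separated value from New_Col.
first_token <- function(x) {
  # trimws() drops any leading spaces left behind by empty strings
  out <- sub(" .*", "", trimws(x))
  # a row where every value was a C00-C99 code has nothing left; use NA
  out[out == ""] <- NA_character_
  out
}

# New_Col values copied from the output above
new_col <- c("R0781 I10", "R079 R0600 I10", "S270XXA S225XXA")
main <- first_token(new_col)
print(main)
```

Because the values are joined left to right, the first token is the value from the leftmost column that passed the filter, which matches the rule described in the question.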