I have a dataframe that looks like the following. I'm trying to combine two columns (a_on_off and b_on_off) into a single column (on_off), based on their ID. When ID is not equal to a1, a2, or b1, b2, I would like the on_off column to be empty (NA).
a_on_off | b_on_off | ID | Date |
---|---|---|---|
off | on | b1 | 2017-08-09 |
off | on | a2 | 2017-08-09 |
off | on | a1 | 2017-08-10 |
off | on | a1 | 2017-08-11 |
on | on | x1 | 2017-08-12 |
on | on | x2 | 2017-08-13 |
on | on | y1 | 2017-08-13 |
off | off | b1 | 2017-08-13 |
off | off | a2 | 2017-08-14 |
off | off | a2 | 2017-08-15 |
on | on | b2 | 2017-08-15 |
on | on | y1 | 2017-08-15 |
on | on | x1 | 2017-08-15 |
on | on | y3 | 2017-08-16 |
The code I have is almost there, but it not matching the correct row value of a_on_off and b_on_off to the ID number. Instead it is filling in the a_on_off and b_on_off values in order. How do I specify that I want the values to come from the same row?
df$on_off[df$ID == "a1" | df$ID == "a2)"] <- df$a_on_off
df$on_off[df$ID == "b1" | df$ID == "b2)"] <- df$b_on_off
Any help would be appreciated. Thank you!
CodePudding user response:
We can detect the 'ID' where there are 'a' substring followed by some digits (\\d
) with str_detect
and return the values of 'a_on_off', similarly do this for the 'b' substring in 'ID'. By default, the remaining will return NA
in case_when
library(dplyr)
library(stringr)
df <- df %>%
mutate(on_off = case_when(str_detect(ID, '^a\\d ') ~ a_on_off,
str_detect(ID, '^b\\d ') ~ b_on_off))
-output
df
a_on_off b_on_off ID Date on_off
1 off on b1 2017-08-09 on
2 off on a2 2017-08-09 off
3 off on a1 2017-08-10 off
4 off on a1 2017-08-11 off
5 on on x1 2017-08-12 <NA>
6 on on x2 2017-08-13 <NA>
7 on on y1 2017-08-13 <NA>
8 off off b1 2017-08-13 off
9 off off a2 2017-08-14 off
10 off off a2 2017-08-15 off
11 on on b2 2017-08-15 on
12 on on y1 2017-08-15 <NA>
13 on on x1 2017-08-15 <NA>
14 on on y3 2017-08-16 <NA>
data
df <- structure(list(a_on_off = c("off", "off", "off", "off", "on",
"on", "on", "off", "off", "off", "on", "on", "on", "on"), b_on_off = c("on",
"on", "on", "on", "on", "on", "on", "off", "off", "off", "on",
"on", "on", "on"), ID = c("b1", "a2", "a1", "a1", "x1", "x2",
"y1", "b1", "a2", "a2", "b2", "y1", "x1", "y3"), Date = c("2017-08-09",
"2017-08-09", "2017-08-10", "2017-08-11", "2017-08-12", "2017-08-13",
"2017-08-13", "2017-08-13", "2017-08-14", "2017-08-15", "2017-08-15",
"2017-08-15", "2017-08-15", "2017-08-16")), class = "data.frame",
row.names = c(NA,
-14L))