I am trying to extract rows containing paired site/location (i.e., one column containing "before" and another column containing "after" an event).
Here's an example data.
dat <- data.frame(var1 = c('Green Island_one_before', 'Green Island_two_before', 'Green Island_one_after', 'Pink Island_one_before', 'Pink Island_two_after', 'Pink Island_five_after'),
var2 = c('Green Island_three_before', 'Green Island_two_after', 'Green Island_one_after', 'Pink Island_four_before', 'Pink Island_two_before', 'Pink Island_six_after'),
value = c(1, 7, 4, 2, 8, 3))
Using the example data above, it should return rows #2 and 5.
CodePudding user response:
I would start by putting each variable in a column. That is, separate var1
and var2
in their component parts, the site, location and time. After that, anything one needs to do will become much more straightforward. E.g.
library(tidyverse)
dat %>%
separate(var1, into = c('site_v1', 'location_v1', 'time_v1'), sep = '_') %>%
separate(var2, into = c('site_v2', 'location_v2', 'time_v2'), sep = '_') %>%
filter(site_v1 == site_v2, location_v1 == location_v2, time_v1 != time_v2)
site_v1 location_v1 time_v1 site_v2 location_v2 time_v2 value 1 Green Island two before Green Island two after 7 2 Pink Island two after Pink Island two before 8
CodePudding user response:
Assuming all the other parts of the string are the same, you could do:
library(tidyverse)
dat %>%
filter(str_detect(pmax(var1, var2), 'before'),
str_detect(pmin(var1, var2), 'after'))
var1 var2 value
1 site1_loc2_before site1_loc2_after 7
2 site3_loc2_after site3_loc2_before 8
CodePudding user response:
A possible solution.
EXPLANATION (asked by the OP)
See: Regex demo.
library(tidyverse)
dat %>%
filter(str_extract(var1, "[^_]*$") != str_extract(var2, "[^_]*$"))
#> var1 var2 value
#> 1 Green Island_two_before Green Island_two_after 7
#> 2 Pink Island_two_after Pink Island_two_before 8