Home > other >  Filter rows by paired sites in R using limited matching characters
Filter rows by paired sites in R using limited matching characters

Time:06-17

I am trying to extract rows containing paired site/location (i.e., one column containing "before" and another column containing "after" an event).

Here's an example data.

dat <- data.frame(var1 = c('Green Island_one_before', 'Green Island_two_before', 'Green Island_one_after', 'Pink Island_one_before', 'Pink Island_two_after', 'Pink Island_five_after'),
                 var2 = c('Green Island_three_before', 'Green Island_two_after', 'Green Island_one_after', 'Pink Island_four_before', 'Pink Island_two_before', 'Pink Island_six_after'),
                 value = c(1, 7, 4, 2, 8, 3))

Using the example data above, it should return rows #2 and 5.

CodePudding user response:

I would start by putting each variable in a column. That is, separate var1 and var2 in their component parts, the site, location and time. After that, anything one needs to do will become much more straightforward. E.g.

library(tidyverse)

dat %>% 
  separate(var1, into = c('site_v1', 'location_v1', 'time_v1'), sep = '_') %>% 
  separate(var2, into = c('site_v2', 'location_v2', 'time_v2'), sep = '_') %>% 
  filter(site_v1 == site_v2, location_v1 == location_v2, time_v1 != time_v2)
       site_v1 location_v1 time_v1      site_v2 location_v2 time_v2 value
1 Green Island         two  before Green Island         two   after     7
2  Pink Island         two   after  Pink Island         two  before     8

CodePudding user response:

Assuming all the other parts of the string are the same, you could do:

library(tidyverse)

dat %>%
  filter(str_detect(pmax(var1, var2), 'before'),
         str_detect(pmin(var1, var2), 'after'))

               var1              var2 value
1 site1_loc2_before  site1_loc2_after     7
2  site3_loc2_after site3_loc2_before     8

CodePudding user response:

A possible solution.

EXPLANATION (asked by the OP)

See: Regex demo.

library(tidyverse)

dat %>% 
  filter(str_extract(var1, "[^_]*$") != str_extract(var2, "[^_]*$"))

#>                      var1                   var2 value
#> 1 Green Island_two_before Green Island_two_after     7
#> 2   Pink Island_two_after Pink Island_two_before     8
  •  Tags:  
  • r
  • Related