Home > Software design >  Filter R dataframe rows by matching value in consecutive columns
Filter R dataframe rows by matching value in consecutive columns

Time:11-19

I have a large dataframe of true (1) and false (0) information. Below is a reduced sample of it.

df <- read.table(text = "  0m-10m 0m-15m 0m-20m 0m-25m
                 X 1 0 1 1 
                 Y 1 1 0 0 
                 Z 0 1 0 1 ", header = T)

I need to subset rows where two consecutive contrasts are true (both == 1) so that the result looks like this

   0m-10m 0m-15m 0m-20m 0m-25m
X  1 0 1 1
Y  1 1 0 0 

I can filter by rowSums >=2, but this is also true for row Z, which I need to exclude. Thoughts?

CodePudding user response:

Just remove the first and last columns, create a logical matrix with & and then use rowSums to create logical vector to subset

df[rowSums(df[-1] & df[-ncol(df)]) > 0,]

-output

  0m-10m 0m-15m 0m-20m 0m-25m
X      1      0      1      1
Y      1      1      0      0

The rowSums above is not based on a single data. We are creating a logical matrix from two equal sized datasets by removing the first and last columns and then using & so, that if both have 1 in the same location, only it will return TRUE and FALSE otherwise. rowSums on this matrix returns the sum of TRUE (or TRUE/FALSE -> 1/0)

> df[-1] & df[-ncol(df)]
  0m-15m 0m-20m 0m-25m
X  FALSE  FALSE   TRUE
Y   TRUE  FALSE  FALSE
Z  FALSE  FALSE  FALSE
> rowSums(df[-1] & df[-ncol(df)])
X Y Z 
1 1 0 
> rowSums(df[-1] & df[-ncol(df)]) > 0
    X     Y     Z 
 TRUE  TRUE FALSE 

Or if we are looking for a general case, we may use rle - run-length-encoding on each row by looping over the rows with apply and MARGIN = 1. The rle returns a list of values and lengths for each adjacent similar values. Then, we create a logical vector based on the lengths and values i.e. if the 'values' is 1 and 'lengths' is 2.

n <- 2
df[apply(df, 1, FUN = function(x) with(rle(x), any(lengths == n & values))),]
  0m-10m 0m-15m 0m-20m 0m-25m
X      1      0      1      1
Y      1      1      0      0

-code breakup

> apply(df, 1, FUN = rle)
$X
Run Length Encoding
  lengths: Named int [1:3] 1 1 2
 - attr(*, "names")= chr [1:3] "0m-15m" "0m-20m" ""
  values : Named int [1:3] 1 0 1
 - attr(*, "names")= chr [1:3] "0m-10m" "0m-15m" "0m-25m"

$Y
Run Length Encoding
  lengths: Named int [1:2] 2 2
 - attr(*, "names")= chr [1:2] "0m-20m" ""
  values : Named int [1:2] 1 0
 - attr(*, "names")= chr [1:2] "0m-15m" "0m-25m"

$Z
Run Length Encoding
  lengths: Named int [1:4] 1 1 1 1
 - attr(*, "names")= chr [1:4] "0m-15m" "0m-20m" "0m-25m" ""
  values : Named int [1:4] 0 1 0 1
 - attr(*, "names")= chr [1:4] "0m-10m" "0m-15m" "0m-20m" "0m-25m"
> apply(df, 1, FUN = function(x) with(rle(x),lengths == n & values))
$X
0m-15m 0m-20m        
 FALSE  FALSE   TRUE 

$Y
0m-20m        
  TRUE  FALSE 

$Z
0m-15m 0m-20m 0m-25m        
 FALSE  FALSE  FALSE  FALSE 
> apply(df, 1, FUN = function(x) with(rle(x), any(lengths == n & values)))
    X     Y     Z 
 TRUE  TRUE FALSE 

CodePudding user response:

Here is an alternative way using pivot:

library(dplyr)
library(tidyr)

df %>% 
  rownames_to_column("xyz") %>% 
  pivot_longer(
    -xyz
  ) %>% 
  group_by(xyz) %>% 
  mutate(helper = lag(value),
         flag = ifelse(value==1 & helper==1, 1,0)) %>% 
  filter(any(flag==1)) %>% 
  pivot_wider(
    names_from = name,
    values_from = value,
    values_fill = 0
  ) %>% 
  summarize(across(starts_with("X"), sum)) %>% 
  column_to_rownames("xyz")

  X0m.10m X0m.15m X0m.20m X0m.25m
X       1       0       1       1
Y       1       1       0       0

CodePudding user response:

A solution based on the creation of an auxiliary column concatenating all original columns as a string (using tidyr::unite) and then using stringr::str_detect on the strings:

library(tidyverse)

df <- read.table(text = "  0m-10m 0m-15m 0m-20m 0m-25m
                 X 1 0 1 1 
                 Y 1 1 0 0 
                 Z 0 1 0 1 ", header = T)

df %>% 
  unite(aux, sep = "", remove = F) %>% 
  filter(str_detect(aux, "11")) %>%
  select(-aux)

#>   X0m.10m X0m.15m X0m.20m X0m.25m
#> X       1       0       1       1
#> Y       1       1       0       0
  • Related