Subsetting a dataframe in R with two steps

Time:10-05

I have the following dataframe:

df
   name  direction to   
   <chr> <fct>     <chr>
 1 A     ->        B    
 2 A     ->        X    
 3 B     ->        X    
 4 B     ->        Y    
 5 C     ->        B    
 6 C     ->        Y    
 7 S     ->        T    
 8 T     ->        C    
 9 W     ->        Y    
10 X     ->        W    
11 Y     NA        NA  

Step 1. I first want to subset the dataframe to only include rows that have X or Y in either the name or the to column.

df %>% dplyr::select(name,direction,to) %>% filter(name %in% c('X','Y') | to %in% c('X','Y'))

  name  direction to   
  <chr> <fct>     <chr>
1 A     ->        X    
2 B     ->        X    
3 B     ->        Y    
4 C     ->        Y    
5 W     ->        Y    
6 X     ->        W    
7 Y     NA        NA  

Step 2. From there, I want to get any other connections whose name matches one of the unique values in name from the Step 1 result. For example, the unique values in name after Step 1 are A, B, C, W, X, Y. I want all observations from the original, unfiltered df where name is one of these values. In this example, observations 1 (A->B) and 5 (C->B) from the original dataframe would be added to the subset.

Expected output:

  name  direction to   
  <chr> <fct>     <chr>
1 A     ->        X    
2 A     ->        B
3 B     ->        X    
4 B     ->        Y 
5 C     ->        B   
6 C     ->        Y    
7 W     ->        Y    
8 X     ->        W    
9 Y     NA        NA  

Let me know if this doesn't make sense.

CodePudding user response:

I think this should work: save the Step 1 result, then keep every row of the original df whose name appears in it.

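library(dplyr)  # provides %>% and filter()
# Step 1: rows with X or Y in name or to; Step 2: original rows whose name occurs in that result
dfTmp <- df %>% dplyr::select(name, direction, to) %>% filter(name %in% c('X','Y') | to %in% c('X','Y'))
df[df$name %in% dfTmp$name, ]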
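
For comparison, here is a self-contained base-R sketch of the same two steps without dplyr. It rebuilds the example data from the question's table (with direction as a character column rather than a factor, which is an assumption), and the names targets, hits, keep and result are only illustrative.

```r
# Rebuild the example data from the question (direction kept as character here)
df <- data.frame(
  name = c("A", "A", "B", "B", "C", "C", "S", "T", "W", "X", "Y"),
  direction = c(rep("->", 10), NA),
  to = c("B", "X", "X", "Y", "B", "Y", "T", "C", "Y", "W", NA),
  stringsAsFactors = FALSE
)

targets <- c("X", "Y")

# Step 1: logical vector marking rows with X or Y in name or to
# (%in% returns FALSE for NA, so the row with NA in `to` is handled safely)
hits <- df$name %in% targets | df$to %in% targets

# Step 2: keep every original row whose name occurs among the Step 1 names
keep <- df$name %in% unique(df$name[hits])
result <- df[keep, ]
result
```

This should keep the nine rows whose name is A, B, C, W, X or Y and drop the S and T rows, in the original row order.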

CodePudding user response:

We can use if_any over the 'name' and 'to' columns to get a logical vector marking the rows that contain 'X' or 'Y', use it to subset 'name', and then keep every row whose 'name' is in that subset with %in%

library(dplyr)
df %>% 
   filter(name %in% name[if_any(c(name, to), ~ . %in% c('X', 'Y'))]) %>%
   as_tibble()

-output

# A tibble: 9 × 3
  name  direction to   
  <chr> <chr>     <chr>
1 A     ->        B    
2 A     ->        X    
3 B     ->        X    
4 B     ->        Y    
5 C     ->        B    
6 C     ->        Y    
7 W     ->        Y    
8 X     ->        W    
9 Y     <NA>      <NA>

data

df <- structure(list(name = c("A", "A", "B", "B", "C", "C", "S", "T", 
"W", "X", "Y"), direction = c("->", "->", "->", "->", "->", "->", 
"->", "->", "->", "->", NA), to = c("B", "X", "X", "Y", "B", 
"Y", "T", "C", "Y", "W", NA)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11"))