Keep only `Groups` where at least 2 elements in a column are present within a list in R

Time:07-03

I have a list (a character vector) such as:

The_list=c('SP1','SP2','SP3')

And I have a data frame such as:

Names Groups 
SP1   G1
SP2   G1
SP3   G1
SP1   G2
SP4   G3
SP5   G4
SP2   G5
SP3   G5
SP6   G5 
SP2   G6
SP7   G6 

I would like to keep only the Groups in which at least 2 elements of Names are present in The_list.

Here I should get:

Names Groups 
SP1   G1
SP2   G1
SP3   G1
SP2   G5
SP3   G5
SP6   G5 

Here is the df, if it helps:

df <- structure(list(Names = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP5", 
"SP2", "SP3", "SP6", "SP2", "SP7"), Groups = c("G1", "G1", "G1", 
"G2", "G3", "G4", "G5", "G5", "G5", "G6", "G6")), class = "data.frame", row.names = c(NA, 
-11L))

CodePudding user response:

Using data.table

library(data.table)
# df1 is the data frame from the question; .I collects the row indices of qualifying groups
setDT(df1)[df1[, .I[sum(The_list %in% Names) >= 2], by = Groups]$V1]

-output

    Names Groups
   <char> <char>
1:    SP1     G1
2:    SP2     G1
3:    SP3     G1
4:    SP2     G5
5:    SP3     G5
6:    SP6     G5
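
The expression above counts, per group, how many entries of The_list occur in Names and keeps the rows of groups where that count is at least 2. As a rough cross-check without data.table, the same logic can be sketched in base R. This swaps in ave() for the .I idiom, so it is an illustration of the idea rather than the answer's method; the names n_found and kept are made up for the example.

```r
The_list <- c("SP1", "SP2", "SP3")
df1 <- data.frame(
  Names  = c("SP1", "SP2", "SP3", "SP1", "SP4", "SP5",
             "SP2", "SP3", "SP6", "SP2", "SP7"),
  Groups = c("G1", "G1", "G1", "G2", "G3", "G4",
             "G5", "G5", "G5", "G6", "G6")
)

# For every row, count how many The_list entries occur in that row's group.
# ave() applies the function to the row indices of each group and
# recycles the single count back onto all rows of the group.
n_found <- ave(seq_len(nrow(df1)), df1$Groups,
               FUN = function(i) sum(The_list %in% df1$Names[i]))

# Keep the rows whose group contains at least 2 distinct listed names
kept <- df1[n_found >= 2, ]
print(kept)
```

On the question's data this should leave the six rows of G1 and G5, in their original order.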

CodePudding user response:

One solution you can use is the following (dplyr, with the native pipe from R 4.1 or later):

library(dplyr)
df |> 
  group_by(Groups) |> 
  filter(sum(Names %in% The_list) >= 2)

Correction: because Names %in% The_list counts matching rows rather than distinct names, a group in which a single listed name appears more than once could be kept by mistake. Testing The_list %in% Names instead counts each listed name at most once per group:

df |> 
  group_by(Groups) |> 
  filter(sum(The_list %in% Names) >= 2)
  Names Groups
  <chr> <chr> 
1 SP1   G1    
2 SP2   G1    
3 SP3   G1    
4 SP2   G5    
5 SP3   G5    
6 SP6   G5  
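
To make the difference between the two conditions concrete, here is a small base R sketch on made-up data (dup_df, with groups G7 and G8, is hypothetical and not part of the question). G7 contains SP1 twice and no other listed name, so it should be dropped. tapply() is used here only to show the per-group counts; it is not the dplyr code above.

```r
The_list <- c("SP1", "SP2", "SP3")

# Hypothetical data: G7 repeats one listed name, G8 has two distinct ones
dup_df <- data.frame(
  Names  = c("SP1", "SP1", "SP4", "SP2", "SP3"),
  Groups = c("G7",  "G7",  "G7",  "G8",  "G8")
)

# Rows per group whose name is in The_list (duplicates are counted twice)
rows_matched <- tapply(dup_df$Names, dup_df$Groups,
                       function(n) sum(n %in% The_list))

# Distinct The_list entries found in each group (each counted at most once)
distinct_matched <- tapply(dup_df$Names, dup_df$Groups,
                           function(n) sum(The_list %in% n))

print(rows_matched)      # G7 reaches 2 only because SP1 is repeated
print(distinct_matched)  # G7 is 1 here, so it fails the >= 2 test

# Keep only the groups with at least 2 distinct listed names
keep_groups <- names(distinct_matched)[distinct_matched >= 2]
result <- dup_df[dup_df$Groups %in% keep_groups, ]
print(result)
```

With the first condition both groups would pass; with the second only G8 remains.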