I have a data frame with many columns that give compound values for each sample ID. I want to remove any column whose compound is present (non-zero) in only one sample, keeping only the columns in which the compound is present in at least 2 samples. Here is a mock data frame:
df1 <- data.frame(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
                  Cmpd_1 = c(5.7, 0, 0, 0, 2.5, 2.1, 0, 6.2, 1.5),
                  Cmpd_2 = c(0, 0, 1, 0, 2.8, 0, 0, 0, 0),
                  Cmpd_3 = c(0, 0, 3.5, 0, 0, 0, 0, 0, 0))
In this example, Cmpd_3 is non-zero only for sample C, so I would like the entire column to be removed. Here is the ideal output:
ID Cmpd_1 Cmpd_2
 A    5.7    0.0
 B    0.0    0.0
 C    0.0    1.0
 D    0.0    0.0
 E    2.5    2.8
 F    2.1    0.0
 G    0.0    0.0
 H    6.2    0.0
 I    1.5    0.0
CodePudding user response:
This should work if the IDs are unique (one row per sample):
df2 <- df1[, colSums(df1 != 0) > 1]
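To spell out why the one-liner keeps the ID column: `df1 != 0` yields a logical matrix, and every ID string compares as different from 0, so that column always counts as non-zero in all rows. A slightly more explicit sketch, assuming the identifier column is named `ID` as in the mock data and that any `NA` should count as "not present" (that `NA` handling is my assumption, not something the question states), could look like this. The names `cmpd_cols`, `n_present`, and `keep` are only illustrative.

```r
df1 <- data.frame(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
                  Cmpd_1 = c(5.7, 0, 0, 0, 2.5, 2.1, 0, 6.2, 1.5),
                  Cmpd_2 = c(0, 0, 1, 0, 2.8, 0, 0, 0, 0),
                  Cmpd_3 = c(0, 0, 3.5, 0, 0, 0, 0, 0, 0))

# Compound columns: everything except the ID column
cmpd_cols <- setdiff(names(df1), "ID")

# Number of samples with a non-zero value, per compound
# (na.rm = TRUE treats NA as "not present"; an assumption)
n_present <- colSums(df1[cmpd_cols] != 0, na.rm = TRUE)

# Keep ID plus the compounds present in at least 2 samples
keep <- c("ID", names(n_present)[n_present >= 2])
df2 <- df1[, keep, drop = FALSE]
```

Counting only the compound columns means the result does not depend on how the ID values happen to compare against 0, and `drop = FALSE` keeps a data frame even if only the ID column survives.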
CodePudding user response:
Using dplyr:
library(dplyr)
df1 %>%
  select(where(~ sum(.x != 0, na.rm = TRUE) > 1))
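As with the base R version, the predicate above is also evaluated on the ID column, which passes only because its strings all differ from 0. A possible variant, assuming the identifier column is called `ID` and every compound column is numeric, selects `ID` by name and applies the count to numeric columns only:

```r
library(dplyr)

df1 <- data.frame(ID = c("A", "B", "C", "D", "E", "F", "G", "H", "I"),
                  Cmpd_1 = c(5.7, 0, 0, 0, 2.5, 2.1, 0, 6.2, 1.5),
                  Cmpd_2 = c(0, 0, 1, 0, 2.8, 0, 0, 0, 0),
                  Cmpd_3 = c(0, 0, 3.5, 0, 0, 0, 0, 0, 0))

# Keep ID explicitly, plus numeric columns that are non-zero
# in at least 2 rows (NA is ignored by na.rm = TRUE)
df2 <- df1 %>%
  select(ID, where(~ is.numeric(.x) && sum(.x != 0, na.rm = TRUE) >= 2))
```

Any non-numeric column other than `ID` would be dropped by this version, so it only fits data shaped like the mock example.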