I have columns of counts labelled with different sample types. To begin the comparison, I would like to subset a particular group of samples.
To do that, I tried this:
df2 <- df1[,!grepl("pH|B|M|C|G|L", colnames(df1))]
My objective is to keep the samples whose names start with H or TCGA.
How can I do that? I understand that the line above also removes the TCGA-labelled columns, since TCGA contains letters (C and G) that are also in the pattern.
I tried it the other way:
df2 <- df1[,grepl("H|TCGA", colnames(df1))]
Here the issue is that the samples labelled pH are also selected, since their names contain an H.
How can I resolve this?
Any help or suggestion would be appreciated.
names(df1)
[1] "H1" "H2" "H3" "H4" "B11" "B12" "B1" "B2" "B3"
[10] "B4" "B5" "B6" "B7" "B8" "B9" "C1" "C2" "C3"
[19] "C4" "G1" "G2" "G3" "G4" "L1" "L2" "L3" "L4"
[28] "L5" "L6" "L7" "L8" "M1" "M2" "M3" "M4" "pH10"
[37] "pH11" "pH12" "pH1" "pH2" "pH3" "pH4" "pH5" "pH6" "pH7"
[46] "pH8" "pH9" "TCGA-AB-2856" "TCGA-AB-2849" "TCGA-AB-2971"
CodePudding user response:
Use ^ to match the beginning of a string, as in
df2 <- df1[,grepl("^H|^TCGA", colnames(df1))]
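To see the difference the anchor makes, here is a minimal sketch on a toy data frame. The data frame and its values are made up for illustration; only the column names are taken from the question. The pattern "^(H|TCGA)" is equivalent to "^H|^TCGA".

```r
# Toy data frame with a few of the column names from the question
# (values are hypothetical; check.names = FALSE keeps the hyphens intact)
df1 <- data.frame(H1 = 1:2, B1 = 3:4, pH1 = 5:6, `TCGA-AB-2856` = 7:8,
                  check.names = FALSE)

# Unanchored: "H" matches anywhere in the name, so pH1 is kept too
unanchored <- colnames(df1)[grepl("H|TCGA", colnames(df1))]

# Anchored: only names that begin with H or TCGA are kept
anchored <- colnames(df1)[grepl("^(H|TCGA)", colnames(df1))]

# drop = FALSE keeps a data frame even if only one column matches
df2 <- df1[, grepl("^(H|TCGA)", colnames(df1)), drop = FALSE]
```

Here `unanchored` contains H1, pH1 and TCGA-AB-2856, while `anchored` contains only H1 and TCGA-AB-2856.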
We can also use dplyr with starts_with():
library(dplyr)
# starts_with() matches prefixes only (case-insensitive by default)
df1 %>%
  select(starts_with(c('H', 'TCGA')))