I am trying to filter the rows of a table based on the first letters of the cells in one column, and I am having a hard time writing the code in R.
snp | allele1 | allele2 |
---|---|---|
mmsoop | A | A |
rs3122 | C | G |
SNP1234 | T | C |
rs3144 | A | A |
The above is an example dataset showing what my data looks like. I want to subset the whole table, keeping only the rows where the snp column starts with "rs" or "SNP".
Expected table:
snp | allele1 | allele2 |
---|---|---|
rs3122 | C | G |
SNP1234 | T | C |
rs3144 | A | A |
Any help is appreciated!!
CodePudding user response:
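Alternatively, use grep with a regular expression anchored at the start of the string to get the indices of the matching rows: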
df<- read.table(
text= "
snp allele1 allele2
mmsoop A A
rs3122 C G
SNP1234 T C
rs3144 A A",
header = TRUE
)
df[grep("^(SNP|rs)",df$snp),]
snp allele1 allele2
2 rs3122 C G
3 SNP1234 T C
4 rs3144 A A
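If you would rather avoid regular expressions, base R's `startsWith()` may also work here. This is only a sketch: it rebuilds the example data with `data.frame()` and assumes `snp` is a character column (not a factor); the names `keep` and `result` are made up for illustration.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps snp as character
df <- data.frame(
  snp = c("mmsoop", "rs3122", "SNP1234", "rs3144"),
  allele1 = c("A", "C", "T", "A"),
  allele2 = c("A", "G", "C", "A"),
  stringsAsFactors = FALSE
)

# TRUE for rows whose snp begins with either prefix
keep <- startsWith(df$snp, "rs") | startsWith(df$snp, "SNP")

# Keep only the matching rows
result <- df[keep, ]
```

`startsWith()` does a plain, case-sensitive prefix comparison, so no characters in the prefix need escaping.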
CodePudding user response:
We may use `grepl` in `subset` to create a logical vector by matching `rs` or (`|`) `SNP` from the start (`^`) of the string, and use it to subset the rows
subset(df1, grepl("^(rs|SNP)", snp))
snp allele1 allele2
2 rs3122 C G
3 SNP1234 T C
4 rs3144 A A
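The parentheses in the pattern matter. As a small illustration (the IDs below are made up, not from the question), without the group the `^` binds only to the first alternative, so `SNP` would match anywhere in the string:

```r
# Hypothetical IDs, chosen only to show the difference
ids <- c("rs1", "xSNP2", "SNP3")

# "^rs|SNP" means: starts with "rs", OR contains "SNP" anywhere
ungrouped <- grepl("^rs|SNP", ids)

# "^(rs|SNP)" means: starts with "rs" or starts with "SNP"
grouped <- grepl("^(rs|SNP)", ids)
```

Here `ungrouped` flags `xSNP2` as a match, while `grouped` does not.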
data
df1 <- structure(list(snp = c("mmsoop", "rs3122", "SNP1234", "rs3144"
), allele1 = c("A", "C", "T", "A"), allele2 = c("A", "G", "C",
"A")), class = "data.frame", row.names = c(NA, -4L))
CodePudding user response:
We could combine `filter` with `str_detect`:
library(dplyr)
library(stringr)
df %>%
filter(str_detect(snp, '^(rs|SNP)'))
snp allele1 allele2
1 rs3122 C G
2 SNP1234 T C
3 rs3144 A A
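One thing to watch: a pattern without `^` matches anywhere in the string, not only at the start. The sketch below uses base `grepl` so it runs without extra packages (`str_detect` treats its pattern as a regular expression in the same way), and the ID `chrs77` is a made-up value added only to show the difference:

```r
# Example IDs plus one hypothetical value that contains "rs" in the middle
ids <- c("mmsoop", "rs3122", "SNP1234", "chrs77")

# No anchor: "chrs77" matches because "rs" appears inside it
unanchored <- grepl("rs|SNP", ids)

# Anchored: only strings that begin with "rs" or "SNP" match
anchored <- grepl("^(rs|SNP)", ids)
```

On the data in the question both patterns happen to give the same rows, but on real SNP tables the anchored form is the one that matches the "starts with" requirement.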