I have data full of strings like this:
df<- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"
I want to split each string around every S, taking up to 5 letters before the S and the 5 letters after it,
so the output looks like this:
5 letters before S, 5 letters after S
PF SQQRP
PFS QRPHR
RPHRH MYVTR
KGLDG LSIGQ
LDGSL IGQGM
AARAN LQLLS
SLQLL PQPGE
CodePudding user response:
Try this:
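fun <- function(S, bef = 5, aft = bef) {
  # positions of every "S" in the string
  wh <- which(strsplit(S, "")[[1]] == "S")
  # up to `bef` characters before each S, and `aft` characters after it
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}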
fun(df)
# bef aft
# 1 PF SQQRP
# 2 PFS QQRPH
# 3 RPHRH MYVTR
# 4 KGLDG LSIGQ
# 5 LDGSL IGQGM
# 6 AARAN LQLLS
# 7 SLQLL PQPGE
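Since the question mentions data full of such strings, here is a minimal sketch of one way to run the same logic over several strings and keep track of which string each row came from. The helper name split_around, the vector seqs and the id column are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: same logic as fun() above, repeated so this block runs on its own.
split_around <- function(S, bef = 5, aft = bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  data.frame(bef = substring(S, wh - bef, wh - 1),
             aft = substring(S, wh + 1, wh + aft))
}

# Made-up input vector, for illustration only (both strings contain at least one S).
seqs <- c("PFSSQQRPHRHSMYVTRDKV", "AARANSLQLLSPQPGE")

# One data frame per string, tagged with the index of the source string, then stacked.
res <- do.call(rbind, lapply(seq_along(seqs), function(i) {
  out <- split_around(seqs[i])
  cbind(id = rep(i, nrow(out)), out)
}))
res
```

Each row of res then corresponds to one S, and id says which element of seqs it was found in.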
Note that strings without any instance of "S" will return 0 rows. If you instead want it to return the end of the string (its last bef characters) as bef, and an empty string in aft, we need a simple conditional:
fun <- function(S, bef = 5, aft = bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  # no "S" found: pretend there is one just past the end of the string
  if (!length(wh)) wh <- nchar(S) + 1
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}
fun("hello world")
# bef aft
# 1 world
Edit: thanks to @DarrenTsai's comment, we can use substring in a vectorized fashion, removing the need for mapply.
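To illustrate what "vectorized" means here, the sketch below compares a single substring call that takes vectors of start and stop positions with an element-wise mapply loop. The mapply form is only an assumed equivalent written for comparison, not a quote of any earlier revision of the answer, and the shortened string is made up for the example.

```r
S  <- "PFSSQQRPHRHS"      # shortened example string
wh <- c(3, 4, 12)         # positions of "S" in that string

# Vectorized: one call, with start and stop supplied as vectors.
vec <- substring(S, wh - 5, wh - 1)

# Element-wise version with mapply, shown only for comparison.
loop <- mapply(function(a, b) substr(S, a, b), wh - 5, wh - 1)

identical(vec, unname(loop))
```

Both forms give the same three pieces; start positions below 1 are simply treated as the start of the string.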
CodePudding user response:
Please try the below code
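library(dplyr)    # for %>% and mutate()
library(stringr)  # for str_sub()
df <- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"
# one row per "S", holding its position and the full string
df3 <- data.frame(pos = unlist(gregexpr('S', df)), string = df)
# string2: up to 5 characters before each S; string3: the 5 characters after it
df3 %>% mutate(string2 = str_sub(str_sub(string, 1, pos - 1), -5, -1), string3 = str_sub(str_sub(string, pos + 1), 1, 5))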
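One caveat with the gregexpr approach: when the pattern is not found, gregexpr reports a position of -1 rather than an empty result, so it may be worth filtering those out before building the data frame. The helper name hits below is hypothetical and only sketches that check.

```r
# Hypothetical helper: positions of "S" in a single string, with the
# -1 "no match" marker from gregexpr() removed.
hits <- function(s) {
  pos <- unlist(gregexpr("S", s, fixed = TRUE))
  pos[pos > 0]
}

hits("PFSSQQ")        # positions of the two S characters
hits("hello world")   # empty: no S in this string
```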