How to split data into as many files as possible

I have data full of strings like this:

df<- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"

For each S, I want to extract the 5 letters before it and the 5 letters after it,

so the output looks like this

5 letters before S  5 letters after S
   PF               SQQRP
  PFS               QQRPH
RPHRH               MYVTR
KGLDG               LSIGQ
LDGSL               IGQGM
AARAN               LQLLS
SLQLL               PQPGE

CodePudding user response：

Try this:

fun <- function(S, bef=5, aft=bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}
fun(df)
#     bef   aft
# 1    PF SQQRP
# 2   PFS QQRPH
# 3 RPHRH MYVTR
# 4 KGLDG LSIGQ
# 5 LDGSL IGQGM
# 6 AARAN LQLLS
# 7 SLQLL PQPGE

Note that a string without any instance of "S" will return 0 rows. If you instead want it to return the tail of the string (its last bef characters) as bef, with an empty string in aft, we need a simple conditional:

fun <- function(S, bef=5, aft=bef) {
  wh <- which(strsplit(S, "")[[1]] == "S")
  if (!length(wh)) wh <- nchar(S) + 1
  Sbef <- substring(S, wh - bef, wh - 1)
  Saft <- substring(S, wh + 1, wh + aft)
  data.frame(bef = Sbef, aft = Saft)
}

fun("hello world")
#     bef aft
# 1 world
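
Since the question mentions data full of strings, here is a rough sketch of applying the same logic to a character vector and stacking the results. The input vector strings, the id column, and the names pieces and result are made up for illustration, and strings without an "S" are given an explicit zero-row data frame so they simply drop out:

```r
# Made-up input, for illustration only
strings <- c("ABSCD", "XYZ", "SQS")

pieces <- lapply(seq_along(strings), function(i) {
  s  <- strings[i]
  wh <- which(strsplit(s, "")[[1]] == "S")
  if (!length(wh)) {
    # No "S" in this string: contribute zero rows
    return(data.frame(id = integer(0), bef = character(0),
                      aft = character(0), stringsAsFactors = FALSE))
  }
  # id records which input string each row came from
  data.frame(id  = i,
             bef = substring(s, wh - 5, wh - 1),
             aft = substring(s, wh + 1, wh + 5),
             stringsAsFactors = FALSE)
})

# Stack the per-string data frames into one
result <- do.call(rbind, pieces)
result
```

Keeping an id column is just one way to trace each row back to its source string; a named list of data frames would work as well.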

Edit: thanks to @DarrenTsai's comment, we can use substring in a vectorized fashion, removing the need for mapply.
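
To illustrate what "vectorized" means here: substring recycles its first and last arguments, so a single call returns one piece per position. A small sketch on the string from the question (the names s, starts, before and after are only for illustration):

```r
s <- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"

# Positions of every "S" in the string
starts <- which(strsplit(s, "")[[1]] == "S")

# One call each, one result per position; a start below 1 is treated as 1
before <- substring(s, starts - 5, starts - 1)
after  <- substring(s, starts + 1, starts + 5)

before
after
```

Because the start and stop positions are plain integer vectors, no explicit loop over the matches is needed.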

CodePudding user response：

Please try the code below:

library(dplyr)    # mutate() and %>%
library(stringr)  # str_sub()

df <- "PFSSQQRPHRHSMYVTRDKVRAKGLDGSLSIGQGMAARANSLQLLSPQPGEQLPPEMTVA"
df3 <- data.frame(pos = unlist(gregexpr('S', df)), string = df)

# string2: up to 5 letters before each S; string3: up to 5 letters after it
df3 %>% mutate(string2 = str_sub(str_sub(string, 1, pos - 1), -5, -1), string3 = str_sub(str_sub(string, pos + 1), 1, 5))
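
One thing to watch with this approach: gregexpr reports "no match" as -1, so a string without any "S" would produce a row with pos = -1 rather than zero rows. A minimal base R sketch for filtering that out first (find_s is a hypothetical helper name, not part of any package):

```r
# Hypothetical helper: positions of "S", or an empty vector when there are none
find_s <- function(s) {
  pos <- unlist(gregexpr("S", s, fixed = TRUE))
  pos[pos > 0]  # drop the -1 that gregexpr() uses to signal "no match"
}

find_s("PFSSQQ")       # 3 4
find_s("hello world")  # integer(0)
```

The filtered positions can then be passed to data.frame(pos = ...) as in the code above.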