Replace DNA nucleotide at given position in DNA sequence using for loop

Time:07-11

In an R data frame, I am trying to substitute the nucleotide given in the mutation column into the sequence in WT.seq, at the index given in the position column.

The following is my data frame:

    transcript  position    ref mutation    type    WT.seq
1   trx1    5   A   G   substitution    ATAAAA
2   trx2    3   C   A   substitution    CCCCCC
3   trx3    7   T   C   substitution    AAAAAATGG

Expected output in the data frame

    transcript  position    ref mutation    type    WT.seq
1   trx1    5   A   G   substitution    ATAAGA
2   trx2    3   C   A   substitution    CCACCC
3   trx3    7   T   C   substitution    AAAAAACGG

Explanation

For example, the first row of WT.seq contains the DNA sequence ATAAAA. I have to place the nucleotide G (mutation column, 1st row) at the 5th position of ATAAAA, which gives ATAAGA. The position number comes from the position column, 1st row. I have to do this for every row in the data frame, which contains thousands of rows.

I have done it for the first row using the following code.

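# Read the tab-separated file; read.table() already returns a data frame
df <- read.table("sequences.txt", sep = "\t", header = TRUE)

# Row 1 only: overwrite WT.seq (column 6) at position (column 2) with mutation (column 4)
substring(df[1, 6], first = df[1, 2]) <- df[1, 4]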

I want to run a for loop over the remaining rows so that every mutation nucleotide is substituted into the WT.seq column at the position given in the position column.

CodePudding user response:

You could split each sequence into characters with strsplit, replace the character at position with mutation using Map, and paste the characters back together.

transform(dat, WT.mut=Map(replace, strsplit(WT.seq, ''), position, mutation) |>
  sapply(paste, collapse=''))
#   transcript position ref mutation         type    WT.seq    WT.mut
# 1       trx1        5   A        G substitution    ATAAAA    ATAAGA
# 2       trx2        3   C        A substitution    CCCCCC    CCACCC
# 3       trx3        7   T        C substitution AAAAAATGG AAAAAACGG

I used an extra column to demonstrate; just replace WT.mut= with WT.seq= to overwrite. Note that the native pipe |> requires R 4.1 or later.
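Since the question asks specifically for a for loop, here is a minimal sketch of that route, built on the same `substr<-` replacement idea the asker used for row 1. It rebuilds the example data inline rather than reading sequences.txt, and the names WT.mut and WT.vec are hypothetical, chosen here only so the original WT.seq column stays untouched. It assumes WT.seq and mutation are character columns (not factors) and that each mutation is a single nucleotide.

```r
# Example data copied from the question
df <- data.frame(
  transcript = c("trx1", "trx2", "trx3"),
  position   = c(5L, 3L, 7L),
  ref        = c("A", "C", "T"),
  mutation   = c("G", "A", "C"),
  type       = "substitution",
  WT.seq     = c("ATAAAA", "CCCCCC", "AAAAAATGG"),
  stringsAsFactors = FALSE
)

# Hypothetical new column so the wild-type sequences are preserved
df$WT.mut <- df$WT.seq

# Explicit loop: overwrite exactly one character per row
for (i in seq_len(nrow(df))) {
  pos <- df$position[i]
  substr(df$WT.mut[i], pos, pos) <- df$mutation[i]
}

# Loop-free equivalent: `substr<-` is vectorised over the strings,
# the start/stop positions, and the replacement values
WT.vec <- df$WT.seq
substr(WT.vec, df$position, df$position) <- df$mutation

df$WT.mut
# "ATAAGA" "CCACCC" "AAAAAACGG"
```

For thousands of rows the loop-free form is likely the more convenient of the two, but both should give the same result on substitution-only data like this.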
