In an R data frame, I am trying to substitute the nucleotide from the mutation column into the WT.seq sequence at the position given in the position column.
This is my data frame:
  transcript position ref mutation         type    WT.seq
1       trx1        5   A        G substitution    ATAAAA
2       trx2        3   C        A substitution    CCCCCC
3       trx3        7   T        C substitution AAAAAATGG
Expected output in the data frame:
  transcript position ref mutation         type    WT.seq
1       trx1        5   A        G substitution    ATAAGA
2       trx2        3   C        A substitution    CCACCC
3       trx3        7   T        C substitution AAAAAACGG
Explanation
For example, the WT.seq column contains DNA sequences. The first row of WT.seq holds the sequence ATAAAA, and I have to put the nucleotide G (mutation column, 1st row) at the 5th position of ATAAAA. After replacing the 5th position with G, the sequence becomes ATAAGA. The position number comes from the position column, 1st row. I have to do this for all rows in the data frame, which contains thousands of rows.
In the output above, I did this for the first row using the following code.
DNA_seq <- read.table("sequences.txt", sep = "\t", header = TRUE)
df <- as.data.frame(DNA_seq)
# row 1: overwrite WT.seq (column 6) at position (column 2) with mutation (column 4)
substring(df[1, 6], first = df[1, 2]) <- df[1, 4]
I want to run a for loop over the remaining rows so that every mutation nucleotide is substituted into the WT.seq column at the position given in the position column.
CodePudding user response:
You could strsplit the sequences, replace the character at position with mutation in Map, and paste the pieces back together.
transform(dat, WT.mut=Map(replace, strsplit(WT.seq, ''), position, mutation) |>
            sapply(paste, collapse=''))
#   transcript position ref mutation         type    WT.seq    WT.mut
# 1       trx1        5   A        G substitution    ATAAAA    ATAAGA
# 2       trx2        3   C        A substitution    CCCCCC    CCACCC
# 3       trx3        7   T        C substitution AAAAAATGG AAAAAACGG
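To see what each step of the transform() call does, here is roughly the same sequence of operations on the first row alone. This is only a sketch with the values typed in by hand; seq1, chars and mut1 are names made up for illustration.

```r
# First row of the example, typed in by hand (illustrative names)
seq1 <- "ATAAAA"

# 1. split the sequence into single characters
chars <- strsplit(seq1, "")[[1]]

# 2. swap in the mutation "G" at position 5
chars <- replace(chars, 5, "G")

# 3. glue the characters back into one string
mut1 <- paste(chars, collapse = "")
mut1
# [1] "ATAAGA"
```

Map simply repeats step 2 for every row, pairing each split sequence with its own position and mutation.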
I used an extra column in the transform() call to demonstrate; just replace WT.mut= with WT.seq= to overwrite.
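As a possible alternative for thousands of rows, base R's `substr<-` recycles its start and stop arguments, so it can probably do the whole column in one call without a loop. A minimal sketch, assuming WT.seq is a character column (not a factor) and every position lies inside its sequence; the data frame below is typed in from the question and stands in for sequences.txt, and the ref check is an extra that was not asked for.

```r
# Example data typed in from the question (stands in for sequences.txt)
df <- data.frame(
  transcript = c("trx1", "trx2", "trx3"),
  position   = c(5L, 3L, 7L),
  ref        = c("A", "C", "T"),
  mutation   = c("G", "A", "C"),
  type       = "substitution",
  WT.seq     = c("ATAAAA", "CCCCCC", "AAAAAATGG"),
  stringsAsFactors = FALSE
)

# Optional sanity check: the base currently at each position should match ref
stopifnot(substr(df$WT.seq, df$position, df$position) == df$ref)

# Copy the column, then overwrite exactly one character per row
df$WT.mut <- df$WT.seq
substr(df$WT.mut, df$position, df$position) <- df$mutation
df$WT.mut
# [1] "ATAAGA"    "CCACCC"    "AAAAAACGG"
```

To overwrite in place, apply the `substr<-` line to df$WT.seq directly instead of the copy.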