modify a string python

Time:06-01

Good morning everyone,

I have a csv file structured in the following way:

num  mut
36    L
45    P
  ...

where num indicates the position of a mutation and mut indicates the mutation itself. I have to replace the character at position num of a string with the letter mut. I wrote the following Python code:

import pandas as pd
import os
df = pd.read_csv(r'file.csv')
df_tmp=df.astype(str)
df_tmp["folder"]=df_tmp["num"]+df_tmp["mut"] #add a third column
f = open("sequence.txt", 'r')
content = f.read()
for i in range(len(df)):
     num=df_tmp.num.loc[[i]]-13
     num=num.astype(int)
     prev=num-1
     prev=prev.astype(int)
     mut=df_tmp.mut.loc[[i]]
     mut=mut.astype(str)
     new="".join((content[:prev],mut,content[num:])) #this should modify the file

But it raises:

TypeError: slice indices must be integers or None or have an __index__ method

How can I solve this?

Thanks in advance, Mattia

Edit: to make clearer what I want to do: I have to insert only the first mutation into my sequence, save the result to a file, and copy that file into a folder named after the third column (the one I added in the code). Then I have to do the same with the second mutation, then the third, and so on. Only one mutation must be inserted at a time.

CodePudding user response:

multiple mutations:

IIUC, you'd be better off outside pandas: convert your DataFrame to a dictionary, then iterate and join:

# input DataFrame
df = pd.DataFrame({'num': [36, 45], 'mut': ['L', 'P']})

# input string
string = '-'*50
# '--------------------------------------------------'

# get the positions to modify
pos = df.set_index('num')['mut'].to_dict()
# {36: 'L', 45: 'P'}

# iterate over the string, replace the characters if in the dictionary
# NB. define start=1 if you want the first position to be 1
new_string = ''.join([pos.get(i, c) for i,c in enumerate(string, start=0)])
# '------------------------------------L--------P----'
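
As an illustration of the start= remark above, the same dictionary lookup can be wrapped in a small helper. This is only a sketch: the name apply_mutations and the one_based flag are hypothetical and not part of the answer, and it assumes the positions in the dictionary are integers.

```python
def apply_mutations(sequence, mutations, one_based=True):
    """Return a copy of `sequence` with the listed positions replaced.

    `mutations` maps an integer position to its replacement letter.
    With one_based=True the first character of `sequence` is position 1.
    """
    start = 1 if one_based else 0
    # Keep the original character unless its position is in the dictionary.
    return ''.join(mutations.get(i, c) for i, c in enumerate(sequence, start=start))


# Hypothetical 10-letter sequence, chosen so the effect is easy to see.
example = apply_mutations('ABCDEFGHIJ', {3: 'x', 10: 'z'})
```

With 1-based counting the third and tenth letters are replaced; passing one_based=False shifts every position by one to the right.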

single mutations:

string = '-'*50
# '--------------------------------------------------'

for idx, r in df.iterrows():
    new_string = string[:r['num']-1] + r['mut'] + string[r['num']:]
    # or
    # new_string = ''.join([string[:r['num']-1], r['mut'], string[r['num']:]])
    
    with open(f'file_{idx}.txt', 'w') as f:
        f.write(new_string)

output:

file_0.txt
-----------------------------------L--------------

file_1.txt
--------------------------------------------P-----
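
The edit in the question also asks for each single-mutation file to end up in a folder named after num and mut together. A possible sketch of that step follows, under several assumptions: the helper name write_single_mutants and the output file name sequence.txt are hypothetical, positions are 1-based as in the loop above, and rows are plain (num, mut) pairs (with pandas, df.itertuples(index=False, name=None) would yield such pairs if the columns are in that order).

```python
from pathlib import Path
import tempfile


def write_single_mutants(rows, sequence, out_dir, filename="sequence.txt"):
    """Write one file per (num, mut) pair, each holding a single mutation.

    Every file is placed in its own folder named f"{num}{mut}", e.g. "36L".
    Returns the list of written paths.
    """
    written = []
    for num, mut in rows:
        pos = int(num)  # 1-based position of the character to replace
        if not 1 <= pos <= len(sequence):
            raise ValueError(f"position {pos} is outside the sequence")
        # Always start from the original sequence, so mutations never accumulate.
        mutated = sequence[:pos - 1] + mut + sequence[pos:]
        folder = Path(out_dir) / f"{num}{mut}"
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / filename
        target.write_text(mutated)
        written.append(target)
    return written


rows = [(36, "L"), (45, "P")]
# A temporary directory keeps the example from touching real data.
with tempfile.TemporaryDirectory() as tmp:
    paths = write_single_mutants(rows, "-" * 50, tmp)
    names = [p.parent.name for p in paths]
    first = paths[0].read_text()
```

Writing straight into the target folder avoids a separate copy step; if the file must also exist elsewhere, shutil.copy could be applied afterwards.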

CodePudding user response:

I tried your code with a sample file.csv and an empty sequence.txt file.

The first line inside the for loop in your code,

num=df_tmp.num.loc[[i]]-13
# raises an error because num at that location is a str (df_tmp was built with astype(str)); to correct that:

num=df_tmp.num.loc[[i]].astype(int)-13
# astype(int) converts the value to int before subtracting

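After this, the next error is in the last line: the slice indices TypeError. It occurs because prev and num, which you use to slice the content variable, are one-element Series rather than ints. To get the int value, take the element with .iloc[0] (a plain [0] looks up the index label 0, so it only works for the first row). mut is also a Series, and join expects strings, so take its element the same way:

content="".join((content[:prev.iloc[0]],mut.iloc[0],content[num.iloc[0]:]))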

There shouldn't be an error now.
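
The rule behind the TypeError can be checked without pandas: slice bounds must be ints (or objects with an __index__ method), so values taken from a DataFrame should be reduced to plain scalars before slicing. A minimal sketch, assuming the question's offset of 13 and 1-based positions; the helper name mutate_at is hypothetical.

```python
def mutate_at(content, num, mut, offset=13):
    """Replace one character of `content` and return the new string.

    `num` and `mut` may arrive as strings (for example after astype(str)),
    so they are converted to plain Python scalars before slicing.
    """
    pos = int(num) - offset  # 1-based position after applying the offset
    prev = pos - 1           # 0-based index of the character to replace
    return content[:prev] + str(mut) + content[pos:]


sequence = "ABCDEFGHIJ"
result = mutate_at(sequence, "16", "x")  # position 16 - 13 = 3

# A non-integer slice bound reproduces the error from the question.
try:
    sequence[:"2"]
    slice_failed = False
except TypeError:
    slice_failed = True
```

Inside the original loop, scalars such as df_tmp.num.iloc[i] and df_tmp.mut.iloc[i] could be passed to a helper like this instead of one-element Series.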
