Recognition and conversion of characters in specific positions

Time:11-17

Input:

0      1     2
TNN    R11W  MSLQEMFRFPRGLLLGSVLLVASAPATL
ASTN1  E5V   MALAALCALLACCWGPAAVLATAAGDVDPSK
HSPB7  H19P  MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y   MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1  P10L  MEDEEVAESWEEAADSGEIDRRLEKKL

Expected output:

0      1     2
TNN    R11W  MSLQEMFRFPWGLLLGSVLLVASAPATL
ASTN1  E5V   NaN
HSPB7  H19P  MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
CLCNKB C3Y   MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
SZRD1  P10L  NaN

Code example (what I have so far):

with open('temp.txt', 'w') as fw:
    for x in range(len(merge_two_files[1])):
        for i in range(len(merge_two_files[2])):
            if merge_two_files[1][x] == something:
                data = anything
                fw.write(str(data))

I want to change a character in column 2 at the position given in column 1. For example, in the first row column 1 is 'R11W', so I check whether the 11th character of column 2 is 'R'. If it is, I want to change it to 'W'. If it is not, I want to write 'NaN' in the cell. Is there a way to do this with Pandas?
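To make the rule concrete, here is a minimal sketch of how a code such as 'R11W' could be split into its three parts. The helper name parse_mutation and the regular expression are assumptions for illustration (they are not part of the question), and the sketch assumes every code is one uppercase letter, a 1-based position, and one uppercase letter.

```python
import re

# Assumed format: one uppercase letter, a 1-based position, one uppercase letter.
MUTATION_RE = re.compile(r"([A-Z])(\d+)([A-Z])")


def parse_mutation(code):
    """Split a code such as 'R11W' into ('R', 11, 'W')."""
    match = MUTATION_RE.fullmatch(code)
    if match is None:
        raise ValueError(f"unrecognised mutation code: {code!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement


original, position, replacement = parse_mutation("R11W")
sequence = "MSLQEMFRFPRGLLLGSVLLVASAPATL"

# The position in the code is 1-based, so the Python index is position - 1.
found = sequence[position - 1]
print(original, position, replacement, found)
```

The only subtle point is the off-by-one: the 11th character lives at index 10.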

CodePudding user response:

Write a custom function:

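import numpy as np

# assumes the column labels are the strings '0', '1', '2'
def replace_char(row):
    # split 'R11W' into c='R', p=11 (1-based position), r='W'
    c, p, r = row['1'][0], int(row['1'][1:-1]), row['1'][-1]
    s1 = row['2']
    s2 = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
    # replace only if position p exists and holds the expected character
    if 1 <= p <= len(s1) and s1[p-1] == c:
        s2 = f"{s1[:p-1]}{r}{s1[p:]}"
    return s2

df['2'] = df.apply(replace_char, axis=1)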

Output:

>>> df
        0     1                                            2
0     TNN  R11W                 MSLQEMFRFPWGLLLGSVLLVASAPATL
1   ASTN1   E5V                                          NaN
2   HSPB7  H19P  MSHRTSSTFRAERSFHSSPSSSSSSTSSSASRALPAQDPPMEK
3  CLCNKB   C3Y           MEYFVGLREGSSGNPVTLQELWGPCPRIRRGIRG
4   SZRD1  P10L                                          NaN
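For anyone who wants to reproduce this end to end, the following is a self-contained sketch of a slightly different route: parse column 1 once with Series.str.extract, then loop over the rows with zip instead of df.apply. The DataFrame is built inline from the question's sample data, the column labels are assumed to be the strings '0', '1', '2', and the names mutate, ref, pos and alt are made up for this example. It also assumes every code in column 1 matches the pattern; an unmatched code would make the integer conversion fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["TNN", "R11W", "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
        ["ASTN1", "E5V", "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
        ["HSPB7", "H19P", "MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK"],
        ["CLCNKB", "C3Y", "MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG"],
        ["SZRD1", "P10L", "MEDEEVAESWEEAADSGEIDRRLEKKL"],
    ],
    columns=["0", "1", "2"],  # assumed string labels
)

# Named groups become the column names of the extracted frame.
parts = df["1"].str.extract(r"^(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$")


def mutate(seq, ref, pos, alt):
    """Return seq with the 1-based position pos changed from ref to alt, else NaN."""
    index = pos - 1
    if index < 0 or index >= len(seq) or seq[index] != ref:
        return np.nan
    return seq[:index] + alt + seq[index + 1:]


df["2"] = [
    mutate(seq, ref, pos, alt)
    for seq, ref, pos, alt in zip(
        df["2"], parts["ref"], parts["pos"].astype(int), parts["alt"]
    )
]
print(df)
```

Whether this is faster than df.apply depends on the data size; the main difference is that the parsing step is separated from the replacement step, which makes each easier to check on its own.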

CodePudding user response:

This is my answer, using plain Python lists:

data = [
    ["TNN",    "R11W",  "MSLQEMFRFPRGLLLGSVLLVASAPATL"],
    ["ASTN1",  "E5V",   "MALAALCALLACCWGPAAVLATAAGDVDPSK"],
    ["HSPB7",  "H19P",  "MSHRTSSTFRAERSFHSSHSSSSSSTSSSASRALPAQDPPMEK"],
    ["CLCNKB", "C3Y",   "MECFVGLREGSSGNPVTLQELWGPCPRIRRGIRG"],
    ["SZRD1",  "P10L",  "MEDEEVAESWEEAADSGEIDRRLEKKL"],
]


result = []

for row in data:
    res_row = []
    res_row.append(row[0])
    res_row.append(row[1])

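    c1 = row[1][0]           # expected character, e.g. 'R'
    c2 = row[1][-1]          # replacement character, e.g. 'W'
    num = int(row[1][1:-1])  # 1-based position, e.g. 11

    # replace only when the character at that position matches
    if row[2][num-1] == c1:
        chars = list(row[2])
        chars[num-1] = c2
        res_row.append(''.join(chars))
    else:
        res_row.append("NaN")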

    result.append(res_row)

print(result)
