I have a DataFrame in which each row contains a sequence string like this:
MKEYGEDLK
How can I process the sequence string in each row so that it becomes a list of individual characters, like this?
[M, K, E, Y, G, E, D, L, K]
I tried
get_seq_str = ','.join(test_df.loc[0]['seq_1'])
arr.append(get_seq_str)
However, when I append the result to the DataFrame, each value has a single quotation mark at the start and end, which I do not want:
['M, K, E, Y, G, E, D, L, K']
How can I strip the single quotation marks?
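The quotation marks are a sign that the value is a str, not a list: str.join always returns one string, so there is nothing to "strip". A minimal sketch contrasting the two, using only built-ins (the variable names are illustrative):

```python
seq = "MKEYGEDLK"

# str.join glues the characters back together into ONE string
joined = ", ".join(seq)
print(repr(joined))   # 'M, K, E, Y, G, E, D, L, K'
print(type(joined))   # <class 'str'>

# list() splits the string into a list of one-character strings
chars = list(seq)
print(chars)          # ['M', 'K', 'E', 'Y', 'G', 'E', 'D', 'L', 'K']
print(len(chars))     # 9
```

A plain print() of the list shows quotes around each element, but when such lists are stored in a DataFrame column, pandas displays them as [M, K, E, ...], as the answers below show.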
Answer:
If I understand correctly, you can apply list to each string value, which splits the string into its individual characters:
df['col_list'] = df['col'].apply(list)
print(df)
         col                     col_list
0  MKEYGEDLK  [M, K, E, Y, G, E, D, L, K]
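A self-contained version of this answer, assuming the column is named seq_1 as in the question (the sample rows are made up for illustration):

```python
import pandas as pd

# Hypothetical sample data; the column name seq_1 follows the question
test_df = pd.DataFrame({"seq_1": ["MKEYGEDLK", "ACDE"]})

# list(s) turns each string s into a list of its characters
test_df["seq_list"] = test_df["seq_1"].apply(list)

print(test_df)
print(type(test_df.loc[0, "seq_list"]))  # <class 'list'>
```

If the column can contain missing values, list(NaN) raises a TypeError, so those rows would need to be dropped or handled separately first.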
Answer:
You can also unpack the string into a list with the * operator:
get_seq_str = [*test_df.loc[0]['seq_1']]
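The unpacking form [*s] (Python 3.5+) builds the same list as list(s). The line above converts only row 0; to convert every row, one option is to map it over the column. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical sample data; the column name seq_1 follows the question
test_df = pd.DataFrame({"seq_1": ["MKEYGEDLK", "GEDLK"]})

# [*s] unpacks the string s into a list of its characters
test_df["seq_list"] = test_df["seq_1"].map(lambda s: [*s])

print(test_df)
```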
Answer:
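You can use str.findall with a pattern that matches single letters; any other characters, such as "?" or spaces, are dropped: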
df['new'] = df['seq_1'].str.findall(r'[a-zA-Z]')
Example:
         seq_1                          new
0    MKEYGEDLK  [M, K, E, Y, G, E, D, L, K]
1  ?MKEY GEDLK  [M, K, E, Y, G, E, D, L, K]
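A runnable sketch reproducing the example above (the two sample rows mirror the ones shown; any real data is an assumption). Series.str.findall returns, for each row, a list of every regex match, and [a-zA-Z] matches exactly one ASCII letter:

```python
import pandas as pd

# Sample rows mirroring the example output above
df = pd.DataFrame({"seq_1": ["MKEYGEDLK", "?MKEY GEDLK"]})

# Each match of [a-zA-Z] is one letter, so '?' and ' ' are skipped
df["new"] = df["seq_1"].str.findall(r"[a-zA-Z]")

print(df)
```

Unlike list(s), this silently discards non-letter characters, which is useful for messy input but will also hide unexpected characters rather than flag them.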