Here is my code. I would like to delete the rows whose Ko_EC value contains only incomplete EC numbers such as "--" or "3.6.3.-", and keep the remaining EC numbers of the other rows in a new DataFrame.
# coding=utf-8
import pandas as pd
import numpy as np
#########
classes = [('--', 'c82241_g1', 'K07793'),
('3.6.3.-', 'c84674_g1', 'K10041'),
('1.2.5.1', 'c82377_g1', 'K00156'),
('3.1.1.3 2.3.1.-', 'c87035_g1', 'K14675'),
('2.7.2.3', 'c82661_g1', 'K00927'),
('1.7.99.4', 'c82688_g1', 'K00371'),
('1.1.1.- 1.1.1.76 1.1.1.304', 'c25949_g1', 'K03366'),
('1.1.1.-', 'c82777_g1', 'K18369'),
('4.1.1.68 5.3.3.-', 'c84443_g1', 'K05921'),
('--', 'c84672_g1', 'K02012'),
('2.2.1.1', 'c85319_g1', 'K00615'),
('3.1.1.-', 'c85321_g1', 'K18372'),
('1.8.1.2', 'c85322_g1', 'K00380'),
('1.2.1.16 1.2.1.79 1.2.1.20', 'c21528_g1', 'K00135'),
('1.10.3.-', 'c86242_g1', 'K00425')]
labels = ['Ko_EC','Gene_ID', 'Ko_id']
alls = pd.DataFrame.from_records(classes, columns=labels)
filt = (~alls['Ko_EC'].str.contains('-'))
all2 = alls.loc[filt, :]
all2
Its result:
Ko_EC Gene_ID Ko_id
2 1.2.5.1 c82377_g1 K00156
4 2.7.2.3 c82661_g1 K00927
5 1.7.99.4 c82688_g1 K00371
10 2.2.1.1 c85319_g1 K00615
12 1.8.1.2 c85322_g1 K00380
13 1.2.1.16 1.2.1.79 1.2.1.20 c21528_g1 K00135
What I want is :
Ko_EC Gene_ID Ko_id
2 1.2.5.1 c82377_g1 K00156
3 3.1.1.3 c87035_g1 K14675
4 2.7.2.3 c82661_g1 K00927
5 1.7.99.4 c82688_g1 K00371
6 1.1.1.76 1.1.1.304 c25949_g1 K03366
8 4.1.1.68 c84443_g1 K05921
10 2.2.1.1 c85319_g1 K00615
12 1.8.1.2 c85322_g1 K00380
13 1.2.1.16 1.2.1.79 1.2.1.20 c21528_g1 K00135
Here, rows 3, 6, and 8 are retained with their remaining EC numbers, while the EC numbers '2.3.1.-', '1.1.1.-', and '5.3.3.-', which contain the special character "-", are deleted.
Could anyone help me? Thanks a lot.
CodePudding user response:
You can split each value on whitespace, remove the elements that contain -, join the rest back together, and then filter out the rows left with an empty string using boolean indexing:
alls['Ko_EC'] = [' '.join(y for y in x.split() if '-' not in y) for x in alls['Ko_EC']]
#alternative
#f = lambda x: ' '.join(y for y in x.split() if '-' not in y)
#alls['Ko_EC'] = alls['Ko_EC'].apply(f)
all2 = alls[alls['Ko_EC'].ne('')]
print (all2)
Ko_EC Gene_ID Ko_id
2 1.2.5.1 c82377_g1 K00156
3 3.1.1.3 c87035_g1 K14675
4 2.7.2.3 c82661_g1 K00927
5 1.7.99.4 c82688_g1 K00371
6 1.1.1.76 1.1.1.304 c25949_g1 K03366
8 4.1.1.68 c84443_g1 K05921
10 2.2.1.1 c85319_g1 K00615
12 1.8.1.2 c85322_g1 K00380
13 1.2.1.16 1.2.1.79 1.2.1.20 c21528_g1 K00135
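Note that the snippet above overwrites the Ko_EC column of alls in place. If the original frame should stay untouched, the same idea can be wrapped in a small helper that works on a copy. This is only a sketch: the function name drop_partial_ec and the shortened demo frame are made up for illustration, and it assumes that every Ko_EC value is a string of whitespace-separated EC numbers.

```python
import pandas as pd


def drop_partial_ec(df, col="Ko_EC"):
    """Return a copy of df without EC tokens containing '-'.

    Rows whose column ends up empty (every token contained '-') are dropped.
    Hypothetical helper; assumes all values in `col` are strings.
    """
    out = df.copy()  # leave the caller's DataFrame unchanged
    out[col] = [
        " ".join(tok for tok in value.split() if "-" not in tok)
        for value in out[col]
    ]
    # keep only the rows that still have at least one EC number
    return out[out[col].ne("")]


# shortened demo data taken from the question
demo = pd.DataFrame(
    {
        "Ko_EC": ["--", "3.1.1.3 2.3.1.-", "1.2.5.1", "1.1.1.- 1.1.1.76 1.1.1.304"],
        "Gene_ID": ["c82241_g1", "c87035_g1", "c82377_g1", "c25949_g1"],
        "Ko_id": ["K07793", "K14675", "K00156", "K03366"],
    }
)

result = drop_partial_ec(demo)
print(result)
```

The original index labels are kept, so the surviving rows can still be matched against the source frame; call reset_index(drop=True) on the result if a fresh 0..n-1 index is preferred.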