How to delete rows that contain only special-character values while keeping the remaining values in the other rows?

Time:10-12

Here is my code. I would like to delete the rows whose Ko_EC value consists only of incomplete EC numbers such as "--" or "3.6.3.-", and keep the remaining EC numbers of the other rows in a new DataFrame.

# coding=utf-8
import pandas as pd
import numpy as np

#########
classes = [('--', 'c82241_g1', 'K07793'),
         ('3.6.3.-', 'c84674_g1', 'K10041'),
         ('1.2.5.1', 'c82377_g1', 'K00156'),
         ('3.1.1.3 2.3.1.-', 'c87035_g1', 'K14675'),
         ('2.7.2.3', 'c82661_g1', 'K00927'),
         ('1.7.99.4', 'c82688_g1', 'K00371'),
         ('1.1.1.- 1.1.1.76 1.1.1.304', 'c25949_g1', 'K03366'),
         ('1.1.1.-', 'c82777_g1', 'K18369'),
         ('4.1.1.68 5.3.3.-', 'c84443_g1', 'K05921'),
         ('--', 'c84672_g1', 'K02012'),
         ('2.2.1.1', 'c85319_g1', 'K00615'),
         ('3.1.1.-', 'c85321_g1', 'K18372'),
         ('1.8.1.2', 'c85322_g1', 'K00380'),
         ('1.2.1.16 1.2.1.79 1.2.1.20', 'c21528_g1', 'K00135'),
         ('1.10.3.-', 'c86242_g1', 'K00425')]
labels = ['Ko_EC','Gene_ID', 'Ko_id']
alls = pd.DataFrame.from_records(classes, columns=labels)

filt = (~alls['Ko_EC'].str.contains('-'))
all2 = alls.loc[filt, :]
all2

Its result:

                         Ko_EC    Gene_ID   Ko_id
2                      1.2.5.1  c82377_g1  K00156
4                      2.7.2.3  c82661_g1  K00927
5                     1.7.99.4  c82688_g1  K00371
10                     2.2.1.1  c85319_g1  K00615
12                     1.8.1.2  c85322_g1  K00380
13  1.2.1.16 1.2.1.79 1.2.1.20  c21528_g1  K00135

What I want is:

                         Ko_EC    Gene_ID   Ko_id
2                      1.2.5.1  c82377_g1  K00156
3                      3.1.1.3  c87035_g1  K14675
4                      2.7.2.3  c82661_g1  K00927
5                     1.7.99.4  c82688_g1  K00371
6           1.1.1.76 1.1.1.304  c25949_g1  K03366
8                     4.1.1.68  c84443_g1  K05921
10                     2.2.1.1  c85319_g1  K00615
12                     1.8.1.2  c85322_g1  K00380
13  1.2.1.16 1.2.1.79 1.2.1.20  c21528_g1  K00135

Here, rows 3, 6, and 8 are retained with their remaining EC numbers, while the EC numbers '2.3.1.-', '1.1.1.-', and '5.3.3.-', which contain the special character "-", are deleted.

Could anyone help me? Thanks a lot.

CodePudding user response:

You can split each value on whitespace, drop the elements that contain -, join the remaining elements back together, and then filter out the rows left with an empty string using boolean indexing:

alls['Ko_EC'] = [' '.join(y for y in x.split() if '-' not in y) for x in alls['Ko_EC']]

#alternative
#f = lambda x: ' '.join(y for y in x.split() if '-' not in y)
#alls['Ko_EC'] = alls['Ko_EC'].apply(f)
all2 = alls[alls['Ko_EC'].ne('')]
print (all2)
                         Ko_EC    Gene_ID   Ko_id
2                      1.2.5.1  c82377_g1  K00156
3                      3.1.1.3  c87035_g1  K14675
4                      2.7.2.3  c82661_g1  K00927
5                     1.7.99.4  c82688_g1  K00371
6           1.1.1.76 1.1.1.304  c25949_g1  K03366
8                     4.1.1.68  c84443_g1  K05921
10                     2.2.1.1  c85319_g1  K00615
12                     1.8.1.2  c85322_g1  K00380
13  1.2.1.16 1.2.1.79 1.2.1.20  c21528_g1  K00135
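
The same token-dropping idea can also be sketched with pandas string methods instead of a Python-level loop. This is only an illustration, not a tested replacement for the answer above: the regular expression `\S*-\S*` and the shortened four-row sample frame are assumptions made for this example, and it assumes that EC numbers are separated by whitespace and that any token containing "-" should be dropped.

```python
import pandas as pd

# Shortened sample of the question's data (only four of the fifteen rows).
alls = pd.DataFrame(
    {
        "Ko_EC": ["--", "3.1.1.3 2.3.1.-", "1.1.1.- 1.1.1.76 1.1.1.304", "2.7.2.3"],
        "Gene_ID": ["c82241_g1", "c87035_g1", "c25949_g1", "c82661_g1"],
        "Ko_id": ["K07793", "K14675", "K03366", "K00927"],
    }
)

cleaned = (
    alls["Ko_EC"]
    # Remove every whitespace-delimited token that contains a hyphen.
    .str.replace(r"\S*-\S*", "", regex=True)
    # Collapse the gaps left behind and trim the ends.
    .str.replace(r"\s+", " ", regex=True)
    .str.strip()
)

# Keep only rows that still have at least one EC number; alls itself is untouched.
all2 = alls.assign(Ko_EC=cleaned).loc[cleaned.ne("")]
print(all2)
```

Unlike the list-comprehension version, this sketch leaves the original `alls` unchanged and writes the cleaned column only into `all2`, which may be convenient if the raw values are needed later.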