Home > Software engineering >  How to filter numeric values with separator in same cell in pandas?
How to filter numeric values with separator in same cell in pandas?

Time:08-07

I have problems filtering numeric data in pandas. enter image description here

I have 10 000 data and i need them to filter out values in 3rd column > then 10.

Data in 3rd column are dtype object and cells of this column contains 3 types of data: dot (as None); 12.25 (single values); 12,45,12.5 (multiple values separated by ,).

I tried:

  1. separating the values by ',' with str.methods
  2. filtering splited values with greater then 10
  3. with df.loc then filter main dataframe with filtered columns (filter value - values from filtered column == same column from main dataframe)
#Data sample
{'POS': {0: 20482821,
  1: 20482980,
  2: 20483463,
  3: 20485526,
  4: 20485536,
  5: 20485630,
  6: 20485811,
  7: 20485948,
  8: 109274570,
  9: 109274623,
  10: 109274677,
  11: 109274857,
  12: 109274968,
  13: 109275216,
  14: 109275325,
  15: 109275506,
  16: 109275536,
  17: 109275600,
  18: 109275641,
  19: 109275648,
  20: 109275684,
  21: 197042891,
  22: 197042926,
  23: 197043092,
  24: 197043111},
 'CHROM': {0: 'chr1',
  1: 'chr1',
  2: 'chr1',
  3: 'chr1',
  4: 'chr1',
  5: 'chr1',
  6: 'chr1',
  7: 'chr1',
  8: 'chr1',
  9: 'chr1',
  10: 'chr1',
  11: 'chr1',
  12: 'chr1',
  13: 'chr1',
  14: 'chr1',
  15: 'chr1',
  16: 'chr1',
  17: 'chr1',
  18: 'chr1',
  19: 'chr1',
  20: 'chr1',
  21: 'chr3',
  22: 'chr3',
  23: 'chr3',
  24: 'chr3'},
 'CADD_phred': {0: 14.27,
  1: '.',
  2: '.',
  3: 17.1,
  4: 17.61,
  5: '20.1,19.64',
  6: 15.99,
  7: 15.95,
  8: 1.551,
  9: 5.142,
  10: 14.05,
  11: 6.579,
  12: 1.225,
  13: 14.38,
  14: 5.841,
  15: 3.85,
  16: 4.373,
  17: '.',
  18: 16.95,
  19: 16.94,
  20: 3.067,
  21: '.',
  22: 5.925,
  23: 10.3,
  24: 9.495}}

CodePudding user response:

If you have the data in a dataframe, e.g. df, all you have to do:

df = df[df['#3'] < 10]

CodePudding user response:

If you want to keep rows whose values are less than threshold (e.g. 10.0)

Code

def greater(x, threshold = 10.0):
    '''
        Checks if there is a value > threshold
        
        x values
        - comma separated string
        - float
        - empty  
    '''
    if x and isinstance(x, str) and "," in x:
        # comma separated string
        # Check if list has a value > threshold
        return any(v > threshold for v in [float(i) for i in x.split(',')])
    elif x and isinstance(x, float):
        # Float
        return x > threshold
    else:
        # Empty
        return False

# Generate dataframe from posted data
df = pd.DataFrame(data)

# Keep rows that don't have a value > 10 in column CADD_phred
filtered = df[df.CADD_phred.apply(lambda x: not greater(x, 10.0))]

print(filtered)

Output

POS CHROM   CADD_phred
1   20482980    chr1    .
2   20483463    chr1    .
8   109274570   chr1    1.551
9   109274623   chr1    5.142
11  109274857   chr1    6.579
12  109274968   chr1    1.225
14  109275325   chr1    5.841
15  109275506   chr1    3.850
16  109275536   chr1    4.373
17  109275600   chr1    .
20  109275684   chr1    3.067
21  197042891   chr3    .
22  197042926   chr3    5.925
24  197043111   chr3    9.495
  • Related