Home > Net >  Filtering rows based on specific condition
Filtering rows based on specific condition

Time:09-20

I want to filter the rows based on CPD(object data type) range:

Input: 

 ---- ----------- ---------- -------- ------------ ------------- 
|    | cpd_range | accuracy | n_size |   model    | frontend_id |
 ---- ----------- ---------- -------- ------------ ------------- 
|  0 | >55       | 32.86    |    140 | TFT_no_lag |       30667 |
|  1 | 50-55     | 31.43    |     35 | TFT_no_lag |       30667 |
|  2 | 45-50     | 21.43    |     28 | TFT_no_lag |       30667 |
|  3 | 40-45     | 34.29    |     35 | TFT_no_lag |       30667 |
|  4 | 35-40     | 14.29    |     70 | TFT_no_lag |       30667 |
|  5 | 30-35     | 25.00    |     56 | TFT_no_lag |       30667 |
|  6 | 25-30     | 28.57    |     49 | TFT_no_lag |       30667 |
|  7 | 20-25     | 19.05    |    105 | TFT_no_lag |       30667 |
|  8 | 15-20     | 20.06    |    329 | TFT_no_lag |       30667 |
|  9 | 10-15     | 19.37    |    630 | TFT_no_lag |       30667 |
| 10 | 5-10      | 18.07    |   1981 | TFT_no_lag |       30667 |
| 11 | 0-5       | 26.74    |   9982 | TFT_no_lag |       30667 |
 ---- ----------- ---------- -------- ------------ ------------- 

Final Output: 
 --- ----------- ---------- -------- ------------ ------------- 
|   | cpd_range | accuracy | n_size |   model    | frontend_id |
 --- ----------- ---------- -------- ------------ ------------- 
| 0 | >55       | 32.86    |    140 | TFT_no_lag |       30667 |
| 1 | 50-55     | 31.43    |     35 | TFT_no_lag |       30667 |
| 2 | 45-50     | 21.43    |     28 | TFT_no_lag |       30667 |
| 3 | 40-45     | 34.29    |     35 | TFT_no_lag |       30667 |
| 4 | 35-40     | 14.29    |     70 | TFT_no_lag |       30667 |
| 5 | 30-35     | 25.00    |     56 | TFT_no_lag |       30667 |
| 6 | 25-30     | 28.57    |     49 | TFT_no_lag |       30667 |
| 7 | 20-25     | 19.05    |    105 | TFT_no_lag |       30667 |
| 8 | 15-20     | 20.06    |    329 | TFT_no_lag |       30667 |
| 9 | 10-15     | 19.37    |    630 | TFT_no_lag |       30667 |
 --- ----------- ---------- -------- ------------ ------------- 

I want to remove the rows with CPD_slot_range < 10, but the column is of object type, so I have to do this through regular expressions, but I'm new to regular expressions, can you please guide for this.

CodePudding user response:

Your cpd_range is close to being formatted in a way that it could be made into an ordered categorical!

# Fix the one value that won't sort correctly
df.cpd_range = df.cpd_range.replace('5-10', '05-10')

# Make it an ordered categorical!
df.cpd_range = pd.Categorical(df.cpd_range, ordered=True)

# Now we can remove values by selecting only values greater than our cutoff~
df = df[df.cpd_range.gt('05-10')]

print(df)

Output:

  cpd_range  accuracy  n_size       model  frontend_id
0       >55     32.86     140  TFT_no_lag        30667
1     50-55     31.43      35  TFT_no_lag        30667
2     45-50     21.43      28  TFT_no_lag        30667
3     40-45     34.29      35  TFT_no_lag        30667
4     35-40     14.29      70  TFT_no_lag        30667
5     30-35     25.00      56  TFT_no_lag        30667
6     25-30     28.57      49  TFT_no_lag        30667
7     20-25     19.05     105  TFT_no_lag        30667
8     15-20     20.06     329  TFT_no_lag        30667
9     10-15     19.37     630  TFT_no_lag        30667

# We'll print this out as well, so you can see the dtype and how it's ordered~
>>> df.cpd_range.cat.as_ordered()
0      >55
1    50-55
2    45-50
3    40-45
4    35-40
5    30-35
6    25-30
7    20-25
8    15-20
9    10-15
Name: cpd_range, dtype: category
Categories (12, object): ['0-5' < '05-10' < '10-15' < '15-20' ... '40-45' < '45-50' < '50-55' < '>55']

CodePudding user response:

If possible test last integer in column cpd_range use Series.str.extract with \d for match digits, $ for end of strings, convert to integers and filter in boolean indexing if greater like 10:

df = df[df['cpd_range'].str.extract(r'(\d )$', expand=False).astype(int).gt(10)]
print (df)
  cpd_range  accuracy  n_size       model  frontend_id
0       >55     32.86     140  TFT_no_lag        30667
1     50-55     31.43      35  TFT_no_lag        30667
2     45-50     21.43      28  TFT_no_lag        30667
3     40-45     34.29      35  TFT_no_lag        30667
4     35-40     14.29      70  TFT_no_lag        30667
5     30-35     25.00      56  TFT_no_lag        30667
6     25-30     28.57      49  TFT_no_lag        30667
7     20-25     19.05     105  TFT_no_lag        30667
8     15-20     20.06     329  TFT_no_lag        30667
9     10-15     19.37     630  TFT_no_lag        30667
  • Related