Home > Enterprise >  Sorting in pandas python
Sorting in pandas python

Time:04-22

I have an excel file that has to be sorted by 'Priority Score' and 'Tested',

Priority is already sorted but cant seem to figure out how to sort 'Tested' by a certain string.. ie "Pending"

Heres what I got so far:

import pandas as pd

df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')

df.sort_values(['Priority Score', 'Tested'], ascending=False)

new_df = df.drop(columns=['Filtering list here'])

print(new_df.head(60))

It prints as the following :

https://i.stack.imgur.com/8SZh4.png

https://i.stack.imgur.com/rkoaL.png

Goal is to first sort out by 'Priority Score' which holds an integer as a value then go over 'Tested' and if something comes up as 'Pending' print the whole column.

Help is highly appreciated!

CodePudding user response:

Here is something you can do. It will first filter for Tested == PENDING, then sort in descending order by Priority and secondarily by Tested. However, if we have already ruled out all rows that do not have Tested == PENDING, then the secondary sort on Tested shouldn't be necessary.

Note also that your code was not capturing the result of sort_values().

import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False).drop(columns=['Column to drop', 'Another to drop'])
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")

Test case contents of .xlsx:

Priority Score  Tested  Column to drop  Another to drop
5   Fail    x   a
5   PENDING x   a
4   Pass    y   b
3   Pass    x   a
3   Fail    x   a
3   PENDING x   a
2   PENDING y   b
1   Fail    x   a

Output:


INPUT from .xlsx
   Priority Score   Tested Column to drop Another to drop
0               5     Fail              x               a
1               5  PENDING              x               a
2               4     Pass              y               b
3               3     Pass              x               a
4               3     Fail              x               a
5               3  PENDING              x               a
6               2  PENDING              y               b
7               1     Fail              x               a

OUTPUT after filtering for PENDING and sorting
   Priority Score   Tested
1               5  PENDING
5               3  PENDING
6               2  PENDING

UPDATE: Here's how to specify columns to keep, rather than dropping ones we don't want to keep.

import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
columnsToKeep = ['Priority Score', 'Tested', 'Column to keep', 'Another column to keep']
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False)[columnsToKeep]
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")

Output:


INPUT from .xlsx
   Priority Score   Tested Column to drop Another to drop  Column to keep          Another column to keep
0               5     Fail              x               a               0  this row will get filtered out
1               5  PENDING              x               a               1            this row will remain
2               4     Pass              y               b               2          another row to exclude
3               3     Pass              x               a               3                     and another
4               3     Fail              x               a               2                     and another
5               3  PENDING              x               a               1                  another keeper
6               2  PENDING              y               b               0              yet another keeper
7               1     Fail              x               a               1  this row will get filtered out

OUTPUT after filtering for PENDING and sorting
   Priority Score   Tested  Column to keep Another column to keep
1               5  PENDING               1   this row will remain
5               3  PENDING               1         another keeper
6               2  PENDING               0     yet another keeper
  • Related