Sorting in pandas python-CodePudding

I have an excel file that has to be sorted by 'Priority Score' and 'Tested',

Priority is already sorted but cant seem to figure out how to sort 'Tested' by a certain string.. ie "Pending"

Heres what I got so far:

import pandas as pd

df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')

df.sort_values(['Priority Score', 'Tested'], ascending=False)

new_df = df.drop(columns=['Filtering list here'])

print(new_df.head(60))

It prints as the following :

https://i.stack.imgur.com/8SZh4.png

https://i.stack.imgur.com/rkoaL.png

Goal is to first sort out by 'Priority Score' which holds an integer as a value then go over 'Tested' and if something comes up as 'Pending' print the whole column.

Help is highly appreciated!

CodePudding user response：

Here is something you can do. It will first filter for Tested == PENDING, then sort in descending order by Priority and secondarily by Tested. However, if we have already ruled out all rows that do not have Tested == PENDING, then the secondary sort on Tested shouldn't be necessary.

Note also that your code was not capturing the result of sort_values().

import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False).drop(columns=['Column to drop', 'Another to drop'])
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")

Test case contents of .xlsx:

Priority Score  Tested  Column to drop  Another to drop
5   Fail    x   a
5   PENDING x   a
4   Pass    y   b
3   Pass    x   a
3   Fail    x   a
3   PENDING x   a
2   PENDING y   b
1   Fail    x   a

Output:


INPUT from .xlsx
   Priority Score   Tested Column to drop Another to drop
0               5     Fail              x               a
1               5  PENDING              x               a
2               4     Pass              y               b
3               3     Pass              x               a
4               3     Fail              x               a
5               3  PENDING              x               a
6               2  PENDING              y               b
7               1     Fail              x               a

OUTPUT after filtering for PENDING and sorting
   Priority Score   Tested
1               5  PENDING
5               3  PENDING
6               2  PENDING

UPDATE: Here's how to specify columns to keep, rather than dropping ones we don't want to keep.

import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
columnsToKeep = ['Priority Score', 'Tested', 'Column to keep', 'Another column to keep']
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False)[columnsToKeep]
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")

Output:


INPUT from .xlsx
   Priority Score   Tested Column to drop Another to drop  Column to keep          Another column to keep
0               5     Fail              x               a               0  this row will get filtered out
1               5  PENDING              x               a               1            this row will remain
2               4     Pass              y               b               2          another row to exclude
3               3     Pass              x               a               3                     and another
4               3     Fail              x               a               2                     and another
5               3  PENDING              x               a               1                  another keeper
6               2  PENDING              y               b               0              yet another keeper
7               1     Fail              x               a               1  this row will get filtered out

OUTPUT after filtering for PENDING and sorting
   Priority Score   Tested  Column to keep Another column to keep
1               5  PENDING               1   this row will remain
5               3  PENDING               1         another keeper
6               2  PENDING               0     yet another keeper