I have an excel file that has to be sorted by 'Priority Score' and 'Tested',
Priority is already sorted but cant seem to figure out how to sort 'Tested' by a certain string.. ie "Pending"
Heres what I got so far:
import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
df.sort_values(['Priority Score', 'Tested'], ascending=False)
new_df = df.drop(columns=['Filtering list here'])
print(new_df.head(60))
It prints as the following :
https://i.stack.imgur.com/8SZh4.png
https://i.stack.imgur.com/rkoaL.png
Goal is to first sort out by 'Priority Score' which holds an integer as a value then go over 'Tested' and if something comes up as 'Pending' print the whole column.
Help is highly appreciated!
CodePudding user response:
Here is something you can do. It will first filter for Tested == PENDING, then sort in descending order by Priority and secondarily by Tested. However, if we have already ruled out all rows that do not have Tested == PENDING, then the secondary sort on Tested shouldn't be necessary.
Note also that your code was not capturing the result of sort_values()
.
import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False).drop(columns=['Column to drop', 'Another to drop'])
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")
Test case contents of .xlsx:
Priority Score Tested Column to drop Another to drop
5 Fail x a
5 PENDING x a
4 Pass y b
3 Pass x a
3 Fail x a
3 PENDING x a
2 PENDING y b
1 Fail x a
Output:
INPUT from .xlsx
Priority Score Tested Column to drop Another to drop
0 5 Fail x a
1 5 PENDING x a
2 4 Pass y b
3 3 Pass x a
4 3 Fail x a
5 3 PENDING x a
6 2 PENDING y b
7 1 Fail x a
OUTPUT after filtering for PENDING and sorting
Priority Score Tested
1 5 PENDING
5 3 PENDING
6 2 PENDING
UPDATE: Here's how to specify columns to keep, rather than dropping ones we don't want to keep.
import pandas as pd
df = pd.read_excel('TestsPopulation_04-12-2022-09-03-39.xlsx')
print(f"\nINPUT from .xlsx\n{df}")
columnsToKeep = ['Priority Score', 'Tested', 'Column to keep', 'Another column to keep']
new_df = df[df['Tested'] == 'PENDING'].sort_values(['Priority Score', 'Tested'], ascending=False)[columnsToKeep]
print(f"\nOUTPUT after filtering for PENDING and sorting\n{new_df.head(60)}")
Output:
INPUT from .xlsx
Priority Score Tested Column to drop Another to drop Column to keep Another column to keep
0 5 Fail x a 0 this row will get filtered out
1 5 PENDING x a 1 this row will remain
2 4 Pass y b 2 another row to exclude
3 3 Pass x a 3 and another
4 3 Fail x a 2 and another
5 3 PENDING x a 1 another keeper
6 2 PENDING y b 0 yet another keeper
7 1 Fail x a 1 this row will get filtered out
OUTPUT after filtering for PENDING and sorting
Priority Score Tested Column to keep Another column to keep
1 5 PENDING 1 this row will remain
5 3 PENDING 1 another keeper
6 2 PENDING 0 yet another keeper