Home > front end >  Sorting a data frame with conditions
Sorting a data frame with conditions

Time:05-20

Having this data frame

df = pd.DataFrame({'id':[3333311,3455572,6464544,2323322,2222111,4333311,5454566,3321767],'A':['Na','Na','Na','Na','Na','Na','56 days','54 days'],'B':['6 days','31 days','33 days','46 days','44 days','16 days','41 days','42 days'],'Percentage':[0.41,0.36,0.36,0.42,0.25,0.56,0.25,0.42]})

print(df)

id        A        B  Percentage
0  3333311            6 days        0.41
1  3455572           31 days        0.36
2  6464544           33 days        0.36
3  2323322           46 days        0.42
4  2222111           44 days        0.25
5  4333311           16 days        0.56
6  5454566  56 days  41 days        0.25
7  3321767  54 days  42 days        0.42

I would like to sort values based on percentage but I would like also to put rows with values in A sorted first and then the rows sorted without data in A. Like this.

id        A        B  Percentage
7  3321767  54 days  42 days        0.42
6  5454566  56 days  41 days        0.25
5  4333311           16 days        0.56
3  2323322           46 days        0.42
0  3333311            6 days        0.41
1  3455572           31 days        0.36
2  6464544           33 days        0.36
4  2222111           44 days        0.25

EDIT: I have also found that the column percentages have some Na values. I would like columns with Na at the end of the data frame in both subgroups. I know how to do this in a simple sorting task. I mention it here in case, it is needed

CodePudding user response:

By default is na_position='last' in DataFrame.sort_values, so after converting to timedeltas sorting by both columns, first ascending, second descending:

df['A'] = pd.to_timedelta(df['A'], errors='coerce')

df = df.sort_values(['A','Percentage'], ascending=[True, False])
print (df)
        id       A       B  Percentage
7  3321767 54 days 42 days        0.42
6  5454566 56 days 41 days        0.25
5  4333311     NaT 16 days        0.56
3  2323322     NaT 46 days        0.42
0  3333311     NaT  6 days        0.41
1  3455572     NaT 31 days        0.36
2  6464544     NaT 33 days        0.36
4  2222111     NaT 44 days        0.25
  • Related