Having this data frame
df = pd.DataFrame({'id':[3333311,3455572,6464544,2323322,2222111,4333311,5454566,3321767],'A':['Na','Na','Na','Na','Na','Na','56 days','54 days'],'B':['6 days','31 days','33 days','46 days','44 days','16 days','41 days','42 days'],'Percentage':[0.41,0.36,0.36,0.42,0.25,0.56,0.25,0.42]})
print(df)
id A B Percentage
0 3333311 6 days 0.41
1 3455572 31 days 0.36
2 6464544 33 days 0.36
3 2323322 46 days 0.42
4 2222111 44 days 0.25
5 4333311 16 days 0.56
6 5454566 56 days 41 days 0.25
7 3321767 54 days 42 days 0.42
I would like to sort values based on percentage but I would like also to put rows with values in A sorted first and then the rows sorted without data in A. Like this.
id A B Percentage
7 3321767 54 days 42 days 0.42
6 5454566 56 days 41 days 0.25
5 4333311 16 days 0.56
3 2323322 46 days 0.42
0 3333311 6 days 0.41
1 3455572 31 days 0.36
2 6464544 33 days 0.36
4 2222111 44 days 0.25
EDIT: I have also found that the column percentages have some Na values. I would like columns with Na at the end of the data frame in both subgroups. I know how to do this in a simple sorting task. I mention it here in case, it is needed
CodePudding user response:
By default is na_position='last'
in DataFrame.sort_values
, so after converting to timedeltas sorting by both columns, first ascending, second descending:
df['A'] = pd.to_timedelta(df['A'], errors='coerce')
df = df.sort_values(['A','Percentage'], ascending=[True, False])
print (df)
id A B Percentage
7 3321767 54 days 42 days 0.42
6 5454566 56 days 41 days 0.25
5 4333311 NaT 16 days 0.56
3 2323322 NaT 46 days 0.42
0 3333311 NaT 6 days 0.41
1 3455572 NaT 31 days 0.36
2 6464544 NaT 33 days 0.36
4 2222111 NaT 44 days 0.25