How can you sort a data frame by number of non zero entries using python?

Time:10-04

I have a data frame like this:

a b c d e
a_1 b_1 c_1 d_1 e_1
0 b_2 c_2 d_2 e_2
0 b_3 c_3 0 e_3
0 0 c_4 0 e_4
0 0 0 0 e_5

I want the data frame to look like this:

e c b d a
e_1 c_1 b_1 d_1 a_1
e_2 c_2 b_2 d_2 0
e_3 c_3 b_3 0 0
e_4 c_4 0 0 0
e_5 0 0 0 0

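where each "letter_number" entry (for example a_1) stands for any value not equal to 0. In other words, the columns should be ordered from most to fewest non-zero entries.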

CodePudding user response:

pandas >= 1.1

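We can call sort_index on the columns (axis=1) with a custom key function. The key receives the whole column Index at once, so df[c] selects every column and the sum yields one non-zero count per column: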

df.sort_index(key=lambda c: df[c].ne('0').sum(), ascending=False, axis=1)

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0

This assumes the zeros are stored as the string '0' rather than as numeric 0. If they are numeric, use ne(0) instead.
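For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question under the same assumption (zeros stored as the string "0"); the variable names df and result are just for illustration.

```python
import pandas as pd

# Rebuild the example frame; zeros are the string "0", as assumed above.
df = pd.DataFrame({
    "a": ["a_1", "0", "0", "0", "0"],
    "b": ["b_1", "b_2", "b_3", "0", "0"],
    "c": ["c_1", "c_2", "c_3", "c_4", "0"],
    "d": ["d_1", "d_2", "0", "0", "0"],
    "e": ["e_1", "e_2", "e_3", "e_4", "e_5"],
})

# The key callable is given the full column Index, so df[cols] is the whole
# frame and the sum is one non-zero count per column.
result = df.sort_index(
    axis=1,
    key=lambda cols: df[cols].ne("0").sum(),
    ascending=False,
)

print(list(result.columns))  # ['e', 'c', 'b', 'd', 'a']
```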


Older versions

We can sort the column headers by the number of non-zero entries using Python's built-in sorted function. Iterating over a data frame yields its column labels, so sorted(df, ...) sorts the labels:

df[sorted(df, key=lambda c: df[c].ne('0').sum(), reverse=True)]

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0
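The same idea can be wrapped in a small helper so that the value treated as zero is configurable. This is only a sketch: sort_columns_by_nonzero and its zero parameter are hypothetical names, not part of pandas, and the numeric frame below is made-up illustration data.

```python
import pandas as pd


def sort_columns_by_nonzero(frame, zero="0"):
    """Return frame with columns ordered by descending count of non-zero cells.

    `zero` is the value treated as zero (hypothetical helper, not a pandas API).
    """
    # Iterating a DataFrame yields its column labels, so sorted() orders labels.
    order = sorted(frame, key=lambda c: frame[c].ne(zero).sum(), reverse=True)
    return frame[order]


# Illustration data with numeric zeros instead of the string "0".
numeric = pd.DataFrame({
    "a": [1, 0, 0],
    "b": [2, 3, 0],
    "c": [4, 5, 6],
})

sorted_numeric = sort_columns_by_nonzero(numeric, zero=0)
print(list(sorted_numeric.columns))  # ['c', 'b', 'a']
```

Because sorted is stable even with reverse=True, columns with the same count keep their original left-to-right order.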

CodePudding user response:

You can count the zeros per column instead and sort in ascending order, using np.argsort with iloc (this needs import numpy as np):

df.iloc[:, np.argsort(df.eq('0').sum())]

Or use sort_values:

df[df.eq('0').sum().sort_values().index]

Both give:

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0
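One detail the example data does not exercise is ties, since every column there has a different zero count. If tied columns should keep their original order, a stable sort can be requested explicitly. The sketch below uses made-up data in which columns a and c tie, and again assumes string zeros.

```python
import pandas as pd

# Made-up frame: "a" and "c" both contain two zeros, so they tie.
df = pd.DataFrame({
    "a": ["a_1", "0", "0"],
    "b": ["b_1", "b_2", "0"],
    "c": ["c_1", "0", "0"],
    "d": ["d_1", "d_2", "d_3"],
})

zero_counts = df.eq("0").sum()

# mergesort is stable, so tied columns keep their left-to-right order.
by_sort_values = df[zero_counts.sort_values(kind="mergesort").index]

# Equivalent positional version: argsort on the underlying NumPy array.
positions = zero_counts.to_numpy().argsort(kind="stable")
by_argsort = df.iloc[:, positions]

print(list(by_sort_values.columns))  # ['d', 'b', 'a', 'c']
```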