How can you sort a data frame by number of non-zero entries using Python?

I have a data frame like this:

a	b	c	d	e
a_1	b_1	c_1	d_1	e_1
0	b_2	c_2	d_2	e_2
0	b_3	c_3	0	e_3
0	0	c_4	0	e_4
0	0	0	0	e_5

I want the data frame to look like this:

e	c	b	d	a
e_1	c_1	b_1	d_1	a_1
e_2	c_2	b_2	d_2	0
e_3	c_3	b_3	0	0
e_4	c_4	0	0	0
e_5	0	0	0	0

where "letter_number" is any value not equal to 0.

CodePudding user response：

pandas >= 1.1

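We can call sort_index on the columns (axis=1) with a custom key function. The key receives the whole Index of column labels and must return one sort value per label, here the number of non-zero entries in each column: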

df.sort_index(key=lambda c: df[c].ne('0').sum(), ascending=False, axis=1)

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0

This assumes the zeros are stored as the string '0' rather than as numbers. If they are numeric, compare against 0 instead of '0'.
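
If the zeros might be numeric as well as strings, one option is to count with a plain Python membership test, which does not depend on the column dtype. The following is a minimal sketch under that assumption: the helper name count_nonzero is hypothetical, and the sample frame mixes the string '0' with the integer 0 purely for illustration.

```python
import pandas as pd

# Sample frame from the question, but with a mix of string and numeric zeros
# (an assumption for illustration; the original data may contain only one kind).
df = pd.DataFrame({
    "a": ["a_1", 0, 0, 0, 0],
    "b": ["b_1", "b_2", "b_3", "0", 0],
    "c": ["c_1", "c_2", "c_3", "c_4", "0"],
    "d": ["d_1", "d_2", 0, "0", 0],
    "e": ["e_1", "e_2", "e_3", "e_4", "e_5"],
})


def count_nonzero(col):
    # Hypothetical helper: count values that are neither the number 0
    # nor the string '0'.
    return sum(v not in (0, "0") for v in col)


# One count per column, indexed by column label.
counts = df.apply(count_nonzero)

# key receives the Index of column labels and must return one value per label.
result = df.sort_index(axis=1, key=lambda cols: counts[cols], ascending=False)

print(list(result.columns))
```

Computing the counts once, outside the key function, keeps the key a simple lookup.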

Older versions

Iterating over a data frame yields its column labels, so we can sort them by the same count using Python's built-in sorted function and then reindex the columns:

df[sorted(df, key=lambda c: df[c].ne('0').sum(), reverse=True)]

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0
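
One property of this approach worth knowing: Python's sorted is stable, including with reverse=True, so columns with the same non-zero count keep their original relative order. A minimal sketch with a made-up three-column frame (the column names x, y, z are illustrative and not from the question):

```python
import pandas as pd

# Columns x and z both have two non-zero entries; y has three.
df = pd.DataFrame({
    "x": ["x_1", "x_2", "0"],
    "y": ["y_1", "y_2", "y_3"],
    "z": ["z_1", "z_2", "0"],
})

# Count non-zero entries once, then look the count up per column label.
counts = df.ne("0").sum()
ordered = sorted(df.columns, key=lambda c: counts[c], reverse=True)

# The tie between x and z is resolved by their original left-to-right order.
result = df[ordered]
print(ordered)
```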

CodePudding user response：

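Alternatively, count the zeros per column and sort in ascending order, which gives the same column order. You can try with np.argsort (this needs import numpy as np) and iloc: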

df.iloc[:, np.argsort(df.eq('0').sum())]

Or use sort_values:

df[df.eq('0').sum().sort_values().index]

Both give:

     e    c    b    d    a
0  e_1  c_1  b_1  d_1  a_1
1  e_2  c_2  b_2  d_2    0
2  e_3  c_3  b_3    0    0
3  e_4  c_4    0    0    0
4  e_5    0    0    0    0
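
For reference, here is a self-contained sketch of both variants, rebuilt from the question's data. It passes kind="stable" to each sort so that columns with equal zero counts keep their original order. That argument is an addition here, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Data frame from the question, with zeros stored as the string '0'.
df = pd.DataFrame({
    "a": ["a_1", "0", "0", "0", "0"],
    "b": ["b_1", "b_2", "b_3", "0", "0"],
    "c": ["c_1", "c_2", "c_3", "c_4", "0"],
    "d": ["d_1", "d_2", "0", "0", "0"],
    "e": ["e_1", "e_2", "e_3", "e_4", "e_5"],
})

# Number of '0' entries per column; fewer zeros means more non-zero entries.
zero_counts = df.eq("0").sum()

# Variant 1: positional order from a stable argsort on the raw counts.
order = np.argsort(zero_counts.to_numpy(), kind="stable")
by_argsort = df.iloc[:, order]

# Variant 2: label order from a stable sort_values.
by_sort_values = df[zero_counts.sort_values(kind="stable").index]

print(by_argsort.columns.tolist())
print(by_sort_values.columns.tolist())
```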