I'm trying to drop rows in which most of the column values are 0. However, two columns, Port and Port Speed, will always contain a value.
Port | Port Speed | rx_bytes | rx_packets | [...] | tx_bytes |
---|---|---|---|---|---|
1-1 | 40000 | 96226349052316 | 152878404874 | 0 | 0 |
1-2 | 40000 | 102000894940050 | 149281284683 | 0 | 123 |
1-3 | 40000 | 1329621841505692 | 2128668150695 | 0 | 0 |
1-4 | 40000 | 1330817801586198 | 0 | 0 | 123 |
1-5 | 40000 | 0 | 0 | 0 | 0 |
1-6 | 40000 | 0 | 0 | 0 | 0 |
I read up about dropna(thresh=3), but it only operates on NaN. Is it possible to achieve the same thing when the value is 0?
Expected output:
Port | Port Speed | rx_bytes | rx_packets | [...] | tx_bytes |
---|---|---|---|---|---|
1-1 | 40000 | 96226349052316 | 152878404874 | 0 | 0 |
1-2 | 40000 | 102000894940050 | 149281284683 | 0 | 123 |
1-3 | 40000 | 1329621841505692 | 2128668150695 | 0 | 0 |
1-4 | 40000 | 1330817801586198 | 0 | 0 | 123 |
CodePudding user response:
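An easy way is to convert 0 to NaN with replace, drop the rows that have fewer than 3 non-NaN values with dropna(thresh=3), and then turn the remaining NaN back into 0 with fillna. Note that the columns that contained zeros come back as floats: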
>>> import numpy as np
>>> df.replace(0, np.nan).dropna(thresh=3).fillna(0)
   Port  Port Speed      rx_bytes    rx_packets  [...]  tx_bytes
0   1-1       40000  9.622635e+13  1.528784e+11    0.0       0.0
1   1-2       40000  1.020009e+14  1.492813e+11    0.0     123.0
2   1-3       40000  1.329622e+15  2.128668e+12    0.0       0.0
3   1-4       40000  1.330818e+15  0.000000e+00    0.0     123.0
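For anyone who wants to run this end to end, here is a minimal sketch of the same chain on data shaped like the table in the question. The column named `other` is a hypothetical stand-in for the elided `[...]` columns, so the real frame may well have more columns and need a different `thresh`.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question. "other" is a hypothetical stand-in
# for the elided "[...]" columns.
df = pd.DataFrame({
    "Port": ["1-1", "1-2", "1-3", "1-4", "1-5", "1-6"],
    "Port Speed": [40000] * 6,
    "rx_bytes": [96226349052316, 102000894940050, 1329621841505692,
                 1330817801586198, 0, 0],
    "rx_packets": [152878404874, 149281284683, 2128668150695, 0, 0, 0],
    "other": [0] * 6,
    "tx_bytes": [0, 123, 0, 123, 0, 0],
})

# Zeros become NaN, rows with fewer than 3 non-NaN cells are dropped,
# and the NaN cells that remain are turned back into zeros.
kept = df.replace(0, np.nan).dropna(thresh=3).fillna(0)

print(kept["Port"].tolist())
```

Because Port and Port Speed are never zero, every row starts with 2 non-NaN cells, so `thresh=3` amounts to "at least one non-zero counter". One caveat: `fillna(0)` also fills any NaN that was already in the data before the replace.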
Or use a boolean mask that counts the zeros in each row and keeps the rows with at most 3 of them:
>>> df[df.eq(0).sum(axis=1).le(3)]  # at most 3 zeros per row
   Port  Port Speed      rx_bytes    rx_packets  [...]  tx_bytes
0   1-1       40000  9.622635e+13  1.528784e+11    0.0       0.0
1   1-2       40000  1.020009e+14  1.492813e+11    0.0     123.0
2   1-3       40000  1.329622e+15  2.128668e+12    0.0       0.0
3   1-4       40000  1.330818e+15  0.000000e+00    0.0     123.0
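A possible variant of the mask approach, sketched under a few assumptions: since Port and Port Speed always hold a value, they can be left out of the count so the threshold refers only to the counter columns. The names `always_filled`, `counters` and `min_nonzero`, and the column `other` standing in for `[...]`, are hypothetical and not from the original post.

```python
import pandas as pd

# Sample data mirroring the question. "other" is a hypothetical stand-in
# for the elided "[...]" columns.
df = pd.DataFrame({
    "Port": ["1-1", "1-2", "1-3", "1-4", "1-5", "1-6"],
    "Port Speed": [40000] * 6,
    "rx_bytes": [96226349052316, 102000894940050, 1329621841505692,
                 1330817801586198, 0, 0],
    "rx_packets": [152878404874, 149281284683, 2128668150695, 0, 0, 0],
    "other": [0] * 6,
    "tx_bytes": [0, 123, 0, 123, 0, 0],
})

# Columns that always hold a value are excluded from the count.
always_filled = ["Port", "Port Speed"]
counters = df.drop(columns=always_filled)

# Keep a row when at least `min_nonzero` counter columns are non-zero.
min_nonzero = 1  # hypothetical threshold, adjust to taste
filtered = df[counters.ne(0).sum(axis=1).ge(min_nonzero)]

print(filtered)
```

Unlike the replace/dropna/fillna chain, this never introduces NaN, so the integer columns keep their integer dtype in the result.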