Pandas Dataframe drop threshold, dropna 0 instead of NaN

Time:03-02

I'm trying to drop rows in which most of the column values are 0. However, two columns, Port and Port Speed, will always contain a value.

Port  Port Speed  rx_bytes          rx_packets     [...]  tx_bytes
1-1   40000       96226349052316    152878404874   0      0
1-2   40000       102000894940050   149281284683   0      123
1-3   40000       1329621841505692  2128668150695  0      0
1-4   40000       1330817801586198  0              0      123
1-5   40000       0                 0              0      0
1-6   40000       0                 0              0      0

I read about dropna(thresh=3), but it only counts NaN values. Is it possible to get the same behaviour when the "empty" value is 0?

Expected output:

Port  Port Speed  rx_bytes          rx_packets     [...]  tx_bytes
1-1   40000       96226349052316    152878404874   0      0
1-2   40000       102000894940050   149281284683   0      123
1-3   40000       1329621841505692  2128668150695  0      0
1-4   40000       1330817801586198  0              0      123

Answer:

An easy way is to replace 0 with NaN, call dropna with the threshold, then fill the remaining NaN values back with 0 (this assumes numpy is imported as np):

>>> df.replace(0, np.nan).dropna(thresh=3).fillna(0)
  Port  Port Speed      rx_bytes    rx_packets  [...]  tx_bytes
0  1-1       40000  9.622635e+13  1.528784e+11    0.0       0.0
1  1-2       40000  1.020009e+14  1.492813e+11    0.0     123.0
2  1-3       40000  1.329622e+15  2.128668e+12    0.0       0.0
3  1-4       40000  1.330818e+15  0.000000e+00    0.0     123.0
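
One side effect worth knowing: NaN is a float, so the round trip through replace and fillna leaves the counter columns as floats, which is why the output above is printed in scientific notation. A minimal sketch that keeps the original integers, assuming a small frame rebuilt from the question's sample (the elided `[...]` column is left out, and the name `kept` is only illustrative), is to use dropna just to pick the surviving index labels:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the question; the elided "[...]" column is left out.
df = pd.DataFrame({
    "Port": ["1-1", "1-2", "1-3", "1-4", "1-5", "1-6"],
    "Port Speed": [40000] * 6,
    "rx_bytes": [96226349052316, 102000894940050, 1329621841505692,
                 1330817801586198, 0, 0],
    "rx_packets": [152878404874, 149281284683, 2128668150695, 0, 0, 0],
    "tx_bytes": [0, 123, 0, 123, 0, 0],
})

# Use the NaN round trip only to decide which rows survive,
# then select those rows from the untouched original frame.
kept = df.loc[df.replace(0, np.nan).dropna(thresh=3).index]
print(kept)
```

Selecting by index this way also means that any NaN already present in the data is not silently turned into 0 by the final fillna.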

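Or use a boolean mask. Note that the threshold here is the maximum number of zeros allowed per row, not the minimum number of non-null values as in dropna:

>>> df[df.eq(0).sum(axis=1).le(3)]  # keep rows with at most 3 zeros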
  Port  Port Speed      rx_bytes    rx_packets  [...]  tx_bytes
0  1-1       40000  9.622635e+13  1.528784e+11    0.0       0.0
1  1-2       40000  1.020009e+14  1.492813e+11    0.0     123.0
2  1-3       40000  1.329622e+15  2.128668e+12    0.0       0.0
3  1-4       40000  1.330818e+15  0.000000e+00    0.0     123.0
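
Since Port and Port Speed are known to be populated in every row, another option is to leave them out of the count and test only the counter columns. This is a minimal sketch, assuming the sample frame from the question without the elided `[...]` column. The helper name `drop_mostly_zero` and its parameters `always_set` and `min_nonzero` are made up for illustration:

```python
import pandas as pd

def drop_mostly_zero(df, always_set, min_nonzero=1):
    """Keep rows with at least `min_nonzero` non-zero counter values.

    Hypothetical helper: `always_set` names the columns left out of the count.
    """
    # Only the counter columns take part in the zero test.
    counters = df.drop(columns=always_set)
    # ne(0) marks every non-zero cell; summing along axis=1 counts them per row.
    nonzero_per_row = counters.ne(0).sum(axis=1)
    return df[nonzero_per_row.ge(min_nonzero)]

# Sample frame rebuilt from the question; the elided "[...]" column is left out.
df = pd.DataFrame({
    "Port": ["1-1", "1-2", "1-3", "1-4", "1-5", "1-6"],
    "Port Speed": [40000] * 6,
    "rx_bytes": [96226349052316, 102000894940050, 1329621841505692,
                 1330817801586198, 0, 0],
    "rx_packets": [152878404874, 149281284683, 2128668150695, 0, 0, 0],
    "tx_bytes": [0, 123, 0, 123, 0, 0],
})

result = drop_mostly_zero(df, always_set=["Port", "Port Speed"])
print(result)
```

Raising `min_nonzero` tightens the filter, and the threshold no longer depends on how many always-populated columns the frame has. Nothing is converted to NaN, so the integer dtypes are left as they were.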