Filtering out pandas DataFrame rows that contain zeros

Time:03-23

The following code should not return the 4th row, but it does. So what's wrong with it?

import pandas as pd
D = {
    'serial': ['M0', 'M1', 'M2', 0, 'M21'],
    'f_1_norm': [100, 110, 130, 0, 90],
    'f_1_raw': [1.2, 1.5, 2.3, 0, 1.9],
    'f_2_raw': [17.5, 1112.5, 1, 0, 11],
    'notes': ['disk-x', 'disk-y', 'disk-m', 0, 'disk-bar'],
}
df=pd.DataFrame.from_dict(D)
print(D)
print('------------------------')
print(df)
#
print('------------------------')
df1 = df.loc[:, (df != 0.0).any(axis=0)]
print(df1)

Here is the output of the code shown above (the output of print(D) is omitted):

------------------------
  serial  f_1_norm  f_1_raw  f_2_raw     notes
0     M0       100      1.2     17.5    disk-x
1     M1       110      1.5   1112.5    disk-y
2     M2       130      2.3      1.0    disk-m
3      0         0      0.0      0.0         0
4    M21        90      1.9     11.0  disk-bar
------------------------
  serial  f_1_norm  f_1_raw  f_2_raw     notes
0     M0       100      1.2     17.5    disk-x
1     M1       110      1.5   1112.5    disk-y
2     M2       130      2.3      1.0    disk-m
3      0         0      0.0      0.0         0
4    M21        90      1.9     11.0  disk-bar

So the construct using df.loc does not do what I want: the all-zero row (index 3) is still there.

CodePudding user response:

If your dataset is structured so that you can drop all rows where serial == 0, it is as simple as:

df1 = df[df.serial != 0]
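
As a minimal sketch of this approach, the snippet below rebuilds a trimmed version of the question's frame (three of the five columns, for brevity) and checks which rows survive. It assumes the zero in serial is the integer 0, as in the question, and not the string '0'.

```python
import pandas as pd

# Trimmed copy of the question's data: row 3 holds zeros in every column.
df = pd.DataFrame({
    "serial": ["M0", "M1", "M2", 0, "M21"],
    "f_1_norm": [100, 110, 130, 0, 90],
    "notes": ["disk-x", "disk-y", "disk-m", 0, "disk-bar"],
})

# Boolean mask with one entry per row: True where serial is not the integer 0.
mask = df["serial"] != 0
df1 = df[mask]

print(df1.index.tolist())  # [0, 1, 2, 4]
```

This only looks at one column, so it is appropriate when a zero in serial reliably marks a row to discard.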

CodePudding user response:

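Use DataFrame.select_dtypes to restrict the comparison to the numeric columns, change axis=0 to axis=1 so the test runs per row, and use the resulting mask to select rows instead of columns:

import numpy as np

df1 = df[(df.select_dtypes(np.number) != 0.0).any(axis=1)]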
print(df1)
  serial  f_1_norm  f_1_raw  f_2_raw     notes
0     M0       100      1.2     17.5    disk-x
1     M1       110      1.5   1112.5    disk-y
2     M2       130      2.3      1.0    disk-m
4    M21        90      1.9     11.0  disk-bar
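
To see why the code in the question keeps row 3, it may help to compare the two masks side by side. The sketch below uses the question's data; the names nonzero, col_mask and row_mask are illustrative only. With axis=0 the reduction runs down each column and yields one boolean per column, so passing it to df.loc[:, ...] can only drop columns. With axis=1 it yields one boolean per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "serial": ["M0", "M1", "M2", 0, "M21"],
    "f_1_norm": [100, 110, 130, 0, 90],
    "f_1_raw": [1.2, 1.5, 2.3, 0, 1.9],
    "f_2_raw": [17.5, 1112.5, 1, 0, 11],
    "notes": ["disk-x", "disk-y", "disk-m", 0, "disk-bar"],
})

# Element-wise test on the three numeric columns only.
nonzero = df.select_dtypes(np.number) != 0

# axis=0 collapses the rows: one boolean per column.
col_mask = nonzero.any(axis=0)
# axis=1 collapses the columns: one boolean per row.
row_mask = nonzero.any(axis=1)

print(col_mask.tolist())  # [True, True, True]
print(row_mask.tolist())  # [True, True, True, False, True]

# Index rows with the per-row mask.
df1 = df[row_mask]
print(df1.index.tolist())  # [0, 1, 2, 4]
```

Note that any(axis=1) drops only rows whose numeric values are all zero. If the goal is to drop every row that contains at least one zero in a numeric column, all(axis=1) would be the reduction to use instead.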