The following code should drop the 4th row (index 3, where every value is 0), but that row is still returned. What's wrong with it?
import pandas as pd
D = {'serial': ['M0', 'M1', 'M2', 0, 'M21'],
     'f_1_norm': [100, 110, 130, 0, 90],
     'f_1_raw': [1.2, 1.5, 2.3, 0, 1.9],
     'f_2_raw': [17.5, 1112.5, 1, 0, 11],
     'notes': ['disk-x', 'disk-y', 'disk-m', 0, 'disk-bar'],
     }
df = pd.DataFrame.from_dict(D)
print(D)
print('------------------------')
print(df)
print('------------------------')
# Intended: drop every row in which all values are 0
df1 = df.loc[:, (df != 0.0).any(axis=0)]
print(df1)
Here is the output of the code above (the printed dictionary is omitted):
------------------------
   serial  f_1_norm  f_1_raw  f_2_raw     notes
0      M0       100      1.2     17.5    disk-x
1      M1       110      1.5   1112.5    disk-y
2      M2       130      2.3      1.0    disk-m
3       0         0      0.0      0.0         0
4     M21        90      1.9     11.0  disk-bar
------------------------
   serial  f_1_norm  f_1_raw  f_2_raw     notes
0      M0       100      1.2     17.5    disk-x
1      M1       110      1.5   1112.5    disk-y
2      M2       130      2.3      1.0    disk-m
3       0         0      0.0      0.0         0
4     M21        90      1.9     11.0  disk-bar
So the df.loc construct does not do what I want.
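A minimal diagnostic sketch, rebuilding the same frame: printing the mask shows that `(df != 0.0).any(axis=0)` reduces down each column, so the result is one boolean per column, indexed by the column names. Every column contains at least one non-zero value, so the mask is all `True`, and `df.loc[:, mask]` keeps every column while the `:` in the row position keeps every row.

```python
import pandas as pd

D = {'serial': ['M0', 'M1', 'M2', 0, 'M21'],
     'f_1_norm': [100, 110, 130, 0, 90],
     'f_1_raw': [1.2, 1.5, 2.3, 0, 1.9],
     'f_2_raw': [17.5, 1112.5, 1, 0, 11],
     'notes': ['disk-x', 'disk-y', 'disk-m', 0, 'disk-bar']}
df = pd.DataFrame(D)

# axis=0 collapses the rows: the result has one boolean per *column*
col_mask = (df != 0.0).any(axis=0)
print(col_mask)

# The mask is indexed by column names, so .loc[:, col_mask] selects columns.
# Every entry is True, so no column is dropped, and ':' keeps all rows.
df1 = df.loc[:, col_mask]
print(df1.shape)  # (5, 5)
```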
Answer:
If your dataset is structured so that the rows to drop are exactly those where serial == 0, it is as simple as:
df1 = df[df.serial != 0]
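A self-contained sketch of this approach, reusing the question's data. Because `serial` is an object column mixing strings with the integer 0, the elementwise comparison `df.serial != 0` is `False` only for row 3. As the answer says, this assumes that `serial == 0` reliably marks the rows to drop; an all-zero row with a real serial would be kept.

```python
import pandas as pd

D = {'serial': ['M0', 'M1', 'M2', 0, 'M21'],
     'f_1_norm': [100, 110, 130, 0, 90],
     'f_1_raw': [1.2, 1.5, 2.3, 0, 1.9],
     'f_2_raw': [17.5, 1112.5, 1, 0, 11],
     'notes': ['disk-x', 'disk-y', 'disk-m', 0, 'disk-bar']}
df = pd.DataFrame(D)

# Elementwise on an object column: 'M0' != 0 is True, 0 != 0 is False
keep = df.serial != 0
df1 = df[keep]
print(df1.index.tolist())  # [0, 1, 2, 4]
```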
Answer:
Use DataFrame.select_dtypes to keep only the numeric columns, then change axis=0 to axis=1 so that any() returns one boolean per row, which can be used to filter the rows:
import numpy as np

# One boolean per row: True if any numeric value in that row is non-zero
df1 = df[(df.select_dtypes(np.number) != 0.0).any(axis=1)]
print(df1)
   serial  f_1_norm  f_1_raw  f_2_raw     notes
0      M0       100      1.2     17.5    disk-x
1      M1       110      1.5   1112.5    disk-y
2      M2       130      2.3      1.0    disk-m
4     M21        90      1.9     11.0  disk-bar
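In this particular frame the non-numeric cells of row 3 also hold 0, so the question's own expression works once the mask is computed with axis=1 and placed in the row position of .loc; select_dtypes is not strictly required here. A sketch of that variant follows. There is a behavioural difference: the select_dtypes version ignores text columns, so it would also drop a row whose numbers are all zero but whose serial or notes are not, whereas this version keeps such a row.

```python
import pandas as pd

D = {'serial': ['M0', 'M1', 'M2', 0, 'M21'],
     'f_1_norm': [100, 110, 130, 0, 90],
     'f_1_raw': [1.2, 1.5, 2.3, 0, 1.9],
     'f_2_raw': [17.5, 1112.5, 1, 0, 11],
     'notes': ['disk-x', 'disk-y', 'disk-m', 0, 'disk-bar']}
df = pd.DataFrame(D)

# axis=1 collapses the columns: the result has one boolean per *row*,
# checked across all columns (text columns included)
row_mask = (df != 0.0).any(axis=1)

# Put the mask in the row position of .loc to filter rows, keeping all columns
df1 = df.loc[row_mask]
print(df1)
```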