I'm working on a dataframe in pandas and I would like to drop some columns based on the sum of the column values. If the total is 0, then I drop the column, otherwise I keep it. I've been trying to do it but I haven't found the way. Any tips?
CodePudding user response:
Simpliest is filter if sum is not equal 0
, so it means are rmeoved columns with sum=0
:
df.loc[:, df.sum().ne(0)]
If also non numeric columns use DataFrame.select_dtypes
with add non numeric by DataFrame.reindex
:
df = pd.DataFrame(np.random.randint(0,100,size=(3, 4)), columns=list('ABCD'))
df['E'] = 0
df['G'] = 'aa'
df = df.loc[:, df.select_dtypes(np.number).sum().ne(0).reindex(df.columns, fill_value=True)]
print (df)
A B C D G
0 47 42 16 2 aa
1 81 72 19 55 aa
2 28 70 79 29 aa
CodePudding user response:
So what I have done is the following:
1st: sum all the columns with numerical value
df_pivot_2_sum = df_pivot_2.sum(numeric_only = True)
2nd: filter only the zeroes
df_pivot_2_sum = df_pivot_2_sum[df_pivot_2_sum == 0]
3rd: get the list of columns
columns_zero_value = list(df_pivot_2_sum.index.values)
4th: drop the columns from that list
df_pivot_2 = df_pivot_2.drop(columns_zero_value, axis=1)
Thanks all for your support!