Pandas : Select columns based on the columns sum [duplicate]-CodePudding

This question already has an answer here:

I want to select columns where the sum of each column is more than 1

What I have tried:

df_1 = pd.DataFrame((df.sum(axis=0) >=1))
df[df_1[df_1[0]==True].index]

Is there any more efficient method?

CodePudding user response：

Use DataFrame.loc with : for select all rows and columns by mask if performance is important:

df = df.loc[:, df.sum() >=1]
print (df)
   row_one  second  third
0        0       0      0
1        1       0      1
2        1       1      1

Alternative:

df = df.loc[:, df.to_numpy().sum(axis=0) >=1]

CodePudding user response：

Or use with transposing:

>>> df.T[df.sum() >= 1].T
   row_one  second  third
0        0       0      0
1        1       0      1
2        1       1      1
>>>

That filters the transposed dataframe, that behaves regularly.

Or you could get the column names with:

>>> df[df.sum()[df.sum() >= 1].index]
   row_one  second  third
0        0       0      0
1        1       0      1
2        1       1      1
>>>

Or with drop:

>>> df.drop(df.columns[df.sum() < 1], axis=1)
   row_one  second  third
0        0       0      0
1        1       0      1
2        1       1      1
>>>

Page link：https//www.codepudding.com/Mobile/125918.html

Prev:How to use QRC files in QML files?

Next:Inverse Matrix with Variables