For each set of 5 columns, drop the 3rd, 4th and 5th columns

I am cleaning a pandas DataFrame imported from a .csv. It has useful data in the first and second columns, then junk in columns 3-5. This pattern repeats: every 5th column starting from the first or second column is useful, and every 5th column starting from the third, fourth or fifth column is junk. I can remove the junk columns using the code below:

df1 = df.drop(columns=df.columns[4::5])
df1 = df1.drop(columns=df1.columns[3::4])
df1 = df1.drop(columns=df1.columns[2::3])
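The step sizes shrink from 5 to 4 to 3 because each drop removes one column per group, so the remaining junk columns shift closer together. A minimal sketch on a hypothetical 12-column toy frame (columns `c0`..`c11`, two full groups plus a partial one), assuming nothing beyond pandas itself, traces this:

```python
import pandas as pd

# Hypothetical toy frame: 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

# Step 1: drop the 5th column of each group (positions 4, 9); groups now have 4 columns.
df1 = df.drop(columns=df.columns[4::5])
# Step 2: the original 4th columns now sit every 4 positions, starting at position 3.
df1 = df1.drop(columns=df1.columns[3::4])
# Step 3: the original 3rd columns now sit every 3 positions, starting at position 2.
df1 = df1.drop(columns=df1.columns[2::3])

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6', 'c10', 'c11']
```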

Is there a way to do this all in one line?

Answer:

I think three lines is fine. The code won't get any clearer or faster from putting it all on one line.

Of course, you can always do:

columns = df.columns  # labels from the original frame, so every slice uses a step of 5
df1 = df.drop(columns=columns[4::5]).drop(columns=columns[3::5]).drop(columns=columns[2::5])

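which I think also makes it clearer that you intend to drop the fifth, fourth and third column of every five columns. Because all three slices come from the original labels, none of them has to account for columns already dropped.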
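If a single `drop` call is preferred, the three label slices can be concatenated first with `Index.append`, which accepts a list of Index objects. This is a sketch on a hypothetical toy frame; like any label-based drop, it assumes the column names are unique (a duplicated name would drop every column carrying it):

```python
import pandas as pd

# Hypothetical toy frame: 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

columns = df.columns
# Concatenate the labels of the 3rd, 4th and 5th column of every group of 5.
junk = columns[2::5].append([columns[3::5], columns[4::5]])
df1 = df.drop(columns=junk)

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6', 'c10', 'c11']
```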

Answer:

Boolean indexing of the columns with NumPy could be useful:

import numpy as np
# select the 1st and 2nd columns of every 5 columns
df1 = df.loc[:, np.isin(np.arange(df.shape[1]) % 5, [0, 1])]
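Since the wanted positions within each group are exactly 0 and 1, the same mask can also be written as a comparison instead of `np.isin`. A self-contained sketch on a hypothetical 12-column toy frame:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

# Position of each column within its group of 5 (values 0..4).
pos_in_group = np.arange(df.shape[1]) % 5
# Keep positions 0 and 1, i.e. the 1st and 2nd column of every group.
df1 = df.loc[:, pos_in_group < 2]

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6', 'c10', 'c11']
```

Because the mask is positional, it keeps the original column order and does not depend on the column names being unique.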

Answer:

You can use np.r_ to concatenate the column-label slices in one expression:

>>> import numpy as np
>>> c = df.columns
>>> df.drop(columns=np.r_[c[2::5], c[3::5], c[4::5]])
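`np.r_` also understands slice notation directly, so it can build the integer positions to keep rather than the labels to drop. This variant is a sketch on a hypothetical toy frame; it selects with `iloc`, so it works even with duplicate column names, and `np.sort` restores the original left-to-right order:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])
n = df.shape[1]

# np.r_ expands each slice into integer positions: [0, 5, 10, 1, 6, 11].
keep = np.sort(np.r_[0:n:5, 1:n:5])
df1 = df.iloc[:, keep]

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6', 'c10', 'c11']
```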

Answer:

You can do:

df1 = pd.concat([df.iloc[:, ::5], df.iloc[:, 1::5]], axis='columns')

That will change the column order (all the first columns of each group come first, followed by all the second columns), but with well-named columns that shouldn't matter.
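If the original order does matter, one option is to reorder the concatenated result by the source frame's column order. A minimal sketch on a hypothetical toy frame, assuming the column names are unique:

```python
import pandas as pd

# Hypothetical toy frame: 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

df1 = pd.concat([df.iloc[:, ::5], df.iloc[:, 1::5]], axis="columns")
print(list(df1.columns))  # ['c0', 'c5', 'c10', 'c1', 'c6', 'c11']

# Restore the original left-to-right order (assumes unique column names).
df1 = df1[[c for c in df.columns if c in df1.columns]]
print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6', 'c10', 'c11']
```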