For each set of 5 columns, drop the 3rd, 4th and 5th columns

Time:06-14

I am cleaning a pandas DataFrame imported from a .csv file. It has useful data in the first and second columns, then junk in columns 3-5. This pattern repeats: in every block of five columns, the first two are useful and the third through fifth are junk. I can remove the junk columns with the code below:

df1 = df.drop(columns=df.columns[4::5])
df1 = df1.drop(columns=df1.columns[3::4])
df1 = df1.drop(columns=df1.columns[2::3])

Is there a solution to do this all in one line?
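For reference, the strides in the three-step version shrink from 5 to 4 to 3 because each drop removes one column from every block, so the blocks get narrower. A minimal sketch that checks which columns survive, assuming a hypothetical 10-column frame whose names c0..c9 are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two blocks of five columns, named c0..c9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=[f"c{i}" for i in range(10)])

# Blocks are 5 wide: drop the 5th column of each block (c4, c9).
df1 = df.drop(columns=df.columns[4::5])
# Blocks are now 4 wide: drop the 4th column of each block (c3, c8).
df1 = df1.drop(columns=df1.columns[3::4])
# Blocks are now 3 wide: drop the 3rd column of each block (c2, c7).
df1 = df1.drop(columns=df1.columns[2::3])

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6']
```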

Answer:

I think three lines is fine. The code won't get any clearer or faster from putting it all on one line.

Of course, you can always do:

columns = df.columns  # labels from the original frame, so every stride is 5
df1 = df.drop(columns=columns[4::5]).drop(columns=columns[3::5]).drop(columns=columns[2::5])

which I think also makes it clearer that you intend to drop the fifth, fourth and third columns of every block of five.
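If you really want a single `drop` call, one option is to merge the three label slices first with `Index.append`, which accepts a list of Index objects. A sketch, assuming a hypothetical toy frame with columns c0..c9:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two blocks of five columns, named c0..c9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=[f"c{i}" for i in range(10)])

columns = df.columns
# Labels of the 3rd, 4th and 5th column of every block of five,
# all taken from the original frame, so every stride is 5.
junk = columns[2::5].append([columns[3::5], columns[4::5]])
df1 = df.drop(columns=junk)

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6']
```

Like any label-based `drop`, this removes every column that shares a name with a junk column, so it assumes the column names are unique.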

Answer:

Boolean indexing of the columns with NumPy could be useful:

import numpy as np
# select 1st and 2nd columns of every 5 columns
df1 = df.loc[:, np.isin(np.arange(df.shape[1]) % 5, [0, 1])]
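The same mask can be written as `position % 5 < 2`. A self-contained sketch that applies it with `iloc`, assuming a hypothetical toy frame with columns c0..c9. Because it selects by position rather than by label, it does not depend on the column names being unique:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two blocks of five columns, named c0..c9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=[f"c{i}" for i in range(10)])

# True at positions 0, 1, 5, 6, ...: the first two columns of each block.
keep = np.arange(df.shape[1]) % 5 < 2
df1 = df.iloc[:, keep]

print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6']
```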

Answer:

You can use np.r_ to concatenate the label slices into a single array:

>>> import numpy as np
>>> c = df.columns
>>> df.drop(columns=np.r_[c[2::5], c[3::5], c[4::5]])
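np.r_ also accepts slice notation directly and turns each start:stop:step into an arange, so the junk positions can be built without slicing the Index first. A sketch, assuming a hypothetical toy frame with columns c0..c9:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two blocks of five columns, named c0..c9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=[f"c{i}" for i in range(10)])

n = df.shape[1]
# Each slice becomes an arange and the results are concatenated:
# [2, 7] + [3, 8] + [4, 9], the positions of the junk columns.
junk_pos = np.r_[2:n:5, 3:n:5, 4:n:5]
df1 = df.drop(columns=df.columns[junk_pos])

print(junk_pos.tolist())  # [2, 7, 3, 8, 4, 9]
print(list(df1.columns))  # ['c0', 'c1', 'c5', 'c6']
```

The order of the positions does not matter to `drop`, so there is no need to sort them.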

Answer:

You can do:

df1 = pd.concat([df.iloc[:, ::5], df.iloc[:, 1::5]], axis='columns')

This changes the column order (the first column of every block comes before all of the second columns), but with well-named columns that shouldn't matter.
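If the original order does matter, one variation (not part of the answer itself) is to build the kept positions, sort them, and select them with a single `iloc`. A sketch, assuming a hypothetical toy frame with columns c0..c9:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: two blocks of five columns, named c0..c9.
df = pd.DataFrame(np.arange(20).reshape(2, 10),
                  columns=[f"c{i}" for i in range(10)])

n = df.shape[1]
# The concat approach puts all 1st columns first, then all 2nd columns.
reordered = pd.concat([df.iloc[:, ::5], df.iloc[:, 1::5]], axis='columns')

# Sorting the kept positions restores the original left-to-right order.
keep_pos = np.sort(np.r_[0:n:5, 1:n:5])
df1 = df.iloc[:, keep_pos]

print(list(reordered.columns))  # ['c0', 'c5', 'c1', 'c6']
print(list(df1.columns))        # ['c0', 'c1', 'c5', 'c6']
```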
