Pandas Delete Duplicate Col Names with Specific Col Name-CodePudding

I cannot find the answer searching online. Suppose I have a df1 that looks like this with the following column names and I cannot just use a drop on all duplicate col names as most of the online answers suggest.

index   year   season   1      2      3     year   season   1      2      3
0       1991   winter   7.1    8.3    9.0   1991   spring   0.5    7.2    1.5
1       1992   winter   4.2    5.1    8.2   1991   spring   2.9    6.2    8.1

However, I need the column names to look like this by dropping/deleting subsequent column names of "year" but keeping the duplicate col names of "1,2,3" and 'season' so that the final new df2 looks like this:

index   year   season   1      2      3     season   1      2      3
0       1991   winter   7.1    8.3    9.0   spring   0.5    7.2    1.5
1       1992   winter   4.2    5.1    8.2   spring   2.9    6.2    8.1

Thank you,

CodePudding user response：

In your case do

df.loc[:,(df.groupby(level=0,axis=1).cumcount().eq(0)) | (df.columns!='year')]
Out[188]: 
   index  year  season    1    2    3  season    1    2    3
0      0  1991  winter  7.1  8.3  9.0  spring  0.5  7.2  1.5
1      1  1992  winter  4.2  5.1  8.2  spring  2.9  6.2  8.1

CodePudding user response：

Use columns.get_loc, which for duplicated labels will return a Boolean array, together with np.where to get the array positions of all duplicate instances of 'year' Then we can create a list of the array locations to keep and slice the DataFrame that way.

import numpy as np

to_drop = np.where(df.columns.get_loc('year'))[0][1:]
m = [x for x in np.arange(df.shape[1]) if x not in to_drop]
#[0, 1, 2, 3, 4, 5, 7, 8, 9, 10]

df.iloc[:, m]
   index  year  season    1    2    3  season    1    2    3
0      0  1991  winter  7.1  8.3  9.0  spring  0.5  7.2  1.5
1      1  1992  winter  4.2  5.1  8.2  spring  2.9  6.2  8.1