I cannot find the answer searching online. Suppose I have a df1 that looks like this with the following column names and I cannot just use a drop on all duplicate col names as most of the online answers suggest.
index year season 1 2 3 year season 1 2 3
0 1991 winter 7.1 8.3 9.0 1991 spring 0.5 7.2 1.5
1 1992 winter 4.2 5.1 8.2 1991 spring 2.9 6.2 8.1
However, I need the column names to look like this by dropping/deleting subsequent column names of "year" but keeping the duplicate col names of "1,2,3" and 'season' so that the final new df2 looks like this:
index year season 1 2 3 season 1 2 3
0 1991 winter 7.1 8.3 9.0 spring 0.5 7.2 1.5
1 1992 winter 4.2 5.1 8.2 spring 2.9 6.2 8.1
Thank you,
CodePudding user response:
In your case do
df.loc[:,(df.groupby(level=0,axis=1).cumcount().eq(0)) | (df.columns!='year')]
Out[188]:
index year season 1 2 3 season 1 2 3
0 0 1991 winter 7.1 8.3 9.0 spring 0.5 7.2 1.5
1 1 1992 winter 4.2 5.1 8.2 spring 2.9 6.2 8.1
CodePudding user response:
Use columns.get_loc
, which for duplicated labels will return a Boolean array, together with np.where
to get the array positions of all duplicate instances of 'year'
Then we can create a list of the array locations to keep and slice the DataFrame that way.
import numpy as np
to_drop = np.where(df.columns.get_loc('year'))[0][1:]
m = [x for x in np.arange(df.shape[1]) if x not in to_drop]
#[0, 1, 2, 3, 4, 5, 7, 8, 9, 10]
df.iloc[:, m]
index year season 1 2 3 season 1 2 3
0 0 1991 winter 7.1 8.3 9.0 spring 0.5 7.2 1.5
1 1 1992 winter 4.2 5.1 8.2 spring 2.9 6.2 8.1