pandas add index to column name-CodePudding

I have data frame with about 100 columns that repeat itself because the data is organized by weeks, it looks something like that:

hours	hours	clicks	clicks	days	days	minutes	minutes
week 1	week 2	week 1	week 2	week 1	week 2	week 1	week 2
2	2	2	3	6	2	2	3
1	7	6	3	8	2	9	3

I would like the output to look like this:

hours_w1	hours_w2	clicks_w1	clicks_w2	days_w1	days_w2	minutes_w1	minutes_w2
2	2	2	3	6	2	2	3
1	7	6	3	8	2	9	3

I know I can just rename the columns but because I have over 100 columns I'm looking for a more efficient way.

I tried to use add_suffix but had only managed to add the same suffix to all columns, when what I need is a different index for each week.

any idea how to do this?

Thanks!!

CodePudding user response：

Extract the suffixes from the first row then add them to the column names and finally remove the first row.

# To fix mangle_dup_cols
df.columns = df.columns.str.split('.').str[0]
suffixes = '_'   df.iloc[0].str[0]   df.iloc[0].str[-1]
df.columns  = suffixes
df = df.iloc[1:]

Output:

>>> df
  hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1        2        2         2         3       6       2          2          3
2        1        7         6         3       8       2          9          3

CodePudding user response：

first you should change the first row:

df.iloc[0] = df.iloc[0].apply(lambda x:'w1' if x == 'week 1' else 'w2')

Then you can merge it with the column name like this:

df.columns = [i   '_'   j for i, j in zip(df.columns, df.iloc[0])]

And then you can remove the first row:

df = df.iloc[1:]