pandas add index to column name

Time:01-17

I have a data frame with about 100 columns whose names repeat because the data is organized by week. It looks something like this:

hours   hours   clicks  clicks  days    days    minutes  minutes
week 1  week 2  week 1  week 2  week 1  week 2  week 1   week 2
2       2       2       3       6       2       2        3
1       7       6       3       8       2       9        3

I would like the output to look like this:

hours_w1  hours_w2  clicks_w1  clicks_w2  days_w1  days_w2  minutes_w1  minutes_w2
2         2         2          3          6        2        2           3
1         7         6          3          8        2        9           3

I know I could rename the columns by hand, but because I have over 100 columns I'm looking for a more efficient way.

I tried add_suffix, but I only managed to add the same suffix to every column, whereas I need a different suffix for each week.

Any idea how to do this?

Thanks!!

Answer:

Extract the suffixes from the first row, append them to the column names, and finally drop the first row.

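# read_csv renames duplicate headers to 'hours', 'hours.1', ...; strip the '.N' part
df.columns = df.columns.str.split('.').str[0]
# 'week 1' -> '_w1'; using the last word keeps two-digit weeks such as 'week 10' intact
suffixes = '_w' + df.iloc[0].str.split().str[-1]
df.columns += suffixes.to_numpy()
# Drop the row that held the week labels
df = df.iloc[1:]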
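A self-contained sketch of this approach, assuming the data comes from a CSV whose first line holds the repeated metric names and whose second line holds the week labels. The sample text below is a hypothetical transcription of the question's table. It shows why the split('.') step is needed: read_csv de-duplicates repeated headers by appending '.1', '.2', and so on.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question: repeated metric headers,
# then a row of week labels, then the numeric data.
raw = """hours,hours,clicks,clicks,days,days,minutes,minutes
week 1,week 2,week 1,week 2,week 1,week 2,week 1,week 2
2,2,2,3,6,2,2,3
1,7,6,3,8,2,9,3
"""

df = pd.read_csv(io.StringIO(raw))
# Columns are now 'hours', 'hours.1', 'clicks', 'clicks.1', ...; drop the '.N'
df.columns = df.columns.str.split('.').str[0]
# 'week 1' -> '_w1'; the last whitespace-separated token is the week number
suffixes = '_w' + df.iloc[0].str.split().str[-1]
df.columns = df.columns + suffixes.to_numpy()
# Remove the label row; the label row forced string columns, so cast back to int
df = df.iloc[1:].astype(int)
print(df)
```

The astype(int) call is an addition for convenience: because the week labels sit in a data row, every column is read as strings until that row is removed.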

Output:

>>> df
  hours_w1 hours_w2 clicks_w1 clicks_w2 days_w1 days_w2 minutes_w1 minutes_w2
1        2        2         2         3       6       2          2          3
2        1        7         6         3       8       2          9          3
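If the week labels are really a second header line in the source file rather than a data row, a different technique (not the one used in this answer) is to let read_csv build a two-level column header and then flatten it. This is a minimal sketch assuming the same hypothetical CSV layout as the question's table.

```python
import io

import pandas as pd

# Hypothetical sample: line 1 = metric names, line 2 = week labels
raw = """hours,hours,clicks,clicks,days,days,minutes,minutes
week 1,week 2,week 1,week 2,week 1,week 2,week 1,week 2
2,2,2,3,6,2,2,3
1,7,6,3,8,2,9,3
"""

# header=[0, 1] reads both lines as a two-level column MultiIndex
df = pd.read_csv(io.StringIO(raw), header=[0, 1])
# Each label is a tuple such as ('hours', 'week 1'); flatten it to 'hours_w1'
df.columns = [f"{metric}_w{week.split()[-1]}" for metric, week in df.columns]
print(df)
```

Because no row of week labels ends up in the data, the columns are parsed as integers directly and no row has to be dropped afterwards.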

Answer:

First, replace the week labels in the first row with short tags:

df.iloc[0] = df.iloc[0].apply(lambda x: 'w' + x.split()[-1])  # 'week 1' -> 'w1', 'week 12' -> 'w12'

Then join each column name with the tag in the same position:

df.columns = [name + '_' + tag for name, tag in zip(df.columns, df.iloc[0])]

Finally, drop the first row:

df = df.iloc[1:]
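A self-contained sketch of this answer's idea, assuming the frame has literally duplicated column labels (as shown in the question) and the week labels in its first row; the sample frame is a hypothetical transcription of the question's table. As a small variation, it reads the tags from the first row instead of writing them back, and it derives each tag from the week number so it is not limited to two weeks.

```python
import pandas as pd

# Hypothetical frame with duplicated labels and the week labels in row 0
df = pd.DataFrame(
    [
        ['week 1', 'week 2'] * 4,
        [2, 2, 2, 3, 6, 2, 2, 3],
        [1, 7, 6, 3, 8, 2, 9, 3],
    ],
    columns=['hours', 'hours', 'clicks', 'clicks',
             'days', 'days', 'minutes', 'minutes'],
)

# 'week 1' -> 'w1', 'week 12' -> 'w12'
tags = [f"w{label.split()[-1]}" for label in df.iloc[0]]
# Pair each column name with the tag at the same position
df.columns = [f"{name}_{tag}" for name, tag in zip(df.columns, tags)]
# Drop the label row, renumber rows from 0, and restore integer dtype
df = df.iloc[1:].reset_index(drop=True).astype(int)
print(df)
```

Pairing by position with zip is what makes duplicate labels harmless here: no lookup by column name ever happens until the names are already unique.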