Home > Net >  Reverse row and column transformation within Python
Reverse row and column transformation within Python

Time:12-07

I have a dataframe where I would like to maintain all columns in my original dataset and create a new pivoted column based on existing dataset.

Data

 stat1 stat2    id  q122con q122av  q122con q122av  q222con q222av  q222con q222av
 50    1000     aa  40      10      900     100     50      0       1000    0   
 100   2000     bb  50      50      1500    500     75      25      1900    100 
                

                        
                                        

Desired

stat1   stat2   id  date    con         av          con         av      
50      1000    aa  q122    40          10          900         100             
50      1000    aa  q222    50          0           1000        0               
100     2000    bb  q122    50          50          1500        500             
100     2000    bb  q222    75          25          1900        100     

    

Doing

df.pivot(index="id", columns="date", values=["con", "av"])

However, I am not obtaining the full columns within my dataset. Any suggestion is appreciated.

CodePudding user response:

Some long one line by wide_to_long

pd.wide_to_long(df.set_index(['stat1','stat2','id']).stack().groupby(level=[0,1,2,3]).agg(list).apply(pd.Series).unstack().stack(level=0).reset_index(),
                stubnames = ['q122','q222'], i = ['stat1','stat2','id','level_3'],j = 'date',suffix='\\w ').stack().unstack(level=[-3,-2])
Out[140]: 
level_3               0        1      
date                 av con   av   con
stat1 stat2 id                        
50    1000  aa q122  10  40  100   900
               q222   0  50    0  1000
100   2000  bb q122  50  50  500  1500
               q222  25  75  100  1900

CodePudding user response:

Lot of your issues here is dealing with duplicate column names:

import pandas as pd

# Duplicating input dataframe with clipboard and remove dot numbers assign for duplicate column headers
df = pd.read_clipboard()
df.columns = df.columns.str.split('.').str[0]


# Set index to move first three columns into index
df = df.set_index(['stat1','stat2','id'])

# Use groupby and cumcount to get order of duplicate column headers
cols = df.groupby(df.columns, axis=1).cumcount().rename('No').reset_index()

# Use str.extract to split "dates" from av and con with regex
cols = cols['index'].str.extract('(q\d{3})(.*)').join(cols).drop('index', axis=1)

# Create a new multiIndex column header
df.columns = pd.MultiIndex.from_frame(cols, names=['date','av','con'])

# Reshape dataframe by stacking the outer most column header to the dataframe index
# And moved those columns from the index back into the dataframe with reset_index
df_out =  df.stack(0).reset_index()

# Flatten headers back to one level
df_out.columns = [f'{i}_{j}' if j else f'{i}' for i, j in df_out.columns]

# And print
print(df_out)

Output:

   stat1  stat2  id  date  av  av_1  con  con_1
0     50   1000  aa  q122  10   100   40    900
1     50   1000  aa  q222   0     0   50   1000
2    100   2000  bb  q122  50   500   50   1500
3    100   2000  bb  q222  25   100   75   1900
  • Related