Switch columns and row on a dataframe-CodePudding

I have a pandas dataframe, and I need to "switch" row and columns. (transpose ? pivot ? melt ? no idea how is it called) :

Let's say I have 2 references, each reference has 4 steps, each step has 3 variables

df = pd.DataFrame({'ref': ['ref1', 'ref1', 'ref1', 'ref1', 'ref2', 'ref2', 'ref2', 'ref2'],
                   'step': [1, 2, 3, 4, 1, 2, 3, 4],
                   'var_1': [5, 7, 7, 9, 12, 9, 87, 90],
                   'var_2': [11, 8, 10, 6, 6, 9, 12, 9],
                   "var_3": [11, 8, 10, 6, 6, 9, 12, 9]})

enter image description here

I want to transform this df to this form :

df = pd.DataFrame({'ref': ['ref1', 'ref2'],
>                    'step 1 - var_1': [5, 12],
>                    'step 1 - var_2': [11, 6],
>                    'step 1 - var_3': [11, 6],
>                    'step 2 - var_1': [7, 12],
>                    'step 2 - var_2': [8, 9],
>                    'step 2 - var_3': [8, 9]})

enter image description here

Thank you in advance, Best

edit : format and images

CodePudding user response：

Try this using the new "walrus" operator in a one-liner:

(df_new := df.set_index(['ref', 'step']).unstack().sort_index(level=1, axis=1))\
    .set_axis([f'step {j} - {i}' for i, j in df_new.columns], axis=1)

Output:

      step 1 - var_1  step 1 - var_2  step 1 - var_3  step 2 - var_1  step 2 - var_2  step 2 - var_3  step 3 - var_1  step 3 - var_2  step 3 - var_3  step 4 - var_1  step 4 - var_2  step 4 - var_3
ref                                                                                                                                                                                                 
ref1               5              11              11               7               8               8               7              10              10               9               6               6
ref2              12               6               6               9               9               9              87              12              12              90               9               9

Details:

set_index with columns ref and step
unstack step to move into columns
Order you columns uinsg sort_index
flatten multiindex dataframe column headers using set_axis and list comprehension with f-string formatting

Alternative way with same out put with order as above,

df_out = df.pivot(index='ref', columns='step').sort_index(level=1, axis=1)
df_out.columns = [f'step {j} - {i}' for i, j in df_out.columns]

CodePudding user response：

One option is with pivot_wider from pyjnanitor, to abstract the column renaming via the names_glue parameter:

# pip insall pyjanitor
import pandas as pd
import janitor
(df
.pivot_wider(
    index='ref', 
    names_from='step',  
    values_from = ['var_1', 'var_2', 'var_3'],
    names_glue = "step {step} - {_value}")
.sort_index(axis=1)
) 
    ref  step 1 - var_1  step 1 - var_2  step 1 - var_3  ...  step 3 - var_3  step 4 - var_1  step 4 - var_2  step 4 - var_3
0  ref1               5              11              11  ...              10               9               6               6
1  ref2              12               6               6  ...              12              90               9               9

[2 rows x 13 columns]

names_glue allows a combination of the values_from and names_from parameters - in the code above {step} is the names_from argument, while {_value} is a placeholder for values_from

If we stick strictly to your output, then a filter should be executed on the step column for only values less than 3:

(df
.loc[df.step < 3]
.pivot_wider(
    index='ref', 
    names_from='step', 
    names_glue = "step {step} - {_value}")
.sort_index(axis=1)
) 
    ref  step 1 - var_1  step 1 - var_2  step 1 - var_3  step 2 - var_1  step 2 - var_2  step 2 - var_3
0  ref1               5              11              11               7               8               8
1  ref2              12               6               6               9               9               9

CodePudding user response：

Use:

df[df.step<3].set_index(['ref','step']).unstack()

output:

     var_1    var_2    var_3   
step     1  2     1  2     1  2
ref                            
ref1     5  7    11  8    11  8
ref2    12  9     6  9     6  9

CodePudding user response：

The trick is to use df.pivot. You can choose the index='ref' and columns='step':

new = df.pivot(index='ref', columns='step')

The resulting table has a multi-index for ref and a multi-column. The multi-column has the step and var information, so we can use that to make the new column headers. Finally, we can reset the index:

new.columns = [f'step {x[1]} - {x[0]}' for x in new.columns]
new = new.reset_index()