How to reshape a dataframe and extend the number of rows-CodePudding

I'm trying to make this kind of transformations :

I tried so many reshape functions but still not getting the expected output.

Do you have any ideas, please ? If needed, here is an example :

import pandas as pd

df = pd.DataFrame({'A': ['ID_X', 'A01', 'Green', 'A02', 'Cerise', 'ID_Y', 'A01', 'Azure', 'A02', 'Black'],
                   'B': ['ID_X', 'B01', 'Red', 'B02', 'Celeste', 'ID_Y', 'B01', 'Beige', 'B02', 'Rose'],
                   'C': ['ID_X', 'C01', 'Brown', 'C02', 'Orange', 'ID_Y', 'C01', 'Canary', 'C02', 'White'],
                   'TYPE': ['ID', 'POSITION', 'COLOR', 'POSITION', 'COLOR', 'ID', 'POSITION', 'COLOR',
                            'POSITION', 'COLOR']})

CodePudding user response：

# Cumcount to mark your different groups
df['column'] = df[df.TYPE.eq('ID')].groupby('TYPE').cumcount()
df.column = df.column.ffill()

# pivot_table, transpose, explode different levels, and reset the index:
out = (df.pivot_table(index='TYPE', columns='column', aggfunc=list)
         .T
         .explode('ID')
         .explode(['COLOR', 'POSITION'])
         .reset_index(drop=True))
print(out)

Output:

TYPE    COLOR    ID POSITION
0       Green  ID_X      A01
1      Cerise  ID_X      A02
2       Azure  ID_Y      A01
3       Black  ID_Y      A02
4         Red  ID_X      B01
5     Celeste  ID_X      B02
6       Beige  ID_Y      B01
7        Rose  ID_Y      B02
8       Brown  ID_X      C01
9      Orange  ID_X      C02
10     Canary  ID_Y      C01
11      White  ID_Y      C02

CodePudding user response：

Another way is by following the below code:

import pandas as pd

d1 = {'A': ['ID_X', 'A01', 'Green', 'ID_Y', 'A01', 'Yellow'],
 'B': ['ID_X', 'B01', 'Red', 'ID_Y', 'B01', 'Blue'],
 'C': ['ID_X', 'C01', 'Brown', 'ID_Y', 'C01', 'Purple'],
 'TYPE': ['ID', 'POSITION', 'COLOR', 'ID', 'POSITION', 'COLOR']}

df = pd.DataFrame(d1)
df1 = pd.DataFrame(columns=df.TYPE.unique())
for i in range (0, df.shape[0]-1, 3):
    temp = df.iloc[i:i 3,:-1].T
    temp.columns = df1.columns
    df1 = pd.concat([df1,temp])