Pivot values in specific order-CodePudding

I have a dataframe where I would like to pivot my data to fit a specific format, making sure the dates are consecutive.

Data

ID  Q122_c_en   Q122con_s   Q222_c_en   Q222con_s Q322_c_en Q322con_s Q422_c_en Q422con_s
AA  900         89          1000        90        1200      92        1000      90
BB  1000        10          1000        20        1100      25        1300      30

Desired

ID  Date    con_en  con_s   
AA  Q122    900     89  
AA  Q222    1000    90  
AA  Q322    1200    92  
AA  Q422    1000    90  
BB  Q122    1000    10  
BB  Q222    1000    20  
BB  Q322    1100    25  
BB  Q422    1300    30

Doing

df.pivot(index="ID", columns="Date", values=["con_en", "con_s"])

I am using pivot, however, the format does not reflect the desired above format. Any suggestion is appreciated.

CodePudding user response：

You can create a MultiIndex and reshape:

(df
 .set_index('ID')
 .pipe(lambda d: d.set_axis(pd.MultiIndex.from_arrays(zip(*d.columns.str.split(
                 r'(?<=Q\d{3})_?', 1))), axis=1))
 .stack(0)
 .reset_index()
 .rename(columns={'level_1': 'Date'})
)

This should give you the desired order, but in case of issues use: .sort_values(by=['ID', 'Date'])

output:

   ID  Date  c_en  con_s
0  AA  Q122   900     89
1  AA  Q222  1000     90
2  AA  Q322  1200     92
3  AA  Q422  1000     90
4  BB  Q122  1000     10
5  BB  Q222  1000     20
6  BB  Q322  1100     25
7  BB  Q422  1300     30

CodePudding user response：

Here is one way to do that

Melt tthe DF, then split the column name, and finally pivot it

df2=df.melt(id_vars='ID', var_name = 'level1')
df2[['Date', 'level3']] = df2['level1'].str.extract(r'(^.{4})(.*)', expand=True)
df2.pivot(index=['ID', 'Date'], columns=['level3'], values='value').reset_index()

level3  ID  Date    _c_en   con_s
0       AA  Q122    900     89
1       AA  Q222    1000    90
2       AA  Q322    1200    92
3       AA  Q422    1000    90
4       BB  Q122    1000    10
5       BB  Q222    1000    20
6       BB  Q322    1100    25
7       BB  Q422    1300    30

CodePudding user response：

You can start with melt then extract and pivot:

dfm = df.melt(id_vars=['ID'])
df_out = pd.concat([dfm, dfm['variable'].str.extract('^(\w\d\d\d)(.*)')], axis=1)[['ID','value',0,1]]  \
    .pivot(index=['ID',0], columns=[1]).reset_index()
    
df_out.columns = ['ID','Date','con_en','con_s']
print(df_out)

   ID  Date  con_en  con_s
0  AA  Q122     900     89
1  AA  Q222    1000     90
2  AA  Q322    1200     92
3  AA  Q422    1000     90
4  BB  Q122    1000     10
5  BB  Q222    1000     20
6  BB  Q322    1100     25
7  BB  Q422    1300     30

CodePudding user response：

one option, where you can do the transform in one step, is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df.pivot_longer(
    index = 'ID', 
    names_to = ('Date', '.value'), 
    names_pattern = r"(Q\d )_?(. )",
    sort_by_appearance = True)

   ID  Date  c_en  con_s
0  AA  Q122   900     89
1  AA  Q222  1000     90
2  AA  Q322  1200     92
3  AA  Q422  1000     90
4  BB  Q122  1000     10
5  BB  Q222  1000     20
6  BB  Q322  1100     25
7  BB  Q422  1300     30

Any label in the columns associated with .value stays as a header; this is determined by the groups in the regex in names_pattern.