Pivoting pandas dataframe without requiring the row index to be unique-CodePudding

I currently have a multi-dimensional index dataframe as follows:

		Placed	Placed	Placed	Attachment	Attachment	Attachment	Limit	Limit	Limit
Treaty Year	Currency	D&O	W&I/TRI	Cyber	D&O	W&I/TRI	Cyber	D&O	W&I/TRI	Cyber
2022	USD	0.25	0.25	0.25	5000000	5000000	5000000	20000000	20000000	20000000
2022	EUR	0.25	0.25	0.25	5000000	5000000	5000000	20000000	20000000	20000000
2022	GBP	0.25	0.25	0.25	5000000	5000000	5000000	10000000	10000000	10000000
2022	CAD	0.25	0.25	0.25	7000000	7000000	7000000	28000000	28000000	28000000
2022	AUD	0.25	0.25	0.25	6000000	6000000	6000000	24000000	24000000	24000000
2022	USD	0.12	0.12	0.12	400000	400000	400000	10000000	10000000	10000000
2022	EUR	0.12	0.12	0.12	400000	400000	400000	10000000	10000000	10000000
2022	GBP	0.12	0.12	0.12	400000	400000	400000	10000000	10000000	10000000
2022	CAD	0.12	0.12	0.12	400000	400000	400000	10000000	10000000	10000000
2022	AUD	0.12	0.12	0.12	400000	400000	400000	10000000	10000000	10000000

And I am looking for the following:

Treaty Year	Currency	Class	Placed	Attachment	Limit
2022	USD	D&O	0.25	5000000	20000000
2022	EUR	D&O	0.25	5000000	20000000
2022	GBP	D&O	0.25	5000000	10000000
2022	CAD	D&O	0.25	7000000	28000000
2022	AUD	D&O	0.25	6000000	24000000
2022	USD	D&O	0.12	400000	10000000
2022	EUR	D&O	0.12	400000	10000000
2022	GBP	D&O	0.12	400000	10000000
2022	CAD	D&O	0.12	400000	10000000
2022	AUD	D&O	0.12	400000	10000000
2022	USD	W&I/TRI	0.12	400000	10000000
2022	EUR	W&I/TRI	0.12	400000	10000000
2022	GBP	W&I/TRI	0.12	400000	10000000
2022	CAD	W&I/TRI	0.12	400000	10000000

I have tried the following code:

from RI import XOL
import pandas as pd
import openpyxl
import xlwings as xw


claims_data_path = "P:/WIP/Net Down/Claim Details Net (97).xlsx"
contract_details_path = "P:/WIP/Net Down/FinLinesContractDetails.xlsx"

# load the Financial Lines claims data as well as RI contract data
claims_data = pd.DataFrame(xw.Book(claims_data_path).sheets["Claim Details Net"].range("A3:BE34964").value)
contract_data = pd.read_excel(contract_details_path, header=[0,1])
# slice the contract data into something useful
t = pd.melt(contract_data, id_vars=[("Contract Index", "Contract Index"),
                                    ("Treaty Year", "Treaty Year"),
                                    ("Currency", "Currency")],
            value_vars=contract_data.columns.tolist())

contract_data_shaped = pd.pivot_table(t,
                                      values="value",
                                      index=[("Contract Index", "Contract Index"), ("Treaty Year", "Treaty Year"),
                                             ("Currency", "Currency"), "variable_1"],
                                      columns="variable_0")\
    .reset_index()
# rename contract data columns
contract_data_shaped.columns = ["Contract Index", "Treaty Year", "Currency", "Class", "Attachment", "Limit", "Placed"]

Which results in the following result:

	Contract Index	Treaty Year	Currency	Class	Attachment	Limit	Placed
282	1	2022	GBP	Cyber	5000000	10000000	0.25
283	1	2022	GBP	D&O	5000000	10000000	0.25
284	1	2022	GBP	W&I/TRI	5000000	10000000	0.25
285	1	2022	USD	Cyber	5000000	20000000	0.25
286	1	2022	USD	D&O	5000000	20000000	0.25
287	1	2022	USD	W&I/TRI	5000000	20000000	0.25
570	2	2022	GBP	Cyber	0	0	0
571	2	2022	GBP	D&O	15000000	10000000	0.5
572	2	2022	GBP	W&I/TRI	15000000	15000000	0.5
573	2	2022	USD	Cyber	0	0	0
574	2	2022	USD	D&O	25000000	15000000	0.5
575	2	2022	USD	W&I/TRI	25000000	25000000	0.5

Note the made up and imported an extra column "Contract Index" [not seen in top most table examples above] to get around the fact that the combo of Treaty Year / Class / Currency is not unique - the amounts in the pivot are. I don't really want to use this completely fabricated column ideally - would somebody please give me a hint as to how I can retain this duplication in the widened dataframe without the need to creating these helper columns please? Is pivot_table the correct thing to use here? Cheers, Alun

CodePudding user response：

If I understand you right you can use simple .stack with level=1:

print(
    df.stack(level=1)
    .reset_index()
    .rename(columns={"level_2": "Class"})
    .sort_values(["Treaty Year", "Class", "Class"])
)

Prints:

    Treaty Year Currency    Class  Attachment     Limit  Placed
0          2022      USD    Cyber     5000000  20000000    0.25
3          2022      EUR    Cyber     5000000  20000000    0.25
6          2022      GBP    Cyber     5000000  10000000    0.25
9          2022      CAD    Cyber     7000000  28000000    0.25
12         2022      AUD    Cyber     6000000  24000000    0.25
15         2022      USD    Cyber      400000  10000000    0.12
18         2022      EUR    Cyber      400000  10000000    0.12
21         2022      GBP    Cyber      400000  10000000    0.12
24         2022      CAD    Cyber      400000  10000000    0.12
27         2022      AUD    Cyber      400000  10000000    0.12
1          2022      USD      D&O     5000000  20000000    0.25
4          2022      EUR      D&O     5000000  20000000    0.25
7          2022      GBP      D&O     5000000  10000000    0.25
10         2022      CAD      D&O     7000000  28000000    0.25
13         2022      AUD      D&O     6000000  24000000    0.25
16         2022      USD      D&O      400000  10000000    0.12
19         2022      EUR      D&O      400000  10000000    0.12
22         2022      GBP      D&O      400000  10000000    0.12
25         2022      CAD      D&O      400000  10000000    0.12
28         2022      AUD      D&O      400000  10000000    0.12
2          2022      USD  W&I/TRI     5000000  20000000    0.25
5          2022      EUR  W&I/TRI     5000000  20000000    0.25
8          2022      GBP  W&I/TRI     5000000  10000000    0.25
11         2022      CAD  W&I/TRI     7000000  28000000    0.25
14         2022      AUD  W&I/TRI     6000000  24000000    0.25
17         2022      USD  W&I/TRI      400000  10000000    0.12
20         2022      EUR  W&I/TRI      400000  10000000    0.12
23         2022      GBP  W&I/TRI      400000  10000000    0.12
26         2022      CAD  W&I/TRI      400000  10000000    0.12
29         2022      AUD  W&I/TRI      400000  10000000    0.12

Dataframe used:

                     Placed               Attachment                       Limit                    
                        D&O W&I/TRI Cyber        D&O  W&I/TRI    Cyber       D&O   W&I/TRI     Cyber
Treaty Year Currency                                                                                
2022        USD        0.25    0.25  0.25    5000000  5000000  5000000  20000000  20000000  20000000
            EUR        0.25    0.25  0.25    5000000  5000000  5000000  20000000  20000000  20000000
            GBP        0.25    0.25  0.25    5000000  5000000  5000000  10000000  10000000  10000000
            CAD        0.25    0.25  0.25    7000000  7000000  7000000  28000000  28000000  28000000
            AUD        0.25    0.25  0.25    6000000  6000000  6000000  24000000  24000000  24000000
            USD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            EUR        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            GBP        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            CAD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            AUD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000