Home > Enterprise >  Pivoting pandas dataframe without requiring the row index to be unique
Pivoting pandas dataframe without requiring the row index to be unique

Time:08-06

I currently have a multi-dimensional index dataframe as follows:

Placed Placed Placed Attachment Attachment Attachment Limit Limit Limit
Treaty Year Currency D&O W&I/TRI Cyber D&O W&I/TRI Cyber D&O W&I/TRI Cyber
2022 USD 0.25 0.25 0.25 5000000 5000000 5000000 20000000 20000000 20000000
2022 EUR 0.25 0.25 0.25 5000000 5000000 5000000 20000000 20000000 20000000
2022 GBP 0.25 0.25 0.25 5000000 5000000 5000000 10000000 10000000 10000000
2022 CAD 0.25 0.25 0.25 7000000 7000000 7000000 28000000 28000000 28000000
2022 AUD 0.25 0.25 0.25 6000000 6000000 6000000 24000000 24000000 24000000
2022 USD 0.12 0.12 0.12 400000 400000 400000 10000000 10000000 10000000
2022 EUR 0.12 0.12 0.12 400000 400000 400000 10000000 10000000 10000000
2022 GBP 0.12 0.12 0.12 400000 400000 400000 10000000 10000000 10000000
2022 CAD 0.12 0.12 0.12 400000 400000 400000 10000000 10000000 10000000
2022 AUD 0.12 0.12 0.12 400000 400000 400000 10000000 10000000 10000000

And I am looking for the following:

Treaty Year Currency Class Placed Attachment Limit
2022 USD D&O 0.25 5000000 20000000
2022 EUR D&O 0.25 5000000 20000000
2022 GBP D&O 0.25 5000000 10000000
2022 CAD D&O 0.25 7000000 28000000
2022 AUD D&O 0.25 6000000 24000000
2022 USD D&O 0.12 400000 10000000
2022 EUR D&O 0.12 400000 10000000
2022 GBP D&O 0.12 400000 10000000
2022 CAD D&O 0.12 400000 10000000
2022 AUD D&O 0.12 400000 10000000
2022 USD W&I/TRI 0.12 400000 10000000
2022 EUR W&I/TRI 0.12 400000 10000000
2022 GBP W&I/TRI 0.12 400000 10000000
2022 CAD W&I/TRI 0.12 400000 10000000

I have tried the following code:

from RI import XOL
import pandas as pd
import openpyxl
import xlwings as xw


claims_data_path = "P:/WIP/Net Down/Claim Details Net (97).xlsx"
contract_details_path = "P:/WIP/Net Down/FinLinesContractDetails.xlsx"

# load the Financial Lines claims data as well as RI contract data
claims_data = pd.DataFrame(xw.Book(claims_data_path).sheets["Claim Details Net"].range("A3:BE34964").value)
contract_data = pd.read_excel(contract_details_path, header=[0,1])
# slice the contract data into something useful
t = pd.melt(contract_data, id_vars=[("Contract Index", "Contract Index"),
                                    ("Treaty Year", "Treaty Year"),
                                    ("Currency", "Currency")],
            value_vars=contract_data.columns.tolist())

contract_data_shaped = pd.pivot_table(t,
                                      values="value",
                                      index=[("Contract Index", "Contract Index"), ("Treaty Year", "Treaty Year"),
                                             ("Currency", "Currency"), "variable_1"],
                                      columns="variable_0")\
    .reset_index()
# rename contract data columns
contract_data_shaped.columns = ["Contract Index", "Treaty Year", "Currency", "Class", "Attachment", "Limit", "Placed"]

Which results in the following result:

Contract Index Treaty Year Currency Class Attachment Limit Placed
282 1 2022 GBP Cyber 5000000 10000000 0.25
283 1 2022 GBP D&O 5000000 10000000 0.25
284 1 2022 GBP W&I/TRI 5000000 10000000 0.25
285 1 2022 USD Cyber 5000000 20000000 0.25
286 1 2022 USD D&O 5000000 20000000 0.25
287 1 2022 USD W&I/TRI 5000000 20000000 0.25
570 2 2022 GBP Cyber 0 0 0
571 2 2022 GBP D&O 15000000 10000000 0.5
572 2 2022 GBP W&I/TRI 15000000 15000000 0.5
573 2 2022 USD Cyber 0 0 0
574 2 2022 USD D&O 25000000 15000000 0.5
575 2 2022 USD W&I/TRI 25000000 25000000 0.5

Note the made up and imported an extra column "Contract Index" [not seen in top most table examples above] to get around the fact that the combo of Treaty Year / Class / Currency is not unique - the amounts in the pivot are. I don't really want to use this completely fabricated column ideally - would somebody please give me a hint as to how I can retain this duplication in the widened dataframe without the need to creating these helper columns please? Is pivot_table the correct thing to use here? Cheers, Alun

CodePudding user response:

If I understand you right you can use simple .stack with level=1:

print(
    df.stack(level=1)
    .reset_index()
    .rename(columns={"level_2": "Class"})
    .sort_values(["Treaty Year", "Class", "Class"])
)

Prints:

    Treaty Year Currency    Class  Attachment     Limit  Placed
0          2022      USD    Cyber     5000000  20000000    0.25
3          2022      EUR    Cyber     5000000  20000000    0.25
6          2022      GBP    Cyber     5000000  10000000    0.25
9          2022      CAD    Cyber     7000000  28000000    0.25
12         2022      AUD    Cyber     6000000  24000000    0.25
15         2022      USD    Cyber      400000  10000000    0.12
18         2022      EUR    Cyber      400000  10000000    0.12
21         2022      GBP    Cyber      400000  10000000    0.12
24         2022      CAD    Cyber      400000  10000000    0.12
27         2022      AUD    Cyber      400000  10000000    0.12
1          2022      USD      D&O     5000000  20000000    0.25
4          2022      EUR      D&O     5000000  20000000    0.25
7          2022      GBP      D&O     5000000  10000000    0.25
10         2022      CAD      D&O     7000000  28000000    0.25
13         2022      AUD      D&O     6000000  24000000    0.25
16         2022      USD      D&O      400000  10000000    0.12
19         2022      EUR      D&O      400000  10000000    0.12
22         2022      GBP      D&O      400000  10000000    0.12
25         2022      CAD      D&O      400000  10000000    0.12
28         2022      AUD      D&O      400000  10000000    0.12
2          2022      USD  W&I/TRI     5000000  20000000    0.25
5          2022      EUR  W&I/TRI     5000000  20000000    0.25
8          2022      GBP  W&I/TRI     5000000  10000000    0.25
11         2022      CAD  W&I/TRI     7000000  28000000    0.25
14         2022      AUD  W&I/TRI     6000000  24000000    0.25
17         2022      USD  W&I/TRI      400000  10000000    0.12
20         2022      EUR  W&I/TRI      400000  10000000    0.12
23         2022      GBP  W&I/TRI      400000  10000000    0.12
26         2022      CAD  W&I/TRI      400000  10000000    0.12
29         2022      AUD  W&I/TRI      400000  10000000    0.12

Dataframe used:

                     Placed               Attachment                       Limit                    
                        D&O W&I/TRI Cyber        D&O  W&I/TRI    Cyber       D&O   W&I/TRI     Cyber
Treaty Year Currency                                                                                
2022        USD        0.25    0.25  0.25    5000000  5000000  5000000  20000000  20000000  20000000
            EUR        0.25    0.25  0.25    5000000  5000000  5000000  20000000  20000000  20000000
            GBP        0.25    0.25  0.25    5000000  5000000  5000000  10000000  10000000  10000000
            CAD        0.25    0.25  0.25    7000000  7000000  7000000  28000000  28000000  28000000
            AUD        0.25    0.25  0.25    6000000  6000000  6000000  24000000  24000000  24000000
            USD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            EUR        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            GBP        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            CAD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
            AUD        0.12    0.12  0.12     400000   400000   400000  10000000  10000000  10000000
  • Related