Home > Net >  Conditional wide to long format in Pandas
Conditional wide to long format in Pandas

Time:09-07

I have the following example dataset in wide format without any name for the date and value column

RIC Company 31/12/2021  31/12/2020  31/12/2019
ABC ABC ltd     0.77      0.75       0.66
XYZ XYZ ltd 7846.43       na         6607.29
CDF CDF ltd 191.14        127.74     63.92

I need to convert it to long format as the following

RIC Company     Date    Value
ABC ABC ltd 31/12/2021  0.77
ABC ABC ltd 31/12/2020  0.75
ABC ABC ltd 31/12/2019  0.66
XYZ XYZ ltd 31/12/2021  7846.43
XYZ XYZ ltd 31/12/2020  na
XYZ XYZ ltd 31/12/2019  6607.29
CDF CDF ltd 31/12/2021  191.14
CDF CDF ltd 31/12/2020  127.74
CDF CDF ltd 31/12/2019  63.92

I can't seem to get around the use of pd.melt /pd.wide_to_long/pyjanitor for the date section. Some sample code that I tried (using pyjanitor) but I get errors. Any help is appreciated. Thanks.

(df
.pivot_longer(
    index = ['RIC','Company'], 
    names_to = ('.value', 'Date'), 
    sort_by_appearance=True)
)

CodePudding user response:

You can just melt the dataframe providing the indices, and the variable columns along with the names for variable and values columns:

df.melt(['RIC', 'Company'],
        ['31/12/2021', '31/12/2020', '31/12/2019'],
        'Date',
        'Value')

OUTPUT:


   RIC  Company        Date    Value
0  ABC  ABC ltd  31/12/2021     0.77
1  XYZ  XYZ ltd  31/12/2021  7846.43
2  CDF  CDF ltd  31/12/2021   191.14
3  ABC  ABC ltd  31/12/2020     0.75
4  XYZ  XYZ ltd  31/12/2020       na
5  CDF  CDF ltd  31/12/2020   127.74
6  ABC  ABC ltd  31/12/2019     0.66
7  XYZ  XYZ ltd  31/12/2019  6607.29
8  CDF  CDF ltd  31/12/2019    63.92

If you don't want to manually type all the columns, you can just exclude the columns you don't want:

df.melt(['RIC', 'Company'],
        [col for col in df if col not in {'RIC', 'Comapny'}],
        'Date',
        'Value')
  • Related