I have the following example dataset in wide format without any name for the date and value column
RIC Company 31/12/2021 31/12/2020 31/12/2019
ABC ABC ltd 0.77 0.75 0.66
XYZ XYZ ltd 7846.43 na 6607.29
CDF CDF ltd 191.14 127.74 63.92
I need to convert it to long format as the following
RIC Company Date Value
ABC ABC ltd 31/12/2021 0.77
ABC ABC ltd 31/12/2020 0.75
ABC ABC ltd 31/12/2019 0.66
XYZ XYZ ltd 31/12/2021 7846.43
XYZ XYZ ltd 31/12/2020 na
XYZ XYZ ltd 31/12/2019 6607.29
CDF CDF ltd 31/12/2021 191.14
CDF CDF ltd 31/12/2020 127.74
CDF CDF ltd 31/12/2019 63.92
I can't seem to get around the use of pd.melt /pd.wide_to_long/pyjanitor for the date section. Some sample code that I tried (using pyjanitor) but I get errors. Any help is appreciated. Thanks.
(df
.pivot_longer(
index = ['RIC','Company'],
names_to = ('.value', 'Date'),
sort_by_appearance=True)
)
CodePudding user response:
You can just melt the dataframe providing the indices, and the variable columns along with the names for variable and values columns:
df.melt(['RIC', 'Company'],
['31/12/2021', '31/12/2020', '31/12/2019'],
'Date',
'Value')
OUTPUT:
RIC Company Date Value
0 ABC ABC ltd 31/12/2021 0.77
1 XYZ XYZ ltd 31/12/2021 7846.43
2 CDF CDF ltd 31/12/2021 191.14
3 ABC ABC ltd 31/12/2020 0.75
4 XYZ XYZ ltd 31/12/2020 na
5 CDF CDF ltd 31/12/2020 127.74
6 ABC ABC ltd 31/12/2019 0.66
7 XYZ XYZ ltd 31/12/2019 6607.29
8 CDF CDF ltd 31/12/2019 63.92
If you don't want to manually type all the columns, you can just exclude the columns you don't want:
df.melt(['RIC', 'Company'],
[col for col in df if col not in {'RIC', 'Comapny'}],
'Date',
'Value')