I have a pandas DataFrame in the following long format, where each observation is the expenditure on a given date, for each port, for each transport type.
import pandas as pd
import io
df_s = """freight_type; port; 2020-01-01; 2020-01-02; 2020-01-03; 2020-01-04; 2020-01-05
rail; qd; 51204; 1024151; 100510; 715011; 1004151
vessel; qd; 81294; 1024151; 100510; 715011; 1004151
truck; qd; 59434; 756151; 100510; 715011; 1004151
rail; ts; 199952; 521151; 100510; 715011; 1004151
vessel; ts; 515254; 224151; 100510; 715011; 1004151
truck; ts; 512552; 104151; 100510; 715011; 1004151
"""
df = pd.read_csv(io.StringIO(df_s), sep="; ")
df.head(6)
>>>
freight_type port 2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05
0 rail qd 51204 1024151 100510 715011 1004151
1 vessel qd 81294 1024151 100510 715011 1004151
2 truck qd 59434 756151 100510 715011 1004151
3 rail ts 199952 521151 100510 715011 1004151
4 vessel ts 515254 224151 100510 715011 1004151
5 truck ts 512552 104151 100510 715011 1004151
I would like to filter by port and then transpose the dataset in a way that I can set the date as an index, and create an individual column for each transport_type
column.
Desired output:
port_qd_df:
date rail_exp vessel_exp truck_exp
'2020-01-01' 51204 81294 59434
'2020-01-01' 1024151 1024151 756151
'2020-01-01' 100510 100510 100510
I have tried using pd.wide_to_long
, however, I am struggling with its implementation/documentation as to how to get it to reach my solution.
Any help very greatly appreciated.
CodePudding user response:
You can create a transposed pivot table:
df_pivot = pd.pivot_table(df, index=['port','freight_type']).T
Result:
port qd ts
freight_type rail truck vessel rail truck vessel
2020-01-01 51204 59434 81294 199952 512552 515254
2020-01-02 1024151 756151 1024151 521151 104151 224151
2020-01-03 100510 100510 100510 100510 100510 100510
2020-01-04 715011 715011 715011 715011 715011 715011
2020-01-05 1004151 1004151 1004151 1004151 1004151 1004151
You can then extract separate dataframes by taking a cross-section:
port_qd_df = df_pivot.xs('qd', axis=1)