Home > front end >  Convert dataframe rows to columns with repeated index values
Convert dataframe rows to columns with repeated index values

Time:11-14

I have a pandas DataFrame in the following long format, where each observation is the expenditure on a given date, for each port, for each transport type.

import pandas as pd
import io  

df_s = """freight_type; port; 2020-01-01; 2020-01-02; 2020-01-03; 2020-01-04; 2020-01-05
rail; qd; 51204; 1024151; 100510; 715011; 1004151
vessel; qd; 81294; 1024151; 100510; 715011; 1004151
truck; qd; 59434; 756151; 100510; 715011; 1004151 
rail; ts; 199952; 521151; 100510; 715011; 1004151
vessel; ts; 515254; 224151; 100510; 715011; 1004151
truck; ts; 512552; 104151; 100510; 715011; 1004151
"""

df = pd.read_csv(io.StringIO(df_s), sep="; ")
df.head(6)
>>> 
freight_type port 2020-01-01 2020-01-02 2020-01-03 2020-01-04 2020-01-05
0   rail     qd   51204      1024151    100510     715011     1004151
1   vessel   qd   81294      1024151    100510     715011     1004151
2   truck    qd   59434      756151     100510     715011     1004151
3   rail     ts   199952     521151     100510     715011     1004151
4   vessel   ts   515254     224151     100510     715011     1004151
5   truck    ts   512552     104151     100510     715011     1004151

I would like to filter by port and then transpose the dataset in a way that I can set the date as an index, and create an individual column for each transport_type column.

Desired output:

port_qd_df:

date         rail_exp   vessel_exp  truck_exp
'2020-01-01'   51204    81294       59434       
'2020-01-01'   1024151  1024151     756151  
'2020-01-01'   100510   100510      100510

I have tried using pd.wide_to_long, however, I am struggling with its implementation/documentation as to how to get it to reach my solution.

Any help very greatly appreciated.

CodePudding user response:

You can create a transposed pivot table:

df_pivot = pd.pivot_table(df, index=['port','freight_type']).T

Result:

port               qd                         ts                  
freight_type     rail    truck   vessel     rail    truck   vessel
2020-01-01      51204    59434    81294   199952   512552   515254
2020-01-02    1024151   756151  1024151   521151   104151   224151
2020-01-03     100510   100510   100510   100510   100510   100510
2020-01-04     715011   715011   715011   715011   715011   715011
2020-01-05    1004151  1004151  1004151  1004151  1004151  1004151

You can then extract separate dataframes by taking a cross-section:

port_qd_df = df_pivot.xs('qd', axis=1)
  • Related