Home > other >  Transposing column to rows with "case-when" type of logic in pandas
Transposing column to rows with "case-when" type of logic in pandas

Time:10-03

I have this data frame:

data = {'DATE':['2021-01-01','2021-01-02','2021-01-03','2021-01-04','2021-01-05'],
        'BLDG_A ELEC':[1000, 2000, 3000, 4000, 5000],
        'BLDG_A STM':[100, 200, 300, 400, 500],
        'BLDG_A CHW':[10, 20, 30, 40, 50],
        'BLDG_B ELEC':[1100, 2100, 3100, 4100, 5100],
        'BLDG_B STM':[110, 210, 310, 410, 510],
        'BLDG_B CHW':[15, 25, 35, 45, 5]}

I need to transpose it into this format:

data2 = {'DATE':['2021-01-01','2021-01-01','2021-01-02','2021-01-02'],
        'BLDG':['BLDG_A', 'BLDG_B', 'BLDG_A', 'BLDG_B'],
        'ELEC': [1000,1100,2000,2100],
        'STM': [100,110,200,210],
        'CHW': [10,15,20,25]}

I am able to transform it manually by splitting the data frame into several data frames (ELEC, STM, CHW), renaming the values, and then merging them back again. The logic here is to apply a "case-when" condition to rename all the column names into just the building names. Here's my code so far:

import numpy as np

# create a data frame for elec, and transpose it
df_elec = df[['DATE', 'BLDG_A ELEC', 'BLDG_B ELEC']]
df_elec = df_elec.set_index('DATE').stack().reset_index(name='ELEC').rename(columns={'level_1':'BLDG'})

df_elec['BLDG'] = np.where((df_elec['BLDG'] == 'BLDG_A ELEC'),'BLDG_A',df_elec.BLDG)
df_elec['BLDG'] = np.where((df_elec['BLDG'] == 'BLDG_B ELEC'),'BLDG_B',df_elec.BLDG)

# create a data frame for stm, and transpose it
df_stm = df[['DATE', 'BLDG_A STM', 'BLDG_B STM']]
df_stm = df_stm.set_index('DATE').stack().reset_index(name='STM').rename(columns={'level_1':'BLDG'})

df_stm['BLDG'] = np.where((df_stm['BLDG'] == 'BLDG_A STM'),'BLDG_A',df_stm.BLDG)
df_stm['BLDG'] = np.where((df_stm['BLDG'] == 'BLDG_B STM'),'BLDG_B',df_stm.BLDG)

# create a data frame for chw, and transpose it
df_chw = df[['DATE', 'BLDG_A CHW', 'BLDG_B CHW']]
df_chw = df_chw.set_index('DATE').stack().reset_index(name='CHW').rename(columns={'level_1':'BLDG'})

df_chw['BLDG'] = np.where((df_chw['BLDG'] == 'BLDG_A CHW'),'BLDG_A',df_chw.BLDG)
df_chw['BLDG'] = np.where((df_chw['BLDG'] == 'BLDG_B CHW'),'BLDG_B',df_chw.BLDG)

# merge all the data frames
df_merge = pd.merge(df_elec, df_stm, how='left', on=['DATE','BLDG'])
df_merge = pd.merge(df_merge, df_chw, how='left', on=['DATE','BLDG'])
df_merge

Are there other more straightforward approaches to this? Additionally, this code is not scalable once we added more building columns. Thanks in advance, Stack Overflow!

CodePudding user response:

Here is one way to do it


# melt the dataframe keeping date as its own column 'DATE
df2=df.melt(id_vars=['DATE'])

# split the melted column name at space into two columns
df2[['BLDG', 'col']]=df2['variable'].str.split(' ', expand=True)#.drop(columns='variable')
 
#pivot to reformat the result into desire format
df2.pivot(index=['DATE','BLDG'], columns='col', values='value' ).reset_index().rename_axis(columns=None)
         DATE   BLDG    CHW     ELEC    STM
0   2021-01-01  BLDG_A   10     1000    100
1   2021-01-01  BLDG_B   15     1100    110
2   2021-01-02  BLDG_A   20     2000    200
3   2021-01-02  BLDG_B   25     2100    210
4   2021-01-03  BLDG_A   30     3000    300
5   2021-01-03  BLDG_B   35     3100    310
6   2021-01-04  BLDG_A   40     4000    400
7   2021-01-04  BLDG_B   45     4100    410
8   2021-01-05  BLDG_A   50     5000    500
9   2021-01-05  BLDG_B    5     5100    510

CodePudding user response:

one option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import janitor
import pandas as pd

(df
.pivot_longer(
    index = 'DATE', 
    names_to = ('BLDG', '.value'), 
    names_sep = ' ', 
    sort_by_appearance=True)
)
         DATE    BLDG  ELEC  STM  CHW
0  2021-01-01  BLDG_A  1000  100   10
1  2021-01-01  BLDG_B  1100  110   15
2  2021-01-02  BLDG_A  2000  200   20
3  2021-01-02  BLDG_B  2100  210   25
4  2021-01-03  BLDG_A  3000  300   30
5  2021-01-03  BLDG_B  3100  310   35
6  2021-01-04  BLDG_A  4000  400   40
7  2021-01-04  BLDG_B  4100  410   45
8  2021-01-05  BLDG_A  5000  500   50
9  2021-01-05  BLDG_B  5100  510    5

The .value determines which parts of the columns remain as header, while the names_sep determines how the columns are split up. For more complex column separation, you can use a regex with names_pattern.

  • Related