I have this data frame:
data = {'DATE':['2021-01-01','2021-01-02','2021-01-03','2021-01-04','2021-01-05'],
'BLDG_A ELEC':[1000, 2000, 3000, 4000, 5000],
'BLDG_A STM':[100, 200, 300, 400, 500],
'BLDG_A CHW':[10, 20, 30, 40, 50],
'BLDG_B ELEC':[1100, 2100, 3100, 4100, 5100],
'BLDG_B STM':[110, 210, 310, 410, 510],
'BLDG_B CHW':[15, 25, 35, 45, 5]}
I need to transpose it into this format:
data2 = {'DATE':['2021-01-01','2021-01-01','2021-01-02','2021-01-02'],
'BLDG':['BLDG_A', 'BLDG_B', 'BLDG_A', 'BLDG_B'],
'ELEC': [1000,1100,2000,2100],
'STM': [100,110,200,210],
'CHW': [10,15,20,25]}
I am able to transform it manually by splitting the data frame into several data frames (ELEC, STM, CHW), renaming the values, and then merging them back again. The logic here is to apply a "case-when" condition to rename all the column names into just the building names. Here's my code so far:
import numpy as np
# create a data frame for elec, and transpose it
df_elec = df[['DATE', 'BLDG_A ELEC', 'BLDG_B ELEC']]
df_elec = df_elec.set_index('DATE').stack().reset_index(name='ELEC').rename(columns={'level_1':'BLDG'})
df_elec['BLDG'] = np.where((df_elec['BLDG'] == 'BLDG_A ELEC'),'BLDG_A',df_elec.BLDG)
df_elec['BLDG'] = np.where((df_elec['BLDG'] == 'BLDG_B ELEC'),'BLDG_B',df_elec.BLDG)
# create a data frame for stm, and transpose it
df_stm = df[['DATE', 'BLDG_A STM', 'BLDG_B STM']]
df_stm = df_stm.set_index('DATE').stack().reset_index(name='STM').rename(columns={'level_1':'BLDG'})
df_stm['BLDG'] = np.where((df_stm['BLDG'] == 'BLDG_A STM'),'BLDG_A',df_stm.BLDG)
df_stm['BLDG'] = np.where((df_stm['BLDG'] == 'BLDG_B STM'),'BLDG_B',df_stm.BLDG)
# create a data frame for chw, and transpose it
df_chw = df[['DATE', 'BLDG_A CHW', 'BLDG_B CHW']]
df_chw = df_chw.set_index('DATE').stack().reset_index(name='CHW').rename(columns={'level_1':'BLDG'})
df_chw['BLDG'] = np.where((df_chw['BLDG'] == 'BLDG_A CHW'),'BLDG_A',df_chw.BLDG)
df_chw['BLDG'] = np.where((df_chw['BLDG'] == 'BLDG_B CHW'),'BLDG_B',df_chw.BLDG)
# merge all the data frames
df_merge = pd.merge(df_elec, df_stm, how='left', on=['DATE','BLDG'])
df_merge = pd.merge(df_merge, df_chw, how='left', on=['DATE','BLDG'])
df_merge
Are there other more straightforward approaches to this? Additionally, this code is not scalable once we added more building columns. Thanks in advance, Stack Overflow!
CodePudding user response:
Here is one way to do it
# melt the dataframe keeping date as its own column 'DATE
df2=df.melt(id_vars=['DATE'])
# split the melted column name at space into two columns
df2[['BLDG', 'col']]=df2['variable'].str.split(' ', expand=True)#.drop(columns='variable')
#pivot to reformat the result into desire format
df2.pivot(index=['DATE','BLDG'], columns='col', values='value' ).reset_index().rename_axis(columns=None)
DATE BLDG CHW ELEC STM
0 2021-01-01 BLDG_A 10 1000 100
1 2021-01-01 BLDG_B 15 1100 110
2 2021-01-02 BLDG_A 20 2000 200
3 2021-01-02 BLDG_B 25 2100 210
4 2021-01-03 BLDG_A 30 3000 300
5 2021-01-03 BLDG_B 35 3100 310
6 2021-01-04 BLDG_A 40 4000 400
7 2021-01-04 BLDG_B 45 4100 410
8 2021-01-05 BLDG_A 50 5000 500
9 2021-01-05 BLDG_B 5 5100 510
CodePudding user response:
one option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import janitor
import pandas as pd
(df
.pivot_longer(
index = 'DATE',
names_to = ('BLDG', '.value'),
names_sep = ' ',
sort_by_appearance=True)
)
DATE BLDG ELEC STM CHW
0 2021-01-01 BLDG_A 1000 100 10
1 2021-01-01 BLDG_B 1100 110 15
2 2021-01-02 BLDG_A 2000 200 20
3 2021-01-02 BLDG_B 2100 210 25
4 2021-01-03 BLDG_A 3000 300 30
5 2021-01-03 BLDG_B 3100 310 35
6 2021-01-04 BLDG_A 4000 400 40
7 2021-01-04 BLDG_B 4100 410 45
8 2021-01-05 BLDG_A 5000 500 50
9 2021-01-05 BLDG_B 5100 510 5
The .value
determines which parts of the columns remain as header, while the names_sep
determines how the columns are split up. For more complex column separation, you can use a regex with names_pattern
.