Home > Back-end >  How to merge list of dataframes all with same index and same column names?
How to merge list of dataframes all with same index and same column names?

Time:07-13

I have a list of dataframe like this of 90 heat devices

data_list = [df0,df2, ... ,df89]

All of these dataframes in data_list have the same features (= column names):

("timestamp", "outside_temperature", "heating_generation", "power_consumption", "hot_water_storage", etc..)

All dataframes have the index : timestamp for the same period of time

I want to create now new dataframes which are having one feature but for all 90 heat devices

e.g for outside_temperature:

  timestamp        device_0, device_2, device_3 ,..., device_89
01.05.2022 00:10      15.03    14.39     15.69   ...   15.30
01.05.2022 00:15      14.94    14.20     15.30   ...   15.29
01.05.2022 00:20      14.94    14.05     15.29   ...   15.20
.
.
.
etc.

and that for all my features.

Any idea whats the best way to do so ? I was thinking about merging but couldn't find good advice, or do it by a for loops.

CodePudding user response:

If I followed your question correctly you could concat the selected columns by first using a comprehension then setting the columns names in two steps. This does assume that the timestamps are the same in all data frames stored in data_list.

For example the following would concatenate all outside_temperature columns, leaving the columns name the same:

feature_df = pd.concat([x['outside_temperture'] for x in data_list], axis=1)

and then you could rename the columns with something like the following:

feature_df.columns = [f'device_{i}' for i in range(len(data_list))]

CodePudding user response:

You can concat your dataframes and then transpose them using pd.pivot function like in this small example

import pandas as pd
df1 = pd.DataFrame({
    'timestamp': [pd.Timestamp(2022, 1,1), pd.Timestamp(2022, 1,2)],
    'value':[1,2],

})
df2 = pd.DataFrame({
    'timestamp': [pd.Timestamp(2022, 1,1), pd.Timestamp(2022, 1,4)],
    'value':[3,4]
})

dfs = [df1, df2]
# add df tag
for (idx, df) in enumerate(dfs):
    df['device'] = f'device_{idx}'

final_df = pd.concat([df1, df2])
pd.pivot(index = 'timestamp' , columns = 'device', values= 'value', data= final_df )
  • Related