Splitting a DataFrame into multiple DataFrames

I have a dataframe (around 50k rows and 150 columns) with energy and weather variables data from different cities.

I would like to split the dataframe into 5 dataframes (a dataframe for each city).

The whole dataframe is basically structured like this

import pandas as pd

df = pd.DataFrame({'Weather':[4,5,4,5,5,4],
                   'Energy':[7,8,9,4,2,3],
                   'Weather_city1':[1,3,5,7,1,0],
                   'Energy_city1':[7,4,7,2,1,0],
                   'Weather_city2':[1,0,6,2,6,9],
                   'Energy_city2':[6,1,5,3,2,7]})

print(df)
   Weather  Energy  Weather_city1  Energy_city1 ...
0        4       7              1             7
1        5       8              3             4
2        4       9              5             7
3        5       4              7             2
4        5       2              1             1
5        4       3              0             0

How do I split it into more dataframes (one for each city with values just for city1, one for city2 and so on)?

CodePudding user response：

IIUC, you could use:

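# columns shared by all cities (no city suffix)
cols = ['Weather', 'Energy']

# city label of each remaining column: the text after the underscore
groups = df.drop(columns=cols).columns.str.extract('(?<=_)(.*)$', expand=False)

# one DataFrame per city, each keeping the shared columns
# (note: groupby(..., axis=1) is deprecated since pandas 2.1)
[g.reset_index() for _, g in df.set_index(cols).groupby(groups, axis=1)]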

output:

[   Weather  Energy  Weather_city1  Energy_city1
 0        4       7              1             7
 1        5       8              3             4
 2        4       9              5             7
 3        5       4              7             2
 4        5       2              1             1
 5        4       3              0             0,
    Weather  Energy  Weather_city2  Energy_city2
 0        4       7              1             6
 1        5       8              0             1
 2        4       9              6             5
 3        5       4              2             3
 4        5       2              6             2
 5        4       3              9             7]

As a dictionary keyed by city:

{name: g.reset_index()
 for name, g in df.set_index(['Weather', 'Energy']).groupby(groups, axis=1)}

output:

{'city1':    Weather  Energy  Weather_city1  Energy_city1
 0        4       7              1             7
 1        5       8              3             4
 2        4       9              5             7
 3        5       4              7             2
 4        5       2              1             1
 5        4       3              0             0,
 'city2':    Weather  Energy  Weather_city2  Energy_city2
 0        4       7              1             6
 1        5       8              0             1
 2        4       9              6             5
 3        5       4              2             3
 4        5       2              6             2
 5        4       3              9             7}
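
On pandas versions where `groupby(..., axis=1)` warns or is unavailable, the same split can be sketched with plain column selection. This is only an illustration, not the answerer's method: `split_by_city` is a hypothetical helper name, and it assumes every non-shared column contains an underscore and that the text after the first underscore is the city label.

```python
import pandas as pd

df = pd.DataFrame({'Weather': [4, 5, 4, 5, 5, 4],
                   'Energy': [7, 8, 9, 4, 2, 3],
                   'Weather_city1': [1, 3, 5, 7, 1, 0],
                   'Energy_city1': [7, 4, 7, 2, 1, 0],
                   'Weather_city2': [1, 0, 6, 2, 6, 9],
                   'Energy_city2': [6, 1, 5, 3, 2, 7]})


def split_by_city(frame, shared_cols):
    """Return {city: DataFrame} holding the shared columns plus that city's columns."""
    by_city = {}
    for col in frame.columns:
        if col in shared_cols:
            continue
        # assumption: the text after the first underscore is the city label
        city = col.split('_', 1)[1]
        by_city.setdefault(city, []).append(col)
    # plain column selection, so no groupby along the column axis is needed
    return {city: frame[shared_cols + city_cols]
            for city, city_cols in by_city.items()}


city_frames = split_by_city(df, ['Weather', 'Energy'])
print(city_frames['city1'])
```

Because it only selects columns, this avoids transposing the frame, which may matter with 50k rows and mixed dtypes.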

CodePudding user response：

I would first transform the raw data:

import pandas as pd

data = {'Weather_city1':[1,3,5,7,1,0],
        'Energy_city1':[7,4,7,2,1,0],
        'Weather_city2':[1,0,6,2,6,9],
        'Energy_city2':[6,1,5,3,2,7]}

# collect the unique city names (a set, so their order is not guaranteed)
cities = {elem.split("_")[1] for elem in data}

# for each city, gather its columns under generic names
city_data = {}
for city in cities:
    city_data[city] = {"Weather": data[f"Weather_{city}"], "Energy": data[f"Energy_{city}"]}

city_data

{'city1': {'Weather': [1, 3, 5, 7, 1, 0], 'Energy': [7, 4, 7, 2, 1, 0]},
 'city2': {'Weather': [1, 0, 6, 2, 6, 9], 'Energy': [6, 1, 5, 3, 2, 7]}}

Then you can build one DataFrame per city with pandas:

cities_dataframes = {city: pd.DataFrame(city_data[city]) for city in cities}

cities_dataframes['city1']

#    Weather  Energy
# 0        1       7
# 1        3       4
# 2        5       7
# 3        7       2
# 4        1       1
# 5        0       0
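
If the data is already in a DataFrame rather than a raw dict, the same result can probably be reached without rebuilding the dict. A minimal sketch, assuming each city column is named `<variable>_<city>` with a single underscore; it reuses the name `cities_dataframes` from above, and everything else is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'Weather_city1': [1, 3, 5, 7, 1, 0],
                   'Energy_city1': [7, 4, 7, 2, 1, 0],
                   'Weather_city2': [1, 0, 6, 2, 6, 9],
                   'Energy_city2': [6, 1, 5, 3, 2, 7]})

# city labels in order of first appearance (dict.fromkeys keeps the order, a set would not)
cities = list(dict.fromkeys(c.split('_', 1)[1] for c in df.columns if '_' in c))

cities_dataframes = {}
for city in cities:
    suffix = f'_{city}'
    city_cols = [c for c in df.columns if c.endswith(suffix)]
    # drop the suffix so every city frame ends up with the same column names
    cities_dataframes[city] = df[city_cols].rename(columns=lambda c: c[:-len(suffix)])

print(cities_dataframes['city1'])
```

Giving every frame the same generic column names makes it easier to run the same analysis code on each city.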