How to create a dictionary to hold dataframes in python with loops

Time:10-23

I am saving a large amount of data from some Monte Carlo simulations. I simulate 20 things over a period of 10 time steps using a varying number of random draws. So, for a given number of random draws, I have a folder with 10 .csv files (one for each time step), each of which has 20 columns of data and n rows per column, where n is the number of random draws in that simulation. Currently my basic code for loading data in looks something like this:

import pandas as pd
import numpy as np

load_path = r'...\path\to\data'
numScenarios = [100, 500, 1000, 2500, 5000, 10000, 20000]
yearsSimulated = np.arange(1, 11)
for n in numScenarios:
    # Raw strings keep the backslashes literal
    folder_path = load_path + r'\draws = ' + str(n)
    for year in yearsSimulated:
        filename = r'\year ' + str(year) + '.csv'
        path = folder_path + filename
        df = pd.read_csv(path)
        # save df.describe() somewhere

I want to efficiently save df.describe() somehow so that I can compare how the number of random draws is affecting results for the 20 things for a given time step. That is, I would ultimately like some object that I can access easily that will store all the df.describe() outputs for each individual time step. I'm not sure of a nice way to do this though. Some previous questions seem to suggest that dictionaries may be the way to go here but I've not been able to get them going.
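
For reference, a minimal sketch of the plain-dictionary idea, keyed by (year, number of draws). The fake_load helper and the thing_k column names are hypothetical stand-ins for pd.read_csv(path) and the real columns, and the lists are shortened so the example runs without the data files:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_csv(path): n rows of random draws, 20 columns.
def fake_load(n, year, n_things=20):
    rng = np.random.default_rng(seed=1000 * year + n)
    cols = [f"thing_{k}" for k in range(1, n_things + 1)]
    return pd.DataFrame(rng.normal(size=(n, n_things)), columns=cols)

numScenarios = [100, 500]          # shortened for the sketch
yearsSimulated = np.arange(1, 4)   # shortened for the sketch

# One flat dictionary keyed by (year, number of draws).
summaries = {}
for n in numScenarios:
    for year in yearsSimulated:
        df = fake_load(n, int(year))
        summaries[(int(year), n)] = df.describe()

# Look up the summary for year 2 with 500 draws.
stats = summaries[(2, 500)]
```

Tuple keys avoid nesting altogether, at the cost of having to filter the keys when only one year is wanted.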

CodePudding user response:

Edit:

My final approach is to use an answer to a question here with a bunch of loops. So now I have:

class ngram(dict): 
    """Based on perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return super(ngram, self).__getitem__(item)
        except KeyError:
            value = self[item] = type(self)()
            return value

results = ngram()
for year in yearsSimulated:
    year_str = str(year)
    for n in numScenarios:
        n_str = str(n)
        folder_path = load_path + r'\draws = ' + n_str
        filename = r'\scenarios ' + year_str + '.csv'
        path = folder_path + filename
        df = pd.read_csv(path)
        # Fresh summary frame per file, so stored results never share state
        ann_stats = pd.DataFrame()
        ann_stats['mean'] = df.mean()
        ann_stats['std. dev'] = df.std()
        ann_stats['1%'] = df.quantile(0.01)
        ann_stats['25%'] = df.quantile(0.25)
        ann_stats['50%'] = df.quantile(0.5)
        ann_stats['75%'] = df.quantile(0.75)
        ann_stats['99%'] = df.quantile(0.99)
        results[year_str][n_str] = ann_stats.T

And so now the summary data for each time step and number of draws is accessed as a dataframe with

test = results[year_str][n_str]

where the columns of test hold results for each of my 20 things.
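
To compare draw counts for a single time step, the per-draw frames for that year can be stacked with pd.concat. The following is only a sketch: it builds a small nested dictionary from random data in place of the real CSV files, the thing_k column names are made up, and only three statistics are kept for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the nested results[year_str][n_str] structure,
# filled with random data instead of the real CSV files.
rng = np.random.default_rng(0)
cols = [f"thing_{k}" for k in range(1, 21)]
results = {}
for year_str in ["1", "2"]:
    results[year_str] = {}
    for n in [100, 500, 1000]:
        df = pd.DataFrame(rng.normal(size=(n, 20)), columns=cols)
        ann_stats = pd.DataFrame({
            "mean": df.mean(),
            "std. dev": df.std(),
            "50%": df.quantile(0.5),
        })
        results[year_str][str(n)] = ann_stats.T

# Stack every draw count for year 1; the dict keys become the outer index level.
year_1 = pd.concat(results["1"], names=["draws"])

# Pull one statistic across all draw counts:
# rows are draw counts, columns are the 20 things.
means_by_draws = year_1.xs("mean", level=1)
```

Here means_by_draws has one row per number of draws, so the effect of the draw count on each thing can be read down a column.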
