Create a master data set comprised of multiple data frames

Time:08-16

I have been stuck on this problem for a while now! Below is a very simplified version of my program, along with some context. Essentially, what I want is one large dataframe that contains every desired permutation of my input variables. This is for scenario analysis: precomputing all scenarios lets me avoid on-demand calculations in my BI tool when the user changes variables to visualise the output.

I have tried:

  1. Wrapping my code in a function and trying to apply it for each step of my input variables (no idea what I am doing there).
  2. Manually changing the input variables myself (as a noob I realise this is not the way to go, but I first had to check that my code for appending the dfs was working).

Essentially what I want to achieve is as follows:

  1. Take the variables "date_offset" and "cost" and vary each of them over a defined number of steps with a defined step size.

  2. As an example, if there are two values for date_offset (step size 1) and two values for cost (step size 1), there are 4 possible combinations, so the data set will be 4 times the size of the df in my code below.

  3. Once I have all of the permutations of the input variables, and the corresponding data frame for each permutation, I would like to append all of those data frames together.

  4. I should be left with one data frame covering all of the possible scenarios, which I can then visualise with a BI tool.

I hope you guys can help :)

Here is my code:

import pandas as pd
import numpy as np

#want to iterate through starting at a date_offset of 0 with a total of 5 steps and a step size of 1
date_offset = 0
steps_1 = 5
stepsize_1 = 1

#want to iterate through starting at a cost of 5 with a total of 4 steps and a step size of 1
cost = 5
steps_2 = 4
stepsize_2 = 1

df = {'id':['1a', '2a', '3a', '4a'],'run_life':[10,20,30,40]}
df = pd.DataFrame(df)

df['date_offset'] = date_offset
df['cost'] = cost
df['calc_col1'] = df['run_life']*cost 

CodePudding user response:

Are you trying to do something like this:

from itertools import product

import pandas as pd

data = {'id': ['1a', '2a', '3a', '4a'], 'run_life': [10, 20, 30, 40]}
df = pd.DataFrame(data)

date_offset = 0
steps_1 = 5
stepsize_1 = 1

cost = 5
steps_2 = 4
stepsize_2 = 1

# Every (offset, cost) pair; the "+ 1" makes the last step inclusive
df2 = pd.DataFrame(
    product(
        range(date_offset, date_offset + steps_1 * stepsize_1 + 1, stepsize_1),
        range(cost, cost + steps_2 * stepsize_2 + 1, stepsize_2)
    ),
    columns=['offset', 'cost']
)
result = df.merge(df2, how='cross')
result['calc_col1'] = result['run_life'] * result['cost']

Output:

     id  run_life  offset  cost  calc_col1
0    1a        10       0     5         50
1    1a        10       0     6         60
2    1a        10       0     7         70
3    1a        10       0     8         80
4    1a        10       0     9         90
..   ..       ...     ...   ...        ...
115  4a        40       5     5        200
116  4a        40       5     6        240
117  4a        40       5     7        280
118  4a        40       5     8        320
119  4a        40       5     9        360

[120 rows x 5 columns]
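
If you would rather keep the "one function per scenario, then append" structure described in the question, or if `how='cross'` is not available (it was added in pandas 1.2), something along these lines might also work. This is only a sketch: `build_scenario`, `base` and `master` are names I made up for illustration, and the value ranges are hard-coded to match the example above.

```python
from itertools import product

import pandas as pd

base = pd.DataFrame({'id': ['1a', '2a', '3a', '4a'],
                     'run_life': [10, 20, 30, 40]})


def build_scenario(df, date_offset, cost):
    """Return a copy of df with the scenario inputs and derived column added.

    Hypothetical helper, not part of pandas.
    """
    out = df.copy()
    out['date_offset'] = date_offset
    out['cost'] = cost
    out['calc_col1'] = out['run_life'] * cost
    return out


# Same inclusive ranges as above: start, start + step, ..., start + steps * step
offsets = range(0, 0 + 5 * 1 + 1, 1)  # 0, 1, ..., 5
costs = range(5, 5 + 4 * 1 + 1, 1)    # 5, 6, ..., 9

# One small frame per (offset, cost) pair, stacked into a single master frame
master = pd.concat(
    [build_scenario(base, o, c) for o, c in product(offsets, costs)],
    ignore_index=True,
)
```

This should give the same 120 rows as the cross merge, only ordered by scenario rather than by id. The function version is slower for large inputs, but it is easier to extend when the per-scenario calculation is more involved than a single multiplication.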