I'm doing some calculations for building boxes (yeah, boxes to put stuff in). I take the box dimensions, wall thickness, lid thickness, and other parameters as input and do the math to get my materials breakdown.
I started out with a function, then took it a step further and replaced the function with dataframe math so I could calculate several boxes at once. Then I got more and more ideas, and now here I am, trying to calculate all possible boxes in a certain range of dimensions and material combinations.
The problem is filling my dataframe with all the necessary input values. For this I'm using nested for loops:
for width in range(50, 150, 5):
    for length in range(50, 150, 5):
        for height in range(50, 150, 5):
            (append to dataframe)
As the ranges get bigger, the dataframe gets huge. In the end I spend hours waiting for the for loops to complete and produce my input csv, just to do 30 seconds of df processing and get my results (there are a few more nested for loops than the ones shown).
The question: is a nested for loop the best way to fill data in a case like this, where you have to sweep a full range and generate combinations of several variables? Or is there a more efficient way to fill the dataframe that doesn't take so long?
CodePudding user response:
I think you may be able to get away with doing a cross join on a dataframe with your dimensions.
import pandas as pd

df = pd.DataFrame(list(range(50, 150, 5)), columns=['dim'])
# this creates a dataframe with a single column called dim that holds the range
# (columns must be a list, not a bare string)
df.merge(df, how='cross')
# this gives the Cartesian product - that is, all combinations - of the range in the
# first column and the range in the second column. df has 20 rows and after the merge
# this has 400 rows (how='cross' needs pandas 1.2 or newer)
df.merge(df, how='cross').merge(df, how='cross')
# this takes the Cartesian product and then the Cartesian product of that with df again.
# this results in 8000 rows, which is 20x20x20
Result:
   dim_x  dim_y  dim
0     50     50   50
1     50     50   55
2     50     50   60
3     50     50   65
4     50     50   70
...
In your example, it looks like length, width, and height all use the same range, but if the ranges differ, just make three different starting dataframes and merge those.
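As a rough sketch of that last point, here is the same cross join with three separate starting dataframes. The ranges and column names below are made up for illustration, and `how='cross'` assumes pandas 1.2 or newer.

```python
import pandas as pd

# Hypothetical ranges, one single-column dataframe per dimension.
widths = pd.DataFrame({'width': range(50, 150, 5)})      # 20 values
lengths = pd.DataFrame({'length': range(100, 300, 10)})  # 20 values
heights = pd.DataFrame({'height': range(20, 100, 20)})   # 4 values

# Chain cross joins: every width with every length, then with every height.
boxes = widths.merge(lengths, how='cross').merge(heights, how='cross')

print(boxes.shape)  # 20 * 20 * 4 = 1600 rows, 3 columns
```

Because each starting dataframe has its own column name, the result has no `_x`/`_y` suffixes.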
CodePudding user response:
You can use itertools.product.
import pandas as pd
import numpy as np
import itertools
df = pd.DataFrame(itertools.product(np.arange(50, 150, 5),
                                    np.arange(50, 150, 5),
                                    np.arange(50, 150, 5)),
                  columns=['width', 'length', 'height'])
print(df)
Output:
      width  length  height
0        50      50      50
1        50      50      55
2        50      50      60
3        50      50      65
4        50      50      70
...     ...     ...     ...
7995    145     145     125
7996    145     145     130
7997    145     145     135
7998    145     145     140
7999    145     145     145

[8000 rows x 3 columns]
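Explanation: itertools.product yields every combination of its inputs, with the last input varying fastest, as this smaller example shows: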
>>> list(itertools.product(np.arange(1,3), np.arange(1,3),np.arange(1,3)))
[(1, 1, 1),
(1, 1, 2),
(1, 2, 1),
(1, 2, 2),
(2, 1, 1),
(2, 1, 2),
(2, 2, 1),
(2, 2, 2)]
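The same idea should extend to more than three variables, including non-numeric ones such as materials. Below is a minimal sketch that keeps the inputs in a dict so adding a parameter is one more entry; the `params` name and the `material` values are hypothetical, not from the question.

```python
import itertools
import pandas as pd

# Hypothetical parameter grid; names and values are illustrative only.
params = {
    'width': range(50, 150, 5),
    'length': range(50, 150, 5),
    'height': range(50, 150, 5),
    'material': ['plywood', 'mdf'],
}

# One tuple per combination, with values in the same order as the dict keys.
rows = list(itertools.product(*params.values()))

# Build the dataframe once from all rows instead of appending row by row.
df = pd.DataFrame(rows, columns=list(params))

print(df.shape)  # 20 * 20 * 20 * 2 = 16000 rows, 4 columns
```

Building the dataframe in a single call avoids the repeated appends inside the nested loops, which is likely where most of the waiting time goes.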