How do I get the cartesian product of a dictionary in a specific order?

I want to create a DF of the possible combinations from a full factorial design of experiments (DOE). I'm using doepy, but it slows down as the number of combinations in my DOE grows. I switched to taking the Cartesian product with product_dict(), which seems to be faster but gives the combinations in a different order.

I need the dataframe of DOE combinations to be in the same order as the one given by doepy. I think that's possible, but I'm unsure how.

My question is: how do I compute the Cartesian product of gas_dict and get the results in the same order as doepy?

import pandas as pd
from doepy import build
from tqdm.contrib.itertools import product

gas_dict = {
    'Velocity (m/s)': [0.00000000E+00, 0.10000000E+00, 0.20000000E+00, 0.30000000E+00,
                       0.40000000E+00, 0.60000000E+00, 0.10000000E+01],

    'Pressure (Pa)': [0.10000000E+06, 0.50000000E+06, 0.10000000E+07, 0.20000000E+07,
                      0.40000000E+07],

    'Temperature': [0.30000000E+03, 0.40000000E+03, 0.50000000E+03, 0.60000000E+03],
    'Equivalence Ratio': [0.10000000E+00, 0.50000000E+00, 0.60000000E+00, 0.70000000E+00,
                          0.80000000E+00, 0.90000000E+00, 0.10000000E+01, 0.11000000E+01,
                          0.12000000E+01, 0.13000000E+01],
}


def product_dict(**kwargs):
    keys = kwargs.keys()
    vals = kwargs.values()
    for instance in product(*vals):
        yield dict(zip(keys, instance))
        
gas = build.full_fact(gas_dict)                # correct order (doepy)
gas_product = list(product_dict(**gas_dict))   # incorrect order
gas_product = pd.DataFrame(gas_product)

gas.equals(gas_product)
compare = gas == gas_product

Answer:

I think this comes down to the order of the arguments to the product function, which determines how the arrays are cycled through when the Cartesian product is built: product advances its last argument fastest, whereas doepy's output appears to advance the first factor fastest. If you reverse the order of the keys in the input dictionary, the outputs become the same. (This is just what I observed for these two implementations, so it would be nice if someone had more detailed insight.)
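To see the two orders side by side, here is a small stdlib-only sketch using a made-up two-factor dictionary (`levels` is hypothetical, not from the question). `itertools.product` behaves like an odometer, with the last factor changing on every step; feeding the factors in reverse and flipping each tuple back makes the first factor change on every step instead, which is what the reversed-dictionary trick below relies on.

```python
from itertools import product

# Hypothetical two-factor design, much smaller than gas_dict
levels = {"A": [1, 2], "B": ["x", "y", "z"]}

# Default order: the LAST factor ("B") changes on every row
last_fastest = list(product(*levels.values()))

# Reversed-input trick: pass the factors in reverse order, then flip
# each tuple so the columns are back in the original (A, B) order.
# Now the FIRST factor ("A") changes on every row.
first_fastest = [row[::-1] for row in product(*reversed(list(levels.values())))]

print(last_fastest)
# [(1, 'x'), (1, 'y'), (1, 'z'), (2, 'x'), (2, 'y'), (2, 'z')]
print(first_fastest)
# [(1, 'x'), (2, 'x'), (1, 'y'), (2, 'y'), (1, 'z'), (2, 'z')]
```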

# reverse the order of the dictionary keys
reversed_cols = list(gas_dict.keys())[::-1]
# build the reversed input dictionary
# (sorted() changes nothing here: every list in gas_dict is already ascending)
gas_dict_rev = {c: sorted(gas_dict[c]) for c in reversed_cols}
# create the Cartesian product with the reversed key order
gas_product_rev = list(product_dict(**gas_dict_rev))
gas_product_rev = pd.DataFrame(gas_product_rev)
# restore the column order of the original dict
gas_product_rev = gas_product_rev[list(gas_dict.keys())]
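As an aside, the reversal can also be folded into the generator itself, so no column reordering is needed afterwards. This is a minimal sketch: `product_dict_first_fastest` is a hypothetical helper (not part of doepy or itertools), and it assumes the "first key varies fastest" order described above is the one doepy produces.

```python
from itertools import product


def product_dict_first_fastest(**kwargs):
    """Yield one dict per combination, with the FIRST key varying fastest.

    Hypothetical helper: it runs itertools.product over the value lists in
    reverse key order, then flips each tuple back so every dict keeps the
    original key order.
    """
    keys = list(kwargs)
    # product() advances its last argument fastest; passing the lists in
    # reverse key order makes the first original key advance fastest
    for combo in product(*(kwargs[k] for k in reversed(keys))):
        # combo follows the reversed key order, so flip it before zipping
        yield dict(zip(keys, combo[::-1]))


# Small made-up example
for row in product_dict_first_fastest(A=[1, 2], B=["x", "y", "z"]):
    print(row)
```

With the question's variables, `pd.DataFrame(list(product_dict_first_fastest(**gas_dict)))` should give the columns in gas_dict order and, going by the comparison in this answer, the same row order as `build.full_fact(gas_dict)`. `tqdm.contrib.itertools.product` can be swapped in for `itertools.product` if you want the progress bar back.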

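gas_product_rev then looks the same as gas visually, but gas.equals(gas_product_rev) still returns False for me. I'm not very familiar with DataFrame.equals; my first guess was float precision, but note that it is strict: it also requires corresponding columns to have the same dtype, so a dtype mismatch between the two frames could explain the False as well. Checking with a NumPy function that compares values within a tolerance, we get the expected result: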

import numpy as np

for column in gas:
    print(f'{column} {np.allclose(gas[column], gas_product[column])}')

# Velocity (m/s) False
# Pressure (Pa) False
# Temperature False
# Equivalence Ratio False

for column in gas:
    print(f'{column} {np.allclose(gas[column], gas_product_rev[column])}')

# Velocity (m/s) True
# Pressure (Pa) True
# Temperature True
# Equivalence Ratio True
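If you want to know why equals() still says False, comparing gas.dtypes with gas_product_rev.dtypes is a good first check. The sketch below uses two made-up frames (not doepy output, whose dtypes I haven't verified) to show that DataFrame.equals returns False for identical numbers stored with different dtypes, while value-based comparisons pass.

```python
import numpy as np
import pandas as pd

# Made-up frames: same numbers, different dtypes (int64 vs float64)
ints = pd.DataFrame({"Temperature": [300, 400, 500]})
floats = pd.DataFrame({"Temperature": [300.0, 400.0, 500.0]})

# equals() requires matching dtypes, so this prints False
print(ints.equals(floats))

# Comparing values only (both cast to float first) prints True
print(np.allclose(ints.to_numpy(dtype=float), floats.to_numpy(dtype=float)))

# assert_frame_equal raises on a mismatch; check_dtype=False ignores dtypes
pd.testing.assert_frame_equal(ints, floats, check_dtype=False)
```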