The array of data will come from other source, so in this example I have declared them as array and with minimal data. But the combinations of the entries in lists will be a lot more over million combinations. It's like the length of ar1 * arr2 * arr3 etc.
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
df1 = pd.DataFrame(arr1, columns=["col1"])
df2 = pd.DataFrame(arr2, columns=["col2"])
df3 = pd.DataFrame(arr3, columns=["col3"])
Final result of the new DataFrame should be all of posible combinations of the given arrays:
col1 col2 col3
1 10 0.1
1 10 0.2
1 20 0.1
1 20 0.2
2 10 0.1
2 10 0.2
2 20 0.1
2 20 0.2
CodePudding user response:
https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_product.html effectively does what you want. Simple to then change to a dataframe
arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]
pd.DataFrame(index=pd.MultiIndex.from_product([arr1, arr2, arr3], names=["col1","col2","col3"])).reset_index()
CodePudding user response:
functools.reduce
and pd.merge
Kind of slow.
import pandas as pd
from functools import reduce
reduce(
pd.merge,
[d.assign(dummy=1)
for d in [df1, df2, df3]
]).drop('dummy', axis=1)
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2
itertools.product
and pd.DataFrame.itertuples
Definitely faster
import pandas as pd
from itertools import product
def tupify(d): return d.itertuples(index=False, name=None)
def sumtup(t): return sum(t, start=())
pd.DataFrame(
list(map(sumtup, product(*map(tupify, [df1, df2, df3])))),
columns = sum(map(list, [df1, df2, df3]), start=[])
)
col1 col2 col3
0 1 10 0.1
1 1 10 0.2
2 1 20 0.1
3 1 20 0.2
4 2 10 0.1
5 2 10 0.2
6 2 20 0.1
7 2 20 0.2