Home > OS >  How to merge dataframes to every row in a second dataframe?
How to merge dataframes to every row in a second dataframe?

Time:04-02

The array of data will come from other source, so in this example I have declared them as array and with minimal data. But the combinations of the entries in lists will be a lot more over million combinations. It's like the length of ar1 * arr2 * arr3 etc.

arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]

df1 = pd.DataFrame(arr1, columns=["col1"])
df2 = pd.DataFrame(arr2, columns=["col2"])
df3 = pd.DataFrame(arr3, columns=["col3"])

Final result of the new DataFrame should be all of posible combinations of the given arrays:

col1 col2 col3
1    10   0.1
1    10   0.2
1    20   0.1
1    20   0.2
2    10   0.1
2    10   0.2
2    20   0.1
2    20   0.2

CodePudding user response:

https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_product.html effectively does what you want. Simple to then change to a dataframe

arr1 = [1, 2]
arr2 = [10, 20]
arr3 = [0.1, 0.2]

pd.DataFrame(index=pd.MultiIndex.from_product([arr1, arr2, arr3], names=["col1","col2","col3"])).reset_index()

CodePudding user response:

functools.reduce and pd.merge

Kind of slow.

import pandas as pd
from functools import reduce

reduce(
    pd.merge,
    [d.assign(dummy=1)
     for d in [df1, df2, df3]
    ]).drop('dummy', axis=1)

   col1  col2  col3
0     1    10   0.1
1     1    10   0.2
2     1    20   0.1
3     1    20   0.2
4     2    10   0.1
5     2    10   0.2
6     2    20   0.1
7     2    20   0.2

itertools.product and pd.DataFrame.itertuples

Definitely faster

import pandas as pd
from itertools import product

def tupify(d): return d.itertuples(index=False, name=None)
def sumtup(t): return sum(t, start=())

pd.DataFrame(
    list(map(sumtup, product(*map(tupify, [df1, df2, df3])))),
    columns = sum(map(list, [df1, df2, df3]), start=[])
)

   col1  col2  col3
0     1    10   0.1
1     1    10   0.2
2     1    20   0.1
3     1    20   0.2
4     2    10   0.1
5     2    10   0.2
6     2    20   0.1
7     2    20   0.2
  • Related