I have a list and I want to form every ordered pair of its elements (each element paired with every element, including itself), then put the pairs into two columns of a pandas DataFrame, like:
import pandas as pd

df = pd.DataFrame(columns=["pair1", "pair2"])
mylist = ["a", "b", "c"]
for i in mylist:
    for j in mylist:
        # append one row at the next integer position
        df.loc[df.shape[0]] = [i, j]
to output:
  pair1 pair2
0     a     a
1     a     b
2     a     c
3     b     a
4     b     b
5     b     c
6     c     a
7     c     b
8     c     c
However, assigning one row at a time like this is slow.
Is there a faster method?
CodePudding user response:
For a pandas solution, you could use pd.MultiIndex.from_product:
df[['pair1','pair2']] = pd.MultiIndex.from_product([mylist]*2).tolist()
or you could also cross-merge (if you have pandas>=1.2.0):
df = pd.merge(pd.Series(mylist, name='pair1'), pd.Series(mylist, name='pair2'), how='cross')
Output:
  pair1 pair2
0     a     a
1     a     b
2     a     c
3     b     a
4     b     b
5     b     c
6     c     a
7     c     b
8     c     c
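If you would rather build the frame in one step instead of assigning into an existing df, a minimal sketch along the same lines is to name the index levels and convert them to columns with to_frame(index=False). The column names are taken from the question; everything else here is just one possible way to write it:

```python
import pandas as pd

mylist = ["a", "b", "c"]

# Build every ordered pair as a two-level MultiIndex, naming the levels
# after the target columns, then turn the levels into ordinary columns.
df = pd.MultiIndex.from_product(
    [mylist, mylist], names=["pair1", "pair2"]
).to_frame(index=False)

print(df)
```

Passing index=False makes the result use a default integer index rather than keeping the MultiIndex as the row index.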
CodePudding user response:
You can use itertools.product() to generate the data ahead of time, rather than repeatedly appending to the end of the dataframe:
import pandas as pd
from itertools import product
mylist = ["a", "b", "c"]
df = pd.DataFrame(product(mylist, repeat=2), columns = ["pair1","pair2"])
print(df)
This outputs:
  pair1 pair2
0     a     a
1     a     b
2     a     c
3     b     a
4     b     b
5     b     c
6     c     a
7     c     b
8     c     c
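If you want to check the difference on your own machine, here is a rough sketch that builds the frame both ways, confirms the values match, and times each with timeit. The helper names build_with_loop and build_with_product are made up for this illustration, and the timings will depend on your pandas version and list size, so treat the numbers as indicative only:

```python
import timeit
from itertools import product

import pandas as pd

mylist = ["a", "b", "c"]
columns = ["pair1", "pair2"]


def build_with_loop(items):
    # Row-by-row approach from the question: every .loc assignment
    # enlarges the DataFrame by one row.
    df = pd.DataFrame(columns=columns)
    for i in items:
        for j in items:
            df.loc[df.shape[0]] = [i, j]
    return df


def build_with_product(items):
    # Generate all pairs first, then construct the DataFrame once.
    return pd.DataFrame(product(items, repeat=2), columns=columns)


loop_df = build_with_loop(mylist)
product_df = build_with_product(mylist)

# Both approaches should contain the same rows in the same order.
same_values = loop_df.values.tolist() == product_df.values.tolist()
print("same values:", same_values)

# Time a few repetitions of each approach (seconds for 20 runs).
print("loop:   ", timeit.timeit(lambda: build_with_loop(mylist), number=20))
print("product:", timeit.timeit(lambda: build_with_product(mylist), number=20))
```

With a longer mylist the gap should become more noticeable, since the loop version performs one enlargement per pair while the product version constructs the frame once.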