I have a list of 5 elements, which could grow to 50,000. I want to sum every pair of elements from the list (each element with every element, including itself) and build a DataFrame from the results, so I wrote the following code:
```python
import pandas as pd

x = list(range(1, 6))  # 5 elements: 1..5
t = []
for i in x:
    for j in x:
        t.append((i, j, i + j))  # (first, second, sum)
df = pd.DataFrame(t)
```
The code above produces the correct result, but it becomes very slow as the list grows. What is the fastest way to do the same thing?
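Whatever method is used, the result has one row per ordered pair, so it grows quadratically with the length of the list. A quick back-of-the-envelope check, assuming three 8-byte integer columns and ignoring the index, shows the size of the table for 50,000 elements:

```python
# Rough size of the all-pairs result, assuming each of the
# 3 columns (i, j, i + j) is stored as an 8-byte int64.
n = 50_000
rows = n * n                 # every ordered pair (i, j)
bytes_needed = rows * 3 * 8  # 3 int64 columns, index ignored
print(f"{rows:,} rows, ~{bytes_needed / 1e9:.0f} GB")
# prints "2,500,000,000 rows, ~60 GB"
```

At that scale, memory is likely to be a constraint in its own right, independent of how fast the pairs are generated.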
Answer:
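All pairs (the Cartesian product, which is what the nested loops produce) can be obtained without explicit Python loops by using `pandas.DataFrame.merge()` with `how='cross'` (available since pandas 1.2):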
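```python
import numpy as np
import pandas as pd

x = np.arange(1, 5 + 1)  # array([1, 2, 3, 4, 5])
# how='cross' pairs every row of the left frame with every row of the right
df = pd.DataFrame(x, columns=['x']).merge(pd.Series(x, name='y'), how='cross')
df['sum'] = df.x.add(df.y)  # element-wise x + y
print(df)
```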
```
    x  y  sum
0   1  1    2
1   1  2    3
2   1  3    4
3   1  4    5
4   1  5    6
5   2  1    3
6   2  2    4
...
```
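For reference, the cross join above is equivalent to repeating each value of `x` `len(x)` times and tiling the whole array `len(x)` times. The sketch below does that directly with NumPy's `np.repeat` and `np.tile`; this is a swapped-in technique, not part of the original answer, and while skipping the merge machinery may be faster, that has not been measured here.

```python
import numpy as np
import pandas as pd

x = np.arange(1, 5 + 1)  # 1..5, as in the answer above
n = len(x)

# Cross join by hand: repeat each value n times (1,1,...,2,2,...)
# and tile the whole array n times (1,2,...,5,1,2,...), which gives
# the same row order as merge(how='cross').
left = np.repeat(x, n)
right = np.tile(x, n)

df = pd.DataFrame({'x': left, 'y': right, 'sum': left + right})
print(df.head(7))
```

Everything here is vectorized, so no Python-level loop runs per pair; the output size is still `len(x) ** 2` rows.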
Answer:
A list comprehension can make this somewhat faster: `t = [(i, j, i + j) for i in x for j in x]` replaces the nested `for` loops. A comprehension avoids looking up and calling `t.append` on every iteration, so it usually beats the equivalent explicit loop, although the amount of work is still quadratic in the length of the list. Here is the updated code replacing the nested loops:
```python
import pandas as pd

x = list(range(1, 6))  # 5 elements: 1..5
t = [(i, j, i + j) for i in x for j in x]  # same pairs as the nested loops
df = pd.DataFrame(t)
```
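The nested comprehension walks the Cartesian product of `x` with itself. The standard library's `itertools.product` expresses the same iteration in a single call; a minimal sketch follows (the column names `i`, `j`, `sum` are illustrative, and any speedup over the plain comprehension is likely small because the work is still quadratic):

```python
from itertools import product

import pandas as pd

x = list(range(1, 6))  # 5 elements: 1..5

# product(x, repeat=2) yields the same ordered pairs, in the same order,
# as the nested comprehension `for i in x for j in x`.
t = [(i, j, i + j) for i, j in product(x, repeat=2)]
df = pd.DataFrame(t, columns=['i', 'j', 'sum'])  # illustrative column names
print(df.head())
```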