I have a bunch of DataFrames that all follow this pattern:
col1 col2 col3
1 2 3
1 2 3
1 2 3
col1 col2 col3
1 2 3
1 2 3
1 2 3
How do I merge them into a single DataFrame like this:
col1 col2 col3
[1,1] [2,2] [3,3]
[1,1] [2,2] [3,3]
[1,1] [2,2] [3,3]
I have no idea how to do this, but it feels like there should be an easy way.
CodePudding user response:
If your DataFrames are well aligned (same shape, with rows and columns in the same order), you can use numpy.dstack:
import numpy as np
import pandas as pd
out = pd.DataFrame(np.dstack([df1, df2]).tolist(),
                   index=df1.index, columns=df1.columns)
print(out)
# Output
col1 col2 col3
0 [1, 1] [2, 2] [3, 3]
1 [1, 1] [2, 2] [3, 3]
2 [1, 1] [2, 2] [3, 3]
Update: using only pandas:
out = pd.concat([df1, df2]).stack().groupby(level=[0, 1]) \
.apply(list).unstack(level=1)
print(out)
# Output
col1 col2 col3
0 [1, 1] [2, 2] [3, 3]
1 [1, 1] [2, 2] [3, 3]
2 [1, 1] [2, 2] [3, 3]
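Since the question mentions "a bunch" of DataFrames, here is a minimal sketch of how the dstack approach might extend to any number of frames. The helper name stack_cells and the label check are my own additions for illustration, not part of the answer above; the check is there because np.dstack combines values by position and ignores labels.

```python
import numpy as np
import pandas as pd

def stack_cells(frames):
    """Combine identically labeled DataFrames into one frame whose cells are lists."""
    first = frames[0]
    # np.dstack works purely by position, so refuse frames whose labels differ.
    for df in frames[1:]:
        if not (df.index.equals(first.index) and df.columns.equals(first.columns)):
            raise ValueError("all frames must share the same index and columns")
    # Resulting array has shape (n_rows, n_cols, n_frames).
    stacked = np.dstack([df.to_numpy() for df in frames])
    return pd.DataFrame(stacked.tolist(), index=first.index, columns=first.columns)

df1 = pd.DataFrame({"col1": [1, 1, 1], "col2": [2, 2, 2], "col3": [3, 3, 3]})
df2 = df1.copy()
df3 = df1 * 10  # made-up third frame, just to show more than two inputs

out = stack_cells([df1, df2, df3])
print(out)
```

Each cell of out holds one value per input frame, in the order the frames were passed.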
CodePudding user response:
Try this
import pandas as pd
df1 = pd.DataFrame([[10, 20, 30], [10, 20, 30], [10, 20, 30]])
df2 = pd.DataFrame([[11, 12, 13], [11, 12, 13], [11, 12, 13]])
df1.applymap(lambda x: [x]) + df2.applymap(lambda x: [x])
→
0 1 2
0 [10, 11] [20, 12] [30, 13]
1 [10, 11] [20, 12] [30, 13]
2 [10, 11] [20, 12] [30, 13]
Explanation:
lambda x: [x] is a function that converts its argument x into a list of length 1 containing exactly that argument.
.applymap applies this function to every cell in the data frame.
+ (the sum operator) is "overloaded" for pandas data frames. In particular, the sum f1 + f2 of two frames (of equal shape) is defined as a new frame containing in each cell the sum of the corresponding cells of the operands (f1 and f2).
This is trivial if the cells contain numbers. But it also works for other data types: in Python, lists can be concatenated via the sum operator: [1, 2] + [50, 60] → [1, 2, 50, 60].
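The same idea can be sketched for more than two frames by folding the + operator over all of them. The helper wrap_cells and the third frame are made up for illustration. The sketch wraps cells with Series.map on each column instead of applymap, on the assumption that this avoids depending on applymap, which newer pandas versions deprecate in favor of DataFrame.map.

```python
from functools import reduce
import operator

import pandas as pd

def wrap_cells(df):
    # Turn every cell x into the one-element list [x], one column at a time.
    return df.apply(lambda col: col.map(lambda x: [x]))

frames = [
    pd.DataFrame([[10, 20, 30], [10, 20, 30]]),
    pd.DataFrame([[11, 12, 13], [11, 12, 13]]),
    pd.DataFrame([[7, 8, 9], [7, 8, 9]]),  # hypothetical third frame
]

# frame + frame concatenates the lists cell by cell; reduce chains that over all frames.
combined = reduce(operator.add, (wrap_cells(f) for f in frames))
print(combined)
```

Because + on data frames aligns on row and column labels, this variant needs all frames to carry the same labels, otherwise the unmatched cells come out as NaN.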