How can I convert two dataframes, e.g.:
import pandas as pd
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
df2 = pd.DataFrame({'A': [11, 22, 33], 'B': [110, 220, 330]})
into
         A          B
0  (1, 11)  (10, 110)
1  (2, 22)  (20, 220)
2  (3, 33)  (30, 330)
I'm trying to find a pandas function instead of using a loop. This is just a dummy example; the original dataframes have many columns.
CodePudding user response:
You can use DataFrame.join, which aligns the two frames on their index:
df1.join(df2, lsuffix='1', rsuffix='2').apply(tuple, axis=1).to_frame('A')
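To make the intermediate steps of the one-liner visible, here is a minimal sketch that splits it up. The variable names joined and result are only for illustration, and the single-column frames are an assumption taken from the benchmark setup in this thread.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [11, 22, 33]})

# join aligns on the index; the suffixes disambiguate the clashing column name
joined = df1.join(df2, lsuffix='1', rsuffix='2')  # columns become A1 and A2

# apply(tuple, axis=1) turns every row into one tuple, which yields a Series;
# to_frame('A') wraps that Series in a one-column dataframe
result = joined.apply(tuple, axis=1).to_frame('A')
print(result)
```

Note that with several shared columns this approach would put all values of a row into a single tuple (A1, B1, A2, B2) rather than building one pair per column.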
CodePudding user response:
The fastest way is probably:
df = pd.DataFrame({"A": zip(df1.A, df2.A)})
This is much faster and simpler than the other solutions. Benchmark setup:
def repeat_df(df, n):
    # Stack n copies of df vertically and renumber the index
    return pd.concat([df] * n, ignore_index=True)
n = 1000
df1 = pd.DataFrame({'A': [1, 2, 3]})
df2 = pd.DataFrame({'A': [11, 22, 32]})
df1 = repeat_df(df1, n)
df2 = repeat_df(df2, n)
>>> %timeit df1.join(df2, lsuffix='1', rsuffix='2').apply(tuple, axis=1).to_frame('A')
36.6 ms ± 1.43 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pd.concat([df1, df2]).groupby(level=0).agg(tuple)
39.8 ms ± 3.05 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pd.DataFrame({"A": zip(df1.A, df2.A)})
1.95 ms ± 135 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
EDIT
The OP updated the example to work with multiple columns. The above solution generalizes easily:
df = pd.DataFrame({col: zip(df1[col], df2[col]) for col in df1.columns})
This is still much faster than the other solution, assuming the same settings:
>>> %timeit pd.concat([df1, df2]).groupby(level=0).agg(tuple)
70.3 ms ± 1.66 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit pd.DataFrame({col: zip(df1[col], df2[col]) for col in df1.columns})
3.41 ms ± 389 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
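As a sanity check, here is a small sketch of the generalized version applied to the two-column frames from the question. It assumes both frames have the same columns and the same row order, since zip pairs values by position and ignores index labels. Wrapping zip in list() and passing index=df1.index are my additions to keep the result explicit; the name paired is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
df2 = pd.DataFrame({'A': [11, 22, 33], 'B': [110, 220, 330]})

# Build one column of (df1 value, df2 value) pairs per shared column name
paired = pd.DataFrame(
    {col: list(zip(df1[col], df2[col])) for col in df1.columns},
    index=df1.index,  # keep df1's row labels instead of a fresh RangeIndex
)
print(paired)
```

If the two frames could differ in row order, an index-aligned approach would be the safer choice.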
CodePudding user response:
You can concat both frames, then use groupby.agg on the index. This method aligns the columns and groups the rows that share the same index label.
print(pd.concat([df1, df2]).groupby(level=0).agg(tuple))
         A
0  (1, 11)
1  (2, 22)
2  (3, 32)
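To illustrate the index alignment, here is a sketch with a hypothetical variation of the question's data in which df2 stores its rows in a different order. The shuffled index and the names stacked and aligned are assumptions for the example, not part of the original question.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
# Hypothetical variation: same index labels as df1, but rows in a different order
df2 = pd.DataFrame({'A': [33, 11, 22], 'B': [330, 110, 220]}, index=[2, 0, 1])

# Stack the frames vertically, so every index label now appears twice
stacked = pd.concat([df1, df2])

# Group rows by index label and collect each column's values into a tuple
aligned = stacked.groupby(level=0).agg(tuple)
print(aligned)
```

Because the grouping uses labels rather than positions, the values are paired by index label even though the row order differs.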
That said, in this specific case, a list comprehension may be faster:
pd.DataFrame({'A':[(a1, a2) for a1, a2 in zip(df1['A'], df2['A'])]})
CodePudding user response:
You can use DataFrame.itertuples to iterate over the rows as namedtuples after concatenating the two dataframes. Note that this yields one tuple per row, not the per-column pairs shown in the question.
df = pd.concat([df1, df2])
tuples = df.itertuples()
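For reference, here is a short sketch of what itertuples yields on the concatenated frame, using the question's data. The names rows and plain are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
df2 = pd.DataFrame({'A': [11, 22, 33], 'B': [110, 220, 330]})

# concat stacks the frames vertically: 3 rows from df1 followed by 3 from df2
df = pd.concat([df1, df2])

# Each element is a namedtuple with the index label plus one field per column
rows = list(df.itertuples())
print(rows[0].Index, rows[0].A, rows[0].B)

# index=False, name=None returns plain tuples without the index label
plain = list(df.itertuples(index=False, name=None))
print(plain[0])
```

Each tuple here describes a single row of one source frame, so further work would still be needed to pair up the values from df1 and df2.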