I have a dataframe say
df1 = pd.DataFrame({"col1": [1, 2, 3],
"col2": [4, 5, 6],
"col3":[7, 8, 9]})
and another dataframe say df2
df2 = pd.DataFrame({"colA": [10, 20, 30],
"colB": [40, 50, 60],
"colC":[70, 80, 90]})
I need to combine these two dataframe such that they form a dictionary like
col1 col2 col3
{1:10} {2:20} {3:30}
{4:40} {5:50} {6:60}
{7:70} {8:80} {9:90}
Now obviously this can be done using loops and apply function but I was looking for a more one liner or vector approach to do this because I have very large number of columns and rows and thus this needs to be optimized.
CodePudding user response:
It's ugly, but I believe it works for what you need.
The premise is to make sure all the column names are identical between the two dataframes, then stack them on top of each other using concat
. Then group by the index and convert each pair of values to a dict. We are using a dict comprehension so it'll work for any number of columns.
This will end up inverting your df, so we can transpose, reset index, and then add prefix.
import pandas as pd
df1 = pd.DataFrame({"col1": [1, 2, 3],
"col2": [4, 5, 6],
"col3":[7, 8, 9]})
df2 = pd.DataFrame({"colA": [10, 20, 30],
"colB": [40, 50, 60],
"colC":[70, 80, 90]})
df2.columns = df1.columns
(
pd.concat([df1,df2], axis=0)
.groupby(level=0)
.agg({col:lambda x:dict([list(x)]) for col in df1})
.T
.reset_index(drop=True)
.add_prefix('col')
)
Output
col0 col1 col2
0 {1: 10} {2: 20} {3: 30}
1 {4: 40} {5: 50} {6: 60}
2 {7: 70} {8: 80} {9: 90}
CodePudding user response:
Here is a solution that is similar to Chris'
(pd.concat([df1,df2.set_axis(df1.columns,axis=1)])
.groupby(level=0)
.agg(lambda x: dict([tuple(x)])))
Output:
col1 col2 col3
0 {1: 10} {4: 40} {7: 70}
1 {2: 20} {5: 50} {8: 80}
2 {3: 30} {6: 60} {9: 90}
CodePudding user response:
- Convert them to single value tuples.
- Add the tuples together.
- Convert the tuples to dictionaries.
df1 = ((df1.applymap(lambda x: (x,))
df2.applymap(lambda x: (x,)).values) # have to use .values here so it doesn't try to match columns.
.applymap(lambda x: dict((x,))))
print(df1)
Output:
col1 col2 col3
0 {1: 10} {4: 40} {7: 70}
1 {2: 20} {5: 50} {8: 80}
2 {3: 30} {6: 60} {9: 90}
num_cols = len(df1.columns)
df = df1.join(df2)
for i in range(num_cols):
df1.iloc[:, i] = df.iloc[:, i::num_cols].apply(lambda x: {x[0]: x[1]}, axis=1)
print(df1)
# Same Output as above