Home > Blockchain >  Concat two dataframe such that the new dataframe value appear as {cell1:cell2}
Concat two dataframe such that the new dataframe value appear as {cell1:cell2}

Time:07-29

I have a dataframe say

df1 = pd.DataFrame({"col1": [1, 2, 3],
                  "col2": [4, 5, 6],
                  "col3":[7, 8, 9]})

and another dataframe say df2

df2 = pd.DataFrame({"colA": [10, 20, 30],
                  "colB": [40, 50, 60],
                  "colC":[70, 80, 90]})

I need to combine these two dataframe such that they form a dictionary like

col1     col2    col3
{1:10}  {2:20}  {3:30}
{4:40}  {5:50}  {6:60}
{7:70}  {8:80}  {9:90}

Now obviously this can be done using loops and apply function but I was looking for a more one liner or vector approach to do this because I have very large number of columns and rows and thus this needs to be optimized.

CodePudding user response:

It's ugly, but I believe it works for what you need.

The premise is to make sure all the column names are identical between the two dataframes, then stack them on top of each other using concat. Then group by the index and convert each pair of values to a dict. We are using a dict comprehension so it'll work for any number of columns.

This will end up inverting your df, so we can transpose, reset index, and then add prefix.

import pandas as pd
df1 = pd.DataFrame({"col1": [1, 2, 3],
                  "col2": [4, 5, 6],
                  "col3":[7, 8, 9]})

df2 = pd.DataFrame({"colA": [10, 20, 30],
                  "colB": [40, 50, 60],
                  "colC":[70, 80, 90]})

df2.columns = df1.columns

(
    pd.concat([df1,df2], axis=0)
    .groupby(level=0)
    .agg({col:lambda x:dict([list(x)]) for col in df1})
    .T
    .reset_index(drop=True)
    .add_prefix('col')
)

Output

      col0     col1     col2
0  {1: 10}  {2: 20}  {3: 30}
1  {4: 40}  {5: 50}  {6: 60}
2  {7: 70}  {8: 80}  {9: 90}

CodePudding user response:

Here is a solution that is similar to Chris'

(pd.concat([df1,df2.set_axis(df1.columns,axis=1)])
 .groupby(level=0)
 .agg(lambda x: dict([tuple(x)])))

Output:

      col1     col2     col3
0  {1: 10}  {4: 40}  {7: 70}
1  {2: 20}  {5: 50}  {8: 80}
2  {3: 30}  {6: 60}  {9: 90}

CodePudding user response:

  • Convert them to single value tuples.
  • Add the tuples together.
  • Convert the tuples to dictionaries.
df1 = ((df1.applymap(lambda x: (x,))  
  df2.applymap(lambda x: (x,)).values) # have to use .values here so it doesn't try to match columns.
   .applymap(lambda x: dict((x,))))
print(df1)

Output:

      col1     col2     col3
0  {1: 10}  {4: 40}  {7: 70}
1  {2: 20}  {5: 50}  {8: 80}
2  {3: 30}  {6: 60}  {9: 90}

num_cols = len(df1.columns)
df = df1.join(df2)
for i in range(num_cols):
    df1.iloc[:, i] = df.iloc[:, i::num_cols].apply(lambda x: {x[0]: x[1]}, axis=1)

print(df1)
# Same Output as above
  • Related