Home > Back-end >  Pandas: Combine DataFrame with a vector - pairwise rows
Pandas: Combine DataFrame with a vector - pairwise rows

Time:07-02

My first question here! I have two DataFrames:

df1 = pd.DataFrame({"A":[1,0,1,2,1],"B":[2,2,1,0,1],"C":[1,1,1,2,1],"D":[2,1,2,1,1]})
df1

   A  B  C  D
0  1  2  1  2
1  0  2  1  1
2  1  1  1  2
3  2  0  2  1
4  1  1  1  1
df2 = pd.DataFrame({"A":[1],"B":[2],"D":[4]})

   A  B  D
0  1  2  4

I need to combine them into something like this:

        A       B         C       D
0  [1, 1]  [2, 2]  [1, nan]  [2, 4]
1  [0, 1]  [2, 2]  [1, nan]  [1, 4]
2  [1, 1]  [1, 2]  [1, nan]  [2, 4]
3  [2, 1]  [0, 2]  [2, nan]  [1, 4]
4  [1, 1]  [1, 2]  [1, nan]  [1, 4]

It would be better if I could get something like this:

                            A                           B                              C                           D  
0  {'Answer': 1, 'Weight': 1}  {'Answer': 2, 'Weight': 2}   {'Answer': 1, 'Weight': nan}  {'Answer': 2, 'Weight': 4}  
1  {'Answer': 0, 'Weight': 1}  {'Answer': 2, 'Weight': 2}   {'Answer': 1, 'Weight': nan}  {'Answer': 1, 'Weight': 4}  
2  {'Answer': 1, 'Weight': 1}  {'Answer': 1, 'Weight': 2}   {'Answer': 1, 'Weight': nan}  {'Answer': 2, 'Weight': 4}  
3  {'Answer': 2, 'Weight': 1}  {'Answer': 0, 'Weight': 2}   {'Answer': 2, 'Weight': nan}  {'Answer': 1, 'Weight': 4}  
4  {'Answer': 1, 'Weight': 1}  {'Answer': 1, 'Weight': 2}   {'Answer': 1, 'Weight': nan}  {'Answer': 1, 'Weight': 4}

Does anyone know how I can get this result?

CodePudding user response:

Try this:

df1 = pd.DataFrame({"A":[1,0,1,2,1],"B":[2,2,1,0,1],"C":[1,1,1,2,1],"D":[2,1,2,1,1], 'E': [1,1,2,2,2]})
df2 = pd.DataFrame({"A":[1],"B":[2],"D":[4]})

col = list(set(df1.columns) - set(df2.columns))
df2[col] = np.NaN

df = pd.merge(df1, df2, how='cross')

namList = list(set(df.columns.str.extract('(.*)_')[0]))

for val in namList:
    xVal = val   '_x'
    yVal = val   '_y'
    
    df[val] = df.apply(lambda x: '[' str(x[xVal])   ','   str(x[yVal]) ']', axis = 'columns')
    

df = df[df1.columns]
print(df)


   A      B          C      D        E
0  [1,1]  [2,2]  [1.0,nan]  [2,4]  [1,nan]
1  [0,1]  [2,2]  [1.0,nan]  [1,4]  [1,nan]
2  [1,1]  [1,2]  [1.0,nan]  [2,4]  [2,nan]
3  [2,1]  [0,2]  [2.0,nan]  [1,4]  [2,nan]
4  [1,1]  [1,2]  [1.0,nan]  [1,4]  [2,nan]
  • Related