Home > Software design >  Combine and expand Dataframe based on IDs in columns
Combine and expand Dataframe based on IDs in columns

Time:10-07

I have 3 dataframes A,B,C:

import pandas as pd

A = pd.DataFrame({"id": [1,2], 
                "connected_to_B_id1":["A","B"],
                "connected_to_B_id2":["B","C"],
                "connected_to_B_id3":["C", np.nan],
                # entry can have multiple ids from B
                })

B = pd.DataFrame({"id": ["A","B","C"], 
                "connected_to_C_id1":[1,1,2],
                "connected_to_C_id2":[2,2,np.nan],
                # entry can have multiple ids from C
                })
C = pd.DataFrame({"id": [1,2], 
                "name":["a","b"],
                })


#Output should be D:
D = pd.DataFrame({"id_A": [1,1,1,1,1,2,2,2],
                  "id_B": ["A","A","B","B","C","B","B","C"],
                  "id_C": [1,2,1,2,2,1,2,1],
                  "name": ["a","b","a","b","b","a","b","a"]
                })

I want to use the IDs stored in the "connected_to_X" columns of each dataframe to create a dataframe, which contains all relationships recorded in the three individual dataframes.

What is the most elegant way to combine the dataframes to A, B and C to D? Currently I am using dicts,lists and for loops and its messy and complicated.

D:

|idx |id_A|id_B|id_C|name|
|---:|--:|--:|--:|--:|
|  0 | 1 | A | 1 | a |
|  1 | 1 | A | 2 | b |
|  2 | 1 | B | 1 | a |
|  3 | 1 | B | 2 | b |
|  4 | 1 | C | 2 | b |
|  5 | 2 | B | 1 | a |
|  6 | 2 | B | 2 | b |
|  7 | 2 | C | 1 | a |

CodePudding user response:

You just need to unpivot A and B then you can join the tables up.

(A.
 melt(id_vars='id').
 merge(B.melt(id_vars='id'), left_on = 'value', right_on='id', how='left').
 merge(C, left_on = 'value_y', right_on='id').
 drop(columns = ['variable_x', 'variable_y', 'value_x']).
 sort_values(['id_x', 'id_y']).
 reset_index(drop=True).
 reset_index()
 )

   index  id_x id_y  value_y  id name
0      0     1    A      1.0   1    a
1      1     1    A      2.0   2    b
2      2     1    B      1.0   1    a
3      3     1    B      2.0   2    b
4      4     1    C      2.0   2    b
5      5     2    B      1.0   1    a
6      6     2    B      2.0   2    b
7      7     2    C      2.0   2    b
  • Related