I have two pandas dataframes representing the edge lists of a graph. One is a dataframe called edge_list
, which contains all the edges in the graph and their corresponding weights (it's a weighted graph). The first few lines look like this:
-------- -------- -------
| source | target | value |
-------- -------- -------
| 31 | 25 | 17 |
| 45 | 21 | 67 |
| 68 | 127 | 10 |
| ... | ... | ... |
-------- -------- -------
The second dataframe, called edge_list_small
, has been created from edge_list
by applying a filter, thus it only contains a subset of all edges. It also does not contain the value
column (edge weights).
Now I want the weights of those edges in my small dataframe. Thus I figured I would merge the two based on the two columns, source and target. I am using:
complete_edge_list = small_edge_list.merge(edge_list,on=['source','target'])
There is a problem however.
print(len(edge_list), len(alpha_edge_list), len(complete_edge_list))
returns:
81,229,074 7,763,018 1,841,776
(commas added by me for readability). What am I doing wrong?
CodePudding user response:
If you try a left merge:
complete_edge_list = small_edge_list.merge(edge_list, on=['source','target'],
how='left')
and get the expected size, this means that the small dataframe is not a subset of the large one. You have missing matches and should have NaNs in the value column.