I'm having trouble finding all possible paths in a CSV file that looks like this:

Source	Target	Source_repo	Target_repo
SOURCE1	Target2	repo-1	repo-2
SOURCE5	Target3	repo-5	repo-3
SOURCE8	Target5	repo-8	repo-5

The dataset is large, with more than 5000 lines. I want to generate all possible paths like the ones below and return them as a list (Target5 is the same component as SOURCE5):

SOURCE1 Target2
SOURCE8 Target5 Target3

I want to implement this without recursive functions, since recursion causes problems (maximum recursion depth exceeded).

This is my current code:

import pandas as pd


def attach_co_changing_components(base_component):
    # Collect every Target_repo that the given Source_repo depends on.
    co_changes = df_depends_on.loc[df_depends_on["Source_repo"] ==
                                   base_component, "Target_repo"].values
    result = {base_component: list(co_changes)}
    return result


def dfs(data, path, paths):
    datum = path[-1]
    if datum in data:
        # Extend the current path by each successor and recurse.
        for val in data[datum]:
            new_path = path + [val]
            paths = dfs(data, new_path, paths)
    else:
        # No successors: the path is complete.
        paths += [path]
    return paths



def enumerate_paths(graph):
    all_paths = []
    for node in graph:
        node_paths = dfs(graph, [node], [])
        all_paths += node_paths
    return all_paths


if __name__ == "__main__":

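    # Named df_depends_on because the functions and filters in this script use that name.
    df_depends_on = pd.read_csv("clean_openstack_evolution.csv")

    co_changing_components = df_depends_on[["Source"]].copy()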

    co_changing_components = co_changing_components.drop_duplicates(
    ).reset_index(drop=True)

    co_changing_components = co_changing_components["Source"].map(
        attach_co_changing_components)

    co_changing_components = co_changing_components.rename("Path")

    co_changing_components = co_changing_components.reset_index(drop=True)

    newdict = {}
    for k, v in [(key, d[key]) for d in co_changing_components for key in d]:
        if k not in newdict: newdict[k] = v
        else: newdict[k].append(v)

    graph_keys = df_depends_on["Source_repo"].drop_duplicates().to_dict(
    ).values()
    graph_keys = {*graph_keys}
    graph_keys = set([
        k for k in graph_keys
        if len(df_depends_on[df_depends_on["Target"] == k]) > 0
    ])

    result = enumerate_paths(newdict)

Here is the output after executing the preceding code :

Here is the data link Google drive

I tried to solve the problem with a recursive function, but it failed with a maximum recursion depth exceeded error. I'd like to solve it without recursion.

CodePudding user response：

I'm not sure whether you want all paths or only the paths from one specific node to another. Either way, this looks like a job for networkx.

Setup (`nx.from_pandas_edgelist`)

import networkx as nx
import pandas as pd


df = pd.read_csv("...")

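# from_pandas_edgelist defaults to columns named "source" and "target", so the
# columns are named explicitly. The repo columns are used because they are what
# links one row to the next (Target5 and SOURCE5 are both repo-5).
graph = nx.from_pandas_edgelist(
    df, source="Source_repo", target="Target_repo", create_using=nx.DiGraph
)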

All paths (`nx.all_simple_paths`)

from itertools import chain, product, starmap
from functools import partial


roots = (node for node, d in graph.in_degree if d == 0)

leaves = (node for node, d in graph.out_degree if d == 0)

all_paths = partial(nx.all_simple_paths, graph)

paths = list(chain.from_iterable(starmap(all_paths, product(roots, leaves))))

From one node to another

source_node = "some_node_in_graph"
target_node = "some_other_node_in_graph"
list(nx.all_simple_paths(graph, source=source_node, target=target_node))
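
If you would rather avoid the extra dependency, the same root-to-leaf enumeration can be sketched in plain Python with an explicit stack in place of recursion. This is only a sketch under a few assumptions: `build_graph` and `enumerate_paths` are hypothetical helper names, the edges are taken as (Source_repo, Target_repo) pairs, and a node already on the current path is skipped so that a cycle in the data (one possible cause of the recursion depth error) cannot loop forever.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an adjacency mapping {source: [targets]} from (source, target) pairs."""
    graph = defaultdict(list)
    for source, target in edges:
        if target not in graph[source]:
            graph[source].append(target)
    return dict(graph)


def enumerate_paths(graph):
    """Return every simple path from a root (no incoming edge) to a leaf
    (no outgoing edge), using an explicit stack instead of recursion."""
    targets = {t for successors in graph.values() for t in successors}
    roots = [node for node in graph if node not in targets]
    paths = []
    for root in roots:
        stack = [[root]]
        while stack:
            path = stack.pop()
            successors = graph.get(path[-1], [])
            if not successors:
                # Reached a leaf: the path is complete.
                paths.append(path)
                continue
            for nxt in successors:
                # Skip nodes already on this path so cycles cannot loop forever.
                if nxt not in path:
                    stack.append(path + [nxt])
    return paths


# The three sample rows from the question, as (Source_repo, Target_repo) pairs.
edges = [("repo-1", "repo-2"), ("repo-5", "repo-3"), ("repo-8", "repo-5")]
print(enumerate_paths(build_graph(edges)))
# [['repo-1', 'repo-2'], ['repo-8', 'repo-5', 'repo-3']]
```

With a DataFrame, the pairs could come from something like `zip(df["Source_repo"], df["Target_repo"])`. Each stack entry holds a full copy of its path, which keeps the code short but costs memory on very long paths.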
