I have data from a network containing all modems, network elements and, finally, the technical sites, and I want to create a vector with the topology for each modem.
The network has a tree structure. For this question, let's assume it looks like this:
The red node is the technical site, the green ones are modems, and the yellow ones are the topology elements in between. This information is stored in data sets like these:
library(data.table)
df1 <- data.table(mac = c("A90", "BCF", "12A", "D4B"),
                  top = c(100L, 200L, 300L, 400L))
> df1
   mac top
1: A90 100
2: BCF 200
3: 12A 300
4: D4B 400
df2 <- data.table(a = c(100L, 150L, 156L, 800L, 200L, 203L, 300L, 400L, 111L, 963L),
                  b = c(150L, 156L, 800L, 263L, 203L, 800L, 263L, 111L, 963L, 156L))
> df2
      a   b
 1: 100 150
 2: 150 156
 3: 156 800
 4: 800 263
 5: 200 203
 6: 203 800
 7: 300 263
 8: 400 111
 9: 111 963
10: 963 156
Column a is the starting point and column b is the target point.
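To illustrate how the two columns link up, here is a sketch of a single hop in base R, assuming each value appears at most once in a. The vectors a and b are copied from df2, and next_hop() is a hypothetical helper name. match() returns NA for an element that never appears in a, such as 263, which means it has no target.

```r
# Edge list copied from df2: a = starting point, b = target point
a <- c(100L, 150L, 156L, 800L, 200L, 203L, 300L, 400L, 111L, 963L)
b <- c(150L, 156L, 800L, 263L, 203L, 800L, 263L, 111L, 963L, 156L)

# Hypothetical helper: one hop from each element of x to its target.
# match() finds the row of x in a; it gives NA when x never appears in a (no target).
next_hop <- function(x) b[match(x, a)]

next_hop(c(100L, 263L))
# [1] 150  NA
```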
Now I want to create the topology for each modem, which should look like this:
df_target = structure(list(mac = c("A90", "BCF", "12A", "D4B"), topo_complete = c("100, 150, 156, 800, 263",
"200, 203, 800, 263", "300, 263", "400, 111, 963, 156, 800, 263"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"))
   mac                topo_complete
1: A90      100, 150, 156, 800, 263
2: BCF           200, 203, 800, 263
3: 12A                     300, 263
4: D4B 400, 111, 963, 156, 800, 263
In other words: for each modem (mac) in df1, take its top value and look it up in df2$a to find its target (df2$b). Store this target value and check whether it also exists in df2$a. If it does, take its target (df2$b) and repeat, until no further target value is found.
An element never has more than one target point, so there is no need to handle cases with several targets.
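The procedure described above can also be sketched without a graph library: do one hop for all modems at once and repeat until every walk has stopped. This is a minimal sketch in base R, assuming (as stated) that each element has at most one target and that the tree has no cycles; walk_paths() is a hypothetical name. Each iteration is a single vectorised match() over the walks that are still running, so the number of iterations equals the length of the longest path rather than the number of modems. The loop is capped at one hop per edge, so a cycle in the data cannot make it run forever.

```r
# Hypothetical helper: follow the a -> b links from every start element at once
# and return one comma-separated path per start element.
walk_paths <- function(start, from, to, sep = ", ") {
  # The lookup below assumes each element has at most one target
  stopifnot(anyDuplicated(from) == 0L)
  paths <- as.character(start)  # every path begins with its start element
  idx <- seq_along(start)       # positions of walks that are still running
  cur <- start                  # current element of each running walk
  # Without cycles, no walk needs more hops than there are edges
  for (step in seq_along(from)) {
    nxt <- to[match(cur, from)] # one hop for all running walks; NA = no target
    keep <- !is.na(nxt)
    idx <- idx[keep]            # drop walks that have reached the end
    cur <- nxt[keep]
    if (length(idx) == 0L) break
    paths[idx] <- paste(paths[idx], cur, sep = sep)
  }
  paths
}

# Data from df1 and df2 in the question
top <- c(100L, 200L, 300L, 400L)
a <- c(100L, 150L, 156L, 800L, 200L, 203L, 300L, 400L, 111L, 963L)
b <- c(150L, 156L, 800L, 263L, 203L, 800L, 263L, 111L, 963L, 156L)

res <- walk_paths(top, a, b)
res
```

Because it only needs plain vectors, it can be applied to the data.table columns directly, for example df1[, topo_complete := walk_paths(top, df2$a, df2$b)], which yields the strings shown in df_target.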
To be honest, I have no idea how to solve this. In addition, the real-world data contains more than one million MACs and some 100k topology elements, so the solution has to be efficient in terms of speed. Memory usage doesn't matter. If possible, I'd like to use data.table.
Could somebody help?
Answer:
One way of doing it is to use the igraph package:
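library(igraph)
library(data.table)
# Directed graph from the edge list (column a = from, column b = to)
g <- graph_from_data_frame(df2, directed = TRUE)
# For each modem, list every vertex reachable from its top element, including itself (a list column of character vectors)
df1[, topo_complete := lapply(as.character(top), function(x) names(subcomponent(g, x, mode = "out")))]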
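The igraph version stores a list column of character vectors rather than the comma-separated strings in df_target. A minimal base-R sketch of that last step, assuming each per-modem vector holds the vertex names in path order, is to collapse each one with toString(), which joins elements with ", ". The list below is a stand-in built from the expected output in the question.

```r
# Stand-in for the list column produced above: one character vector per modem
topo <- list(c("100", "150", "156", "800", "263"),
             c("200", "203", "800", "263"),
             c("300", "263"),
             c("400", "111", "963", "156", "800", "263"))

# toString() collapses a vector with ", "; vapply() guarantees exactly one string per modem
topo_str <- vapply(topo, toString, character(1))
topo_str
```

On the data.table itself this would be df1[, topo_str := vapply(topo_complete, toString, character(1))]. It relies on subcomponent() listing the vertices in path order; for simple chains like the ones here that appears to hold, but it is worth checking on the real data.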