I am not familiar with R, so apologies if this has been asked before; I could not find an existing answer.
Suppose I have network data (edges between IPs) of this type:
toy_data = data.table(from=c("A","B","A","C","D","C"), to=c("B","A","C","B","A","A"))
| from | to |
|---|---|
| A | B |
| B | A |
| A | C |
| C | B |
| D | A |
| C | A |
I cannot load the whole network into igraph, so I am trying to compute statistics on chunks. Since the network is undirected, I would like to drop every row whose reversed from-to pair has already appeared (rows 2 and 6).
I originally thought that something like this would work:
unique(toy_data[,.(c(from,to)|c(to,from))])
Unfortunately, that does not work.
I then thought of using two auxiliary columns:
toy_data[,orig:=paste(from,to,sep="")]
toy_data[,reverse:=paste(to,from,sep="")]
and then work with something like:
unique(toy_data[,.(?)])
but my guess is that there is a much easier way than what I am doing.
CodePudding user response:
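Instead of creating temporary columns, paste the row-wise minimum (pmin) together with the row-wise maximum (pmax), so that each pair gets the same key regardless of direction. Then flag the repeated keys with duplicated and negate (!) to keep only the first occurrence of each pair: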
toy_data[!duplicated(paste(pmin(from, to), pmax(from, to)))]
-output
from to
<char> <char>
1: A B
2: A C
3: C B
4: D A
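A possible variant, sketched here under a few assumptions: the same pmin/pmax idea, but calling duplicated on a two-column data.table instead of on a pasted string. With paste and its default separator (a space), two different pairs could in principle produce the same string if the values themselves contain spaces; that is unlikely for IPs, but comparing the two columns directly avoids the question. The helper name dedup_undirected and the two-chunk split are hypothetical, used only to illustrate how this might be applied chunk by chunk.

```r
library(data.table)

toy_data <- data.table(from = c("A", "B", "A", "C", "D", "C"),
                       to   = c("B", "A", "C", "B", "A", "A"))

# Hypothetical helper: keep the first occurrence of each undirected edge.
dedup_undirected <- function(dt) {
  # Order-independent key: smaller endpoint in lo, larger endpoint in hi.
  edge_key <- dt[, .(lo = pmin(from, to), hi = pmax(from, to))]
  # duplicated() on a data.table compares whole rows, so no separator is needed.
  dt[!duplicated(edge_key)]
}

res <- dedup_undirected(toy_data)

# Chunk-wise use (illustrative split): deduplicate each chunk,
# then once more on the combined result to catch pairs that span chunks.
chunks <- list(toy_data[1:3], toy_data[4:6])
res_chunked <- dedup_undirected(rbindlist(lapply(chunks, dedup_undirected)))
```

On the toy data, res and res_chunked should both contain the same four rows as the output above. The second pass over the combined chunks is needed because a pair and its reverse can land in different chunks.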