Hi everyone. I have a dataset of 1000 rows (nodes and links) with two columns, V1 and V2, in a txt file, which I imported with read.table. Some rows in the dataset are reversed duplicates of other rows, e.g.:
net <- read.table("DD242.txt", quote="\"", comment.char="")
V1 V2
4 5
5 4
6 7
7 8
and so on...
but I do not know which values repeat. How do I find these reversed duplicate rows and delete one of them? In this case I want to remove the second, reversed row (5 4), so that I only have:
V1 V2
4 5
6 7
7 8
Thanks a lot!
CodePudding user response:
You can filter by lag():
library(dplyr)
df %>%
  filter(!(V1 == lag(V2, default = 0) & V2 == lag(V1, default = 0)))
# V1 V2
#1 4 5
#2 6 7
#3 7 8
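Note that the lag() filter only removes a reversed pair when it sits directly under the original row, as in the sample data. If reversed pairs can appear anywhere in the file, a sketch along the same lines with pmin()/pmax() as an order-independent key (the helper columns lo and hi are just illustrative names) would be:
library(dplyr)
df %>%
  mutate(lo = pmin(V1, V2), hi = pmax(V1, V2)) %>%  # order-independent pair key
  distinct(lo, hi, .keep_all = TRUE) %>%            # keep the first row per pair
  select(V1, V2)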
Or in base R:
df <- t(apply(df, 1, sort))           # sort each row so 5 4 becomes 4 5
as.data.frame(df[!duplicated(df), ])  # then drop the rows that are now duplicates
V1 V2
1 4 5
2 6 7
3 7 8
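One caveat: the sort() overwrites the values in df, so a non-duplicated row such as 9 3 would come out as 3 9. A sketch that keeps the original orientation (run on the original df, before it is overwritten) is to use the sorted copy only inside duplicated():
df[!duplicated(t(apply(df, 1, sort))), ]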
CodePudding user response:
data.table solution
library(data.table)
setDT(net) # or use fread instead of read.table to get a data.table right away
net <- net[, sorted := apply(.SD, 1, function(x) list(unname(sort(x))))  # sorted pair stored as a helper list column
           ][!duplicated(sorted)]                                        # keep only the first occurrence of each pair
net[, sorted := NULL]  # drop the helper column
results
net
V1 V2
1: 4 5
2: 6 7
3: 7 8
data
net <- data.frame(
  V1 = c(4, 5, 6, 7),
  V2 = c(5, 4, 7, 8)
)
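A pmin()/pmax() variant of the same idea skips the helper list column; this is just a sketch, assuming the same net as in the data block above:
library(data.table)
setDT(net)
# build an order-independent key for each pair and keep its first occurrence
net[!duplicated(net[, .(pmin(V1, V2), pmax(V1, V2))])]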