I want the indices of the unselected rows when using sample()
in R. Consider the following case.
df <- data.frame(id = c(1,1,2,2,3,3),
v1 = c(2,2,9,4,7,1),
v2 = c(3,5,8,5,8,5))
ss <- ceiling(0.5*nrow(df)) #size
set.seed(123)
rid <- sample(seq_len(nrow(df)), size = ss, replace = FALSE)
Now rows 3, 6, 2 are randomly selected. Is there a way to get the indices of the unselected rows (1, 4, 5)?
Thanks!
CodePudding user response:
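You can use df[-rid, ] to drop the sampled rows. With the default row names, the row names of the result are the indices of the unselected rows: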
df <- data.frame(
id = c(1, 1, 2, 2, 3, 3),
v1 = c(2, 2, 9, 4, 7, 1),
v2 = c(3, 5, 8, 5, 8, 5)
)
ss <- ceiling(0.5 * nrow(df)) # size
set.seed(123)
rid <- sample(seq_len(nrow(df)), size = ss, replace = FALSE)
rid
#> [1] 3 6 2
df
#> id v1 v2
#> 1 1 2 3
#> 2 1 2 5
#> 3 2 9 8
#> 4 2 4 5
#> 5 3 7 8
#> 6 3 1 5
df[rid,]
#> id v1 v2
#> 3 2 9 8
#> 6 3 1 5
#> 2 1 2 5
df[-rid, ]
#> id v1 v2
#> 1 1 2 3
#> 4 2 4 5
#> 5 3 7 8
rownames(df[-rid, ])
#> [1] "1" "4" "5"
Created on 2021-11-05 by the reprex package (v2.0.1)
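If you want the unselected indices directly as integers rather than as character row names, one possible alternative is setdiff() on the full index vector. This is only a sketch: rid is hard-coded to the values shown in the output above so that the example does not depend on the random number generator, and the name unselected is made up for illustration.

```r
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  v1 = c(2, 2, 9, 4, 7, 1),
  v2 = c(3, 5, 8, 5, 8, 5)
)

# Sampled row indices, copied from the output above (assumption: hard-coded
# here instead of calling sample(), to keep the example deterministic).
rid <- c(3L, 6L, 2L)

# All row indices that are not in rid, in ascending order.
unselected <- setdiff(seq_len(nrow(df)), rid)
unselected
#> [1] 1 4 5

# The same rows as df[-rid, ], selected by positive index instead.
df[unselected, ]
```

Unlike rownames(df[-rid, ]), this does not rely on df having the default row names. It also behaves sensibly when rid is empty, whereas df[-integer(0), ] returns zero rows.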