I have a data frame df with 500 rows and 6 columns.
With s <- sample_n(df, 100) I get 100 random rows of it. I then want to sample 100 rows from the remaining 400. How can I modify my initial data frame so that the 100 rows I selected are removed?
Answers to similar questions suggest df[-s], but at least in this case that doesn't work.
CodePudding user response:
Here's a solution with anti_join:
Data:
set.seed(12)
df <- data.frame(
x = rnorm(100)
)
Procedure:
library(dplyr)
df %>%
  # take a sample of 10 rows:
  sample_n(10) %>%
  # drop the sampled rows from the original data frame (matched on column values):
  anti_join(df, .)
Joining, by = "x"
x
1 -1.480567595
2 1.577169472
3 -0.956744479
4 -1.997642097
5 -0.272296044
6 -0.315348711
7 -0.628255237
8 -0.106463885
9 0.428014802
10 -1.293882298
11 -0.779566508
12 0.011951759
13 -0.703464254
14 0.340512271
15 0.506968172
16 -0.293305149
17 0.223641415
18 2.007201457
19 1.011979118
20 -0.302459245
21 -1.025244839
22 -0.267384830
23 -0.199105661
24 0.131122595
25 0.145799896
26 0.362064721
27 0.673981164
28 2.072035768
29 -0.541028649
30 -1.070492158
31 -0.372456732
32 -0.485141355
33 0.274784178
34 -0.479512562
35 0.798105326
36 -1.004451202
37 0.578134627
38 -1.595625656
39 -0.308503656
40 0.449465922
41 -0.977053283
42 0.189997859
43 0.731453357
44 -0.492599111
45 -0.042684912
46 -0.112670576
47 0.456827248
48 2.020334842
49 -1.050890062
50 0.734652106
51 0.539249744
52 -1.314272797
53 -0.250038722
54 0.314204596
55 0.406546694
56 0.994420600
57 0.855768432
58 0.197128917
59 0.834325038
60 0.846790152
61 1.954105255
62 -2.149260002
63 0.971120270
64 1.145061573
65 -0.525400626
66 0.250320103
67 -0.429406611
68 -0.182519622
69 -0.103310466
70 -0.633838203
71 -1.271053787
72 -0.383950394
73 0.516755802
74 -0.177968544
75 0.004258039
76 -1.274059551
77 -0.202110338
78 1.164465880
79 -0.023379409
80 -0.176724455
81 1.113709078
82 -0.541888860
83 -0.963398332
84 0.376448400
85 0.129262533
86 -0.342289274
87 0.452281257
88 -0.694737942
89 -0.239013591
90 -1.007298960
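An alternative that may fit the original question more directly is to sample row positions instead of rows. This is a minimal base-R sketch, not part of the answer above; the data frame below is a made-up stand-in with 500 rows and 6 columns, and the names idx1, idx2, s1, s2 and rest are only illustrative. It also hints at why df[-s] does not work: s is a data frame of rows rather than a vector of row positions, and single-bracket indexing without a comma selects columns, not rows. Note too that anti_join matches on values, so if df contained duplicate rows, every copy of a sampled row would be dropped; indexing by position avoids that.

```r
set.seed(12)
# Hypothetical stand-in for the question's data: 500 rows, 6 columns
df <- as.data.frame(matrix(rnorm(500 * 6), nrow = 500))

# Sample row positions rather than rows, so the positions can be reused
idx1 <- sample(nrow(df), 100)
s1   <- df[idx1, ]    # first sample of 100 rows
rest <- df[-idx1, ]   # remaining 400 rows (note the comma: rows, not columns)

# Second sample, drawn only from the remaining rows
idx2 <- sample(nrow(rest), 100)
s2   <- rest[idx2, ]
```

Because rest keeps the original row names, s1 and s2 can be checked for overlap with intersect(rownames(s1), rownames(s2)), which should be empty.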