For my project, I have a data set with 1,296,765 observations of 23 columns, and I want to take a random 10% sample of it. How can I do that in R?
I tried the code below, but it sampled only 10 rows, whereas I want a random 10% of the data. I am a beginner, so please help.
library(dplyr)
x <- sample_n(train, 10)
CodePudding user response:
Here is a function from dplyr that selects rows at random by a given proportion:
dplyr::slice_sample(train, prop = 0.1)
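For context, sample_n() takes a number of rows rather than a fraction, which is why the attempt in the question returned exactly 10 rows. Here is a minimal sketch of the proportion-based call on made-up data. The data frame, the seed value, and the name train_10pct are assumptions for illustration only; your real train would take the place of the stand-in.

```r
library(dplyr)

set.seed(42)  # arbitrary seed, only so the same rows come back on each run

# Stand-in data (assumption); in the question, `train` is the real data set.
train <- data.frame(id = 1:1000, x = rnorm(1000))

# Keep a random 10% of the rows, sampled without replacement.
train_10pct <- slice_sample(train, prop = 0.1)

nrow(train_10pct)
```

Setting the seed is optional. It only matters if you need the same sample again later.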
CodePudding user response:
In base R, you can subset by sampling a proportion of nrow():
set.seed(13)
train <- data.frame(id = 1:101, x = rnorm(101))
train[sample(nrow(train), nrow(train) / 10), ]
     id           x
69   69  1.14382456
101 101 -0.36917269
60   60  0.69967564
58   58  0.82651036
59   59  1.48369123
72   72 -0.06144699
12   12  0.46187091
89   89  1.60212039
8     8  0.23667967
49   49  0.27714729
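If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: sample_rows is a hypothetical helper name, not a base R function, and it rounds the row count down explicitly with floor() so that a non-integer size such as 101 / 10 is never passed to sample().

```r
# Hypothetical helper (assumption, not part of base R):
# return a random fraction `frac` of the rows of `df`, without replacement.
sample_rows <- function(df, frac) {
  n_keep <- floor(nrow(df) * frac)              # whole number of rows to keep
  df[sample(nrow(df), n_keep), , drop = FALSE]  # pick that many row indices at random
}

set.seed(13)
train <- data.frame(id = 1:101, x = rnorm(101))

subset10 <- sample_rows(train, 0.10)
nrow(subset10)
```

With 101 rows and frac = 0.10 this keeps 10 rows. On the full data set from the question it would keep 129,676 of the 1,296,765 rows.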