Randomly Select 10 percent of data from the whole data set in R

Time:11-30

For my project, I am using a data set with 1296765 observations of 23 columns. I want to take a random 10% of this data. How can I do that in R?

I tried the code below, but it sampled only 10 rows, whereas I want a random 10% of the rows. I am a beginner, so please help.

library(dplyr)  
x <- sample_n(train, 10)

CodePudding user response:

Here is a function from dplyr that selects rows at random by a given proportion:

dplyr::slice_sample(train, prop = 0.1)
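
To show why the original attempt returned 10 rows, here is a minimal sketch contrasting a fixed row count with a proportion. The `train` data frame below is a small made-up stand-in for the real data, and the seed value is arbitrary; both are assumptions for illustration only.

```r
library(dplyr)

set.seed(42)  # optional: makes the random draw reproducible

# Hypothetical stand-in for the asker's `train` data frame
train <- data.frame(id = 1:1000, x = rnorm(1000))

# n = a fixed number of rows (the same idea as sample_n(train, 10))
ten_rows <- slice_sample(train, n = 10)

# prop = a fraction of the rows; 0.1 keeps 10% of them
ten_percent <- slice_sample(train, prop = 0.1)

nrow(ten_rows)     # 10
nrow(ten_percent)  # 100
```

The older `sample_frac(train, 0.1)` should give the same kind of result, but `slice_sample()` is the form current dplyr versions recommend.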

CodePudding user response:

In base R, you can subset by sampling a proportion of nrow():

set.seed(13)

train <- data.frame(id = 1:101, x = rnorm(101))

train[sample(nrow(train), nrow(train) / 10), ]
     id           x
69   69  1.14382456
101 101 -0.36917269
60   60  0.69967564
58   58  0.82651036
59   59  1.48369123
72   72 -0.06144699
12   12  0.46187091
89   89  1.60212039
8     8  0.23667967
49   49  0.27714729
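
Note that `nrow(train) / 10` is 10.1 here, and the output above has 10 rows, so the fractional size was cut down to a whole number. A slightly more explicit sketch of the same idea, assuming you want to round down, is shown below; the names `n_keep`, `idx`, and `subset10` are made up for illustration.

```r
set.seed(13)  # any fixed seed makes the draw reproducible

train <- data.frame(id = 1:101, x = rnorm(101))

# 101 * 0.10 is about 10.1; floor() makes the rounding down to 10 explicit
n_keep <- floor(nrow(train) * 0.10)

# sample() draws row indices without replacement by default
idx <- sample(nrow(train), n_keep)
subset10 <- train[idx, ]

nrow(subset10)           # 10
anyDuplicated(idx) == 0  # TRUE: no row is picked twice
```

Keeping the indices in `idx` also means the same rows can be selected again later without redrawing the sample.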