Suppose I have a much larger dataset than the one below:
df = data.frame(x = c("ciao mondo", "hello world", "ciao world","hello mondo","bye mondo","ciao ciao mondo"))
I want to sample a few rows at random, without replacement, so I do:
sample(df$x, size = 3, replace = FALSE)
The problem is that I no longer have the original row indices of the sampled rows. My dataset is quite big, so using something like grepl() to recover the original row indices is inefficient.
Do you have any idea how to do it?
Thanks a lot!
CodePudding user response:
Instead of sampling the column, sample from the sequence of row numbers. The result is a vector of row indices, which can later be used to subset the rows:
i1 <- sample(seq_len(nrow(df)), size = 3, replace = FALSE)
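To make the subsetting step concrete, here is a minimal sketch on the toy data from the question. The seed value and the names sampled_values and sampled_rows are only illustrative assumptions, not part of the original answer.

```r
# Toy data from the question
df <- data.frame(x = c("ciao mondo", "hello world", "ciao world",
                       "hello mondo", "bye mondo", "ciao ciao mondo"))

set.seed(1)  # arbitrary seed, used only to make the draw reproducible

# Sample row indices instead of values
i1 <- sample(seq_len(nrow(df)), size = 3, replace = FALSE)

# Use the indices to pull out the sampled values or the full rows
sampled_values <- df$x[i1]               # hypothetical name: sampled strings
sampled_rows   <- df[i1, , drop = FALSE] # hypothetical name: sampled rows

# i1 holds the original row positions; with default row names,
# rownames(sampled_rows) carries the same positions as character strings
```

drop = FALSE keeps the result a data frame even though df has a single column; without it, df[i1, ] would collapse to a plain vector.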
CodePudding user response:
You could store the row number in a column, and then sample rows from that data frame:
df$row = seq_len(nrow(df))
df[sample(nrow(df), 3, replace = FALSE), ]
Result after set.seed(0):
                x row
6 ciao ciao mondo   6
1      ciao mondo   1
4     hello mondo   4