Find sample size based on number of rows in data

Time:05-11

I have a dataset that looks like this:

Region Name
Region 1 Name 14
Region 2 Name 18
Region 2 Name 2
Region 2 Name 21
Region 2 Name 44
Region 3 Name 64
Region 3 Name 24
Region 4 Name 1
Region 4 Name 1
Region 4 Name 98
Region 5 Name 98
Region 5 Name 8
Region 5 Name 8
Region 5 Name 8
Region 5 Name 98

I need to break up the data by Region, and then select a random sample of only 5% of the "Name" values per Region, based on the number of rows in that Region.

So let's say there are 30 Name in Region 2, then I need a random sample of 30*.05. If there are 50 Name in Region 6, then I need a random sample of 50*.05.

So far, I've been able to split() the data using

d = split(data, f = data$Region)

but when I try to run an lapply function, I get an error that there are a different number of rows in the list that split() provided:

lapply(data, function(x) {
 sample_n(data, nrow(d)*.05)
} ) 

Any thoughts?

Thank you

CodePudding user response:

Here's a base R solution.

lapply(split(data, data$Region),
       \(x) x[sample(nrow(x), nrow(x) * 0.05),])

You can then convert the resulting list back into a data frame by passing it to rbind via do.call.
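
For completeness, here is a minimal end-to-end sketch of the split, sample and recombine steps. The toy data frame and the helper name sample_region are made up for illustration. The use of ceiling() is also my own assumption: sample() truncates a fractional size, so a region with fewer than 20 rows would otherwise get 0 rows. Drop ceiling() if you want the truncating behaviour.

```r
# Toy data: a hypothetical stand-in for the asker's `data`
set.seed(42)  # only so the random sample is reproducible
data <- data.frame(
  Region = rep(c("Region 1", "Region 2", "Region 3"), times = c(20, 40, 100)),
  Name   = paste("Name", seq_len(160))
)

# Sample a fraction of the rows of one region's data frame.
# ceiling() is an assumption: it rounds the sample size up so that
# small regions still contribute at least one row.
sample_region <- function(x, frac = 0.05) {
  n <- ceiling(nrow(x) * frac)
  x[sample(nrow(x), n), , drop = FALSE]
}

# One sampled data frame per region
pieces <- lapply(split(data, data$Region), sample_region)

# Stack the per-region samples back into a single data frame
result <- do.call(rbind, pieces)

# Number of sampled rows per region
table(result$Region)
```

Note that the function passed to lapply works on x, the piece for one region, and not on the full data or on the list d. That is what the attempt in the question was missing.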
