Find sample size based on number of rows in data

I have a dataset that looks like this:

Region	Name
Region 1	Name 14
Region 2	Name 18
Region 2	Name 2
Region 2	Name 21
Region 2	Name 44
Region 3	Name 64
Region 3	Name 24
Region 4	Name 1
Region 4	Name 1
Region 4	Name 98
Region 5	Name 98
Region 5	Name 8
Region 5	Name 8
Region 5	Name 8
Region 5	Name 98

I need to break up the data by Region, and then select a random sample of only 5% of the "Name" values per Region, based on the number of rows in each Region.

So let's say there are 30 Names in Region 2; then I need a random sample of 30*.05. If there are 50 Names in Region 6, then I need a random sample of 50*.05.

So far, I've been able to split() the data using

d = split(data, f = data$Region)

but when I try to run an lapply function, I get an error saying that there are different numbers of rows in the list that split() provided:

lapply(data, function(x) {
 sample_n(data, nrow(d)*.05)
} )

Any thoughts?

Thank you

CodePudding user response：

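Here's a base R solution. The key is to sample from each piece x that split() produces, rather than from data or d as a whole.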

lapply(split(data, data$Region),
       \(x) x[sample(nrow(x), nrow(x) * 0.05),])

You can then combine the resulting list back into a single data frame with rbind, for example by passing the list to do.call(rbind, ...).
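
To make the whole round trip concrete, here is a minimal sketch of splitting, sampling, and recombining. The toy data, its region sizes, and the names sampled and result are invented for illustration and are not from the question. floor() is my addition: it makes the rounding explicit rather than relying on how sample() treats a fractional size.

```r
# Toy data in the Region/Name layout from the question (sizes are made up).
set.seed(42)  # only so the sketch is reproducible
data <- data.frame(
  Region = rep(c("Region 1", "Region 2", "Region 3"), times = c(30, 50, 100)),
  Name   = paste("Name", sample(100, 180, replace = TRUE))
)

# Sample 5% of the rows within each region.
# floor() rounds down, so 30 * 0.05 = 1.5 becomes 1 row.
sampled <- lapply(split(data, data$Region), function(x) {
  n <- floor(nrow(x) * 0.05)
  x[sample(nrow(x), n), , drop = FALSE]
})

# Stack the per-region pieces back into one data frame.
result <- do.call(rbind, sampled)

# Number of sampled rows per region.
table(result$Region)
```

Note that with rounding down, any region with fewer than 20 rows contributes no rows at all. If every region should contribute at least one row, something like max(1, ceiling(nrow(x) * 0.05)) could be used for n instead.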