Sample, replicate and histogram in R

Time:07-03

I want to choose 100 houses at random from my dataset and find the mean of their total price. Then I want to repeat this 100 times, computing the mean price each time, and plot all the mean values in a histogram. This is my code (rome is the house dataset):

run <- rome[sample(1:nrow(rome), 100, replace=FALSE),]
dun <- mean(run$PRICE)
c <- replicate(100, dun)

I also tried a for loop, which I'm pretty sure I need here, but there are mistakes in my code:

d <- for(i in 1:100){
  run <- rome[sample(1:nrow(rome), 100, replace=FALSE),]
  dun <- mean(run$PRICE)
  c <- replicate(100, dun)
}

Finally I call hist(d), which doesn't run because of these mistakes. Can you help me?

The data (price values):

good_struct <- c(
  47, 113, 165, 104.3, 62.5, 70, 127.5, 64.5, 145, 63.5,
  58.9, 65, 48, 3.5, 12.8, 17.5, 36, 41.9, 53.5, 24.5,
  24.5, 55.5, 60, 51, 46, 46, 44, 54.9, 42.5, 44,
  44.9, 37.9, 33, 43.9, 49.6, 52, 37.5, 50, 35.9, 42.9,
  107, 112, 44.9, 55, 102, 35.5, 62.9, 39, 110, 8,
  62, 85.9, 57, 110, 67.7, 89.5, 70, 74, 13, 48,
  24, 53.5, 34.5, 53, 87.5, 33.5, 24, 9.6, 30, 41,
  30, 38.9, 20.7, 49.9, 18.6, 39, 34, 16, 18.9, 15.2,
  41.5, 53, 22, 24.9, 6.7, 32.5, 30, 59, 29.5, 26,
  16.5, 39, 48.9, 33.5, 46, 54, 57.9, 37.9, 32, 31,
  34, 29, 32.5, 51.9, 31, 41.8, 48, 28, 35, 46.5,
  51.9, 35.4, 16, 35, 35, 36.5, 35.9, 45, 40, 35,
  38, 37, 23, 25.5, 39.5, 21.5, 9, 67.5, 13.4, 12.5,
  28.5, 23, 33.5, 9, 11, 30.9, 31.65, 33, 33.4, 47,
  40, 46, 45.5, 57, 29.9, 30, 34, 51, 64.5, 57.5,
  85.5, 61, 38, 56.5, 60.4, 51.5, 54, 69, 56, 27.9,
  37.5, 32.9, 22, 29.9, 39.9, 32.6, 38.5, 21.5, 25.9, 27.5,
  22.9, 31.5, 8.5, 5.5, 33, 57, 47, 43.5, 43.9, 68.5,
  44.25, 61, 40, 44.5, 57, 35, 35.1, 64.5, 40, 42.6,
  50, 58, 58, 55, 43, 54, 39, 45, 42, 38.9,
  43.215, 26.5, 30, 29.5
)

Answer:

Perhaps something like this?

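# Simulated example data: 1e6 houses with normally distributed prices
rome <- data.frame(PRICE = rnorm(1e6, 3e5, 5e4),
                   ID = 1:1e6)

dun <- numeric(100)  # one slot per repetition

for (i in 1:100) {
  run <- rome[sample(1:nrow(rome), 100, replace = FALSE), ]  # 100 random houses
  dun[i] <- mean(run$PRICE)                                  # mean price of this sample
}

hist(dun)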
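For reference, the question's d <- for(...) version leaves nothing to plot because an R for loop itself evaluates to NULL (invisibly), so d ends up NULL and hist(d) fails. The per-iteration results have to be stored in a vector inside the loop, as with dun above. A small check, using a toy vector x chosen only for illustration (it is not the question's data):

```r
# A for loop's own value is NULL, whatever the body computes
x <- c(10, 20, 30, 40, 50)
d <- for (i in 1:3) {
  mean(sample(x, 2))
}
print(is.null(d))      # TRUE: nothing was collected

# Storing each result inside the loop keeps all of them
means <- numeric(3)
for (i in 1:3) {
  means[i] <- mean(sample(x, 2))
}
print(length(means))   # 3
```

With the price values posted in the question, rome <- data.frame(PRICE = good_struct) can stand in for the simulated rome; it has 204 rows, so 100 can be drawn without replacement.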

Answer:

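Since replicate is a wrapper around sapply, it evaluates its expression argument afresh on each repetition. In your code, dun is computed once before replicate is called, so replicate(100, dun) just repeats that single mean 100 times. Instead, pass in an expression that subsets the price vector and then calls mean: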

random_mean_prices <- replicate(
  100, mean(rome$PRICE[sample(1:nrow(rome), 100, replace=FALSE)])
)

hist(random_mean_prices)
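The difference between passing a precomputed value and passing an expression can be checked on a small scale. The vector x and the seed below are arbitrary illustrations, not part of the question's data:

```r
set.seed(42)                                # arbitrary seed, for reproducibility only
x <- c(10, 20, 30, 40, 50)

fixed <- mean(sample(x, 2))                 # evaluated once, before replicate() runs
same  <- replicate(5, fixed)                # the same number five times
fresh <- replicate(5, mean(sample(x, 2)))   # a new sample drawn on every repetition

print(same)
print(fresh)
print(length(unique(same)))                 # 1
```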