I am trying to draw a 1 or 0 from a Bernoulli distribution for each row of every dataframe in a list, whenever the value in the first column (distance) exceeds 1000.
I believe my current code makes one draw per dataframe in the list rather than one per row. Is there a way I can confirm this? For each row where distance is > 1000, I want to draw 1 or 0 from the Bernoulli distribution. Each row has its own chance of being 0 or 1.
mylistnew<-lapply(mylist, transform, outcome = ifelse(distance > 1000,
rbinom(length(distance),1,0.8), NA))
I can't see how to change rbinom(length(distance), ...) so that it makes a single draw per row, as opposed to one call covering the length of the dataframe / ifelse statement.
Subset of the data:
list(structure(c(775.056695476403, 1414.15314106691, 2509.95923787194,
1666.71143236238, 585.640129954299, 1169.17884175758, 152.505503148836,
619.226302243787, 1263.66546590149, 1682.8712425131, -2.86809018002943,
-2.87220511792857, -2.91236875367306, -2.91236875367306, -2.91137226768259,
-2.91236875367306, -2.86275243787543, -2.8606012634912, -2.86264610888995,
-2.86004943151114, 58.2523804031471, 58.2594633464797, 58.1998311185373,
58.1998311185373, 58.1999333186371, 58.1998311185373, 58.243480631029,
58.2359999509482, 58.2407966146843, 58.2335609045358, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1), .Dim = c(10L, 4L), .Dimnames = list(NULL,
c("distance", "lon", "lat", "ID"))), structure(c(775.056695476403,
1414.15314106691, 2509.95923787194, 1666.71143236238, 585.640129954299,
1169.17884175758, 152.505503148836, 619.226302243787, 1263.66546590149,
1682.8712425131, -2.86809018002943, -2.87220511792857, -2.91236875367306,
-2.91236875367306, -2.91137226768259, -2.91236875367306, -2.86275243787543,
-2.8606012634912, -2.86264610888995, -2.86004943151114, 58.2523804031471,
58.2594633464797, 58.1998311185373, 58.1998311185373, 58.1999333186371,
58.1998311185373, 58.243480631029, 58.2359999509482, 58.2407966146843,
58.2335609045358, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(10L,
4L), .Dimnames = list(NULL, c("distance", "lon", "lat", "ID"))))
CodePudding user response:
Well, your rbinom call produces i.i.d. random variables: rbinom(n, 1, 0.8) returns n independent draws, so each row already gets its own draw and your code is correct. One way to verify this is the following snippet:
set.seed(12123)
n <- 10000
rowSums(                    # [3]
  (mat <- replicate(n,      # [2]
    rbinom(10, 1, 0.8)      # [1]
  ))
) / n
# [1] 0.8004 0.7979 0.8025 0.8033 0.7974 0.7988 0.7984 0.7993 0.7990 0.8013
cor(t(mat))
# [,1] [,2] [,3] [,4] [,5] [...]
# [1,] 1.0000000000 0.0028711704 0.0036386366 -0.0003859466 0.0097167804 [...]
# [...]
Explanation
- [1] Draw 10 Bernoulli random variables.
- [2] Repeat this 10000 times (the data is then organized as a 10 x 10000 matrix, with repetitions in columns and the 10 independent variables in rows).
- [3] Take the average of each row. As we drew from a Bernoulli distribution with p = .8, we would expect an average of around .8, which the results show.
- If we look at the correlation between the 10 observations, we see that they are all very close to 0, which is consistent with the draws being independent.
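If you would like the per-row nature of the draw to be explicit in your own code, here is a minimal sketch of an alternative that draws only for the rows that qualify. The helper name draw_outcome, the small df, and the seed are assumptions made for illustration and are not part of your data; the probability 0.8 is taken from your call.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

# Hypothetical stand-in for one element of mylist (distances rounded)
df <- data.frame(distance = c(775, 1414, 2510, 1667, 586))

# Hypothetical helper: one Bernoulli draw per row with distance > 1000
draw_outcome <- function(d, p = 0.8) {
  d <- as.data.frame(d)            # the list elements in the question are matrices
  d$outcome <- NA_integer_         # rows at or below the threshold stay NA
  far <- d$distance > 1000         # logical index of qualifying rows
  # rbinom(n, 1, p) returns n independent 0/1 draws, one per qualifying row
  d$outcome[far] <- rbinom(sum(far), 1, p)
  d
}

res <- draw_outcome(df)
res
```

Applied to the whole list, this would be lapply(mylist, draw_outcome). The result should follow the same distribution as your ifelse version; the only difference is that no draws are generated for rows that end up as NA. If each row really had its own probability stored somewhere (say, in a hypothetical column p), rbinom also accepts a vector for its prob argument, so you could pass d$p[far] instead of a single number.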