Draw Bernoulli outcome from ifelse statement on list of dataframes

Time:10-14

I am trying to draw a 1 or 0 from a Bernoulli distribution for each row of every dataframe in a list, whenever the value in the first column exceeds 1000.

I believe my current code is drawing once for each dataframe in the list rather than once for each row. Is there a way I can confirm this? For each row where distance is > 1000, I want to draw a 1 or 0 from the Bernoulli distribution, so that each row has its own chance of being 0 or 1.

mylistnew<-lapply(mylist, transform, outcome = ifelse(distance > 1000, 
rbinom(length(distance),1,0.8), NA))

I can't see how to change the rbinom(length(distance), ...) call so that it is a single draw per row, rather than one call covering the length of the dataframe / ifelse statement.

Subset of the data:

list(structure(c(775.056695476403, 1414.15314106691, 2509.95923787194, 
1666.71143236238, 585.640129954299, 1169.17884175758, 152.505503148836, 
619.226302243787, 1263.66546590149, 1682.8712425131, -2.86809018002943, 
-2.87220511792857, -2.91236875367306, -2.91236875367306, -2.91137226768259, 
-2.91236875367306, -2.86275243787543, -2.8606012634912, -2.86264610888995, 
-2.86004943151114, 58.2523804031471, 58.2594633464797, 58.1998311185373, 
58.1998311185373, 58.1999333186371, 58.1998311185373, 58.243480631029, 
58.2359999509482, 58.2407966146843, 58.2335609045358, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1), .Dim = c(10L, 4L), .Dimnames = list(NULL, 
    c("distance", "lon", "lat", "ID"))), structure(c(775.056695476403, 
1414.15314106691, 2509.95923787194, 1666.71143236238, 585.640129954299, 
1169.17884175758, 152.505503148836, 619.226302243787, 1263.66546590149, 
1682.8712425131, -2.86809018002943, -2.87220511792857, -2.91236875367306, 
-2.91236875367306, -2.91137226768259, -2.91236875367306, -2.86275243787543, 
-2.8606012634912, -2.86264610888995, -2.86004943151114, 58.2523804031471, 
58.2594633464797, 58.1998311185373, 58.1998311185373, 58.1999333186371, 
58.1998311185373, 58.243480631029, 58.2359999509482, 58.2407966146843, 
58.2335609045358, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), .Dim = c(10L, 
4L), .Dimnames = list(NULL, c("distance", "lon", "lat", "ID"))))

Answer:

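rbinom(length(distance), 1, 0.8) returns length(distance) i.i.d. Bernoulli(0.8) draws, i.e. one separate draw per row, and ifelse() pairs them with the rows element by element, so your code already does what you want. One way to verify this is the following snippet: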

set.seed(12123)
n <- 10000
rowSums(                        # [3]
  (mat <- replicate(n,          # [2]
             rbinom(10, 1, 0.8) # [1]
  ))
) / n
# [1] 0.8004 0.7979 0.8025 0.8033 0.7974 0.7988 0.7984 0.7993 0.7990 0.8013

cor(t(mat))
#                [,1]          [,2]          [,3]          [,4]          [,5] [...]
#  [1,]  1.0000000000  0.0028711704  0.0036386366 -0.0003859466  0.0097167804 [...]
# [...]

Explanation

  1. Draw 10 Bernoulli random variables.
  2. Repeat this 10000 times (the data is then organized as a 10 x 10000 matrix, with the repetitions in columns and the 10 independent variables in rows).
  3. Take the average of each row. As we drew from a Bernoulli with p = .8, we would expect an average of around .8, which the results show.
  4. If we look at the correlation between the 10 observations (the cor(t(mat)) call), we see that the off-diagonal values are all very close to 0, which is what we would expect from independent draws.
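
The same check can also be run directly on the call from the question. The following is only a minimal sketch: `df`, `mylist_demo` and `big` are made-up toy objects (not the asker's data), and the seed is arbitrary. It shows that rows with distance <= 1000 get NA, and that rows above the threshold do not all share one value, which they would if there were a single draw per dataframe.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Toy stand-in for the elements of `mylist` (values are made up)
df <- data.frame(distance = c(500, 1500, 2500, 800, 1200, 3000))
mylist_demo <- list(df, df)

# Same call as in the question, applied to the toy list
res <- lapply(mylist_demo, transform,
              outcome = ifelse(distance > 1000,
                               rbinom(length(distance), 1, 0.8),
                               NA))
res[[1]]  # NA where distance <= 1000, a 0/1 draw elsewhere

# With many qualifying rows, both 0 and 1 should appear within ONE dataframe,
# and the share of 1s should be close to 0.8
big <- data.frame(distance = rep(2000, 1000))
out <- transform(big,
                 outcome = ifelse(distance > 1000,
                                  rbinom(length(distance), 1, 0.8),
                                  NA))$outcome
table(out)
mean(out)

# Alternative (not from the question): draw only for the qualifying rows
idx <- df$distance > 1000
df$outcome <- NA
df$outcome[idx] <- rbinom(sum(idx), 1, 0.8)
```

Note that the ifelse() version draws a value for every row and simply discards the ones where distance <= 1000. The indexed version at the end draws only as many values as there are qualifying rows. Both give one independent draw per qualifying row.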