Efficient way to iteratively store counts in R

Time:08-31

I'm looking for an efficient way to store the counts of a vector that changes over time. I start with an empty vector of length n, and at each iteration I add a number to it. I also want some type of object that acts as a counter: if the number I add is already in the vector, its count should increase by 1; if it is not, the number should be added as a "name" with its count set to 1.

What I want is something analogous to a Python dict, in which the numbers are the keys and the counts are the values, so that I can access both separately with dict.keys() and dict.values().

For example, if I get the values 1, 2, 1, 4 then I would like the object to update as:

> value count
      1     1
> value count
      1     1
      2     1
> value count
      1     2
      2     1
> value count
      1     2
      2     1
      4     1

and to be able to access both the values and the counts efficiently and separately. I thought of using something like plyr::count on the vector, but I don't think it's efficient to recount at every iteration, especially if n is really large.

Edit: In my problem it's necessary (well, maybe not) to update the counts at every iteration.

What I'm doing is simulating data from a Dirichlet Process using the Polya urn representation. For example, suppose that I have the vector (1.1, 0.2, 0.3, 1.1, 0.2). To get a new data point, one either samples from a base distribution (for example, a normal distribution) and adds that value with a certain probability, or adds a previous value with probability proportional to its frequency. With numbers:

  • Add the sampled value with probability 1/6, or
  • Add 1.1 with probability 2/6, or 0.2 with probability 2/6, or 0.3 with probability 1/6 (i.e. the probabilities are proportional to the frequencies)
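
The urn step described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions: polya_urn, alpha and base are hypothetical names that do not appear in the question, alpha = 1 is chosen because it reproduces the 1/6 in the example, and the base distribution is assumed to be continuous so that a fresh draw never coincides with an existing value.

```r
# Hypothetical helper (not from the question): draw n values from a Polya urn.
# alpha is an assumed weight for "sample a new value"; alpha = 1 gives the
# 1/6 from the example when 5 values are already stored.
polya_urn <- function(n, alpha = 1, base = function() rnorm(1)) {
  values <- numeric(0)   # distinct values seen so far
  counts <- integer(0)   # counts[i] is how often values[i] has been drawn
  draws  <- numeric(n)   # the full sequence of draws
  for (i in seq_len(n)) {
    # Index length(values) + 1 stands for "draw from the base distribution".
    # sample.int() treats prob as weights, so they need not sum to 1.
    k <- sample.int(length(values) + 1L, 1L, prob = c(counts, alpha))
    if (k > length(values)) {
      values[k] <- base()          # new value: store it with count 1
      counts[k] <- 1L
    } else {
      counts[k] <- counts[k] + 1L  # existing value: bump its count
    }
    draws[i] <- values[k]
  }
  list(draws = draws, values = values, counts = counts)
}

set.seed(1)
res <- polya_urn(10)
```

Here res$values and res$counts play the role of the keys and values, and each iteration touches a single element of counts instead of recounting the whole vector.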

CodePudding user response:

The structure you are describing is produced by as.data.frame(table(vec)). There is no need to update the counts as you go along, since calling this line on the current vector gives you the up-to-date counts:

vec <- c(1, 2, 4, 1)

as.data.frame(table(vec))
#>   vec Freq
#> 1   1    2
#> 2   2    1
#> 3   4    1

Suppose I now update vec:

vec <- append(vec, c(1, 2, 4, 5))

We get the new counts the same way:

as.data.frame(table(vec))
#>   vec Freq
#> 1   1    3
#> 2   2    2
#> 3   4    2
#> 4   5    1
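
To get the values and the counts separately, similar to dict.keys() and dict.values(), something like the following should work. It is a small sketch that reuses the updated vec from above; keys and counts are just illustrative names. Note that table() stores the values as character names, so converting them back with as.numeric() may lose precision for doubles that need more than about 15 significant digits.

```r
vec <- c(1, 2, 4, 1, 1, 2, 4, 5)
tab <- table(vec)

keys   <- as.numeric(names(tab))  # distinct values, like dict.keys()
counts <- as.vector(tab)          # their counts, like dict.values()

keys
#> [1] 1 2 4 5
counts
#> [1] 3 2 2 1
```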

CodePudding user response:

Maybe you can use assign and get0 on an environment to update the counts, like this:

x <- c(1, 2, 1, 4)

y <- new.env()
lapply(x, function(z) {
  assign(as.character(z), get0(as.character(z), y, ifnotfound = 0) + 1, y)
  setNames(stack(mget(ls(y), y))[2:1], c("value", "count"))
})
#[[1]]
#  value count
#1     1     1
#
#[[2]]
#  value count
#1     1     1
#2     2     1
#
#[[3]]
#  value count
#1     1     2
#2     2     1
#
#[[4]]
#  value count
#1     1     2
#2     2     1
#3     4     1
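
If the data frame is not needed at every step, the same idea can be split into an update step and a read-out step, so that each iteration only touches one entry. This is a rough sketch; add_count is a hypothetical helper name, and inherits = FALSE is added so that the lookup never falls through to objects outside the counter environment.

```r
counter <- new.env()

# Hypothetical helper: record one more observation of z in the environment env.
add_count <- function(env, z) {
  key <- as.character(z)
  old <- get0(key, envir = env, inherits = FALSE, ifnotfound = 0)
  assign(key, old + 1, envir = env)
  invisible(env)
}

for (z in c(1, 2, 1, 4)) add_count(counter, z)

keys   <- ls(counter)                          # the stored values, as character
counts <- unlist(mget(keys, envir = counter))  # named vector of their counts

keys
#> [1] "1" "2" "4"
unname(counts)
#> [1] 2 1 1
```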