I'm looking for an efficient way to store the counts of the values in a vector that changes over time. I start with an empty vector of length n, and at each iteration I add a number to it. I also want some kind of object that acts as a counter: if the number I add is already in the vector, the object should add 1 to that number's count; if it is not, the object should add the value as a "name" and set its count to 1.
What I want is something analogous to a Python dict, in which numbers can be used as keys and counts as values, so that I can access both separately with dict.keys() and dict.values().
For example, if I get the values 1, 2, 1, 4 then I would like the object to update as:
> value count
1 1
> value count
1 1
2 1
> value count
1 2
2 1
> value count
1 2
2 1
4 1
and to access both values and count efficiently and separately. I thought of using something like plyr::count on the vector, but I don't think it's efficient to recount at every iteration, especially if n is really large.
Edit: In my problem it's necessary (well, maybe not) to update the counts at every iteration.
What I'm doing is simulating data from a Dirichlet Process using the Polya urn representation. For example, suppose that I have the vector (1.1, 0.2, 0.3, 1.1, 0.2), then to get a new data point one samples from a base distribution (for example a normal distribution) and adds that value with a certain probability, or adds a previous value with a probability proportional to the frequency of the value. With numbers:
- Add the sampled value with probability 1/6, or
- Add 1.1 with probability 2/6, or 0.2 with probability 2/6, or 0.3 with probability 1/6 (i.e. the probabilities are proportional to the frequencies)
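A minimal sketch of one such draw, to make the update rule concrete. It keeps the distinct values and their counts in two parallel vectors, so both can be read separately. The names polya_draw, alpha and base_draw are hypothetical; alpha = 1 is an assumption chosen so that the new-value probability matches the 1/6 above, and a continuous base distribution is assumed so that a fresh draw never coincides with an existing value.

```r
# Hypothetical helper: one Polya urn draw from distinct values and their counts.
# alpha (weight of a fresh draw) and base_draw (base sampler) are assumed names.
polya_draw <- function(values, counts, alpha = 1,
                       base_draw = function() rnorm(1)) {
  n <- sum(counts)
  # Position 1 means "sample a fresh value"; positions 2.. map to existing values.
  probs <- c(alpha, counts) / (alpha + n)
  idx <- sample.int(length(values) + 1L, 1L, prob = probs) - 1L
  if (idx == 0L) {
    # New value from the base distribution, starting with count 1.
    list(values = c(values, base_draw()), counts = c(counts, 1))
  } else {
    # Existing value: only its count changes.
    counts[idx] <- counts[idx] + 1
    list(values = values, counts = counts)
  }
}

set.seed(1)
state <- list(values = c(1.1, 0.2, 0.3), counts = c(2, 2, 1))
state <- polya_draw(state$values, state$counts)
```

With this layout each iteration touches a single count instead of recounting the whole vector, and state$values and state$counts play the roles of the keys and the values.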
CodePudding user response:
The structure you are describing is produced by as.data.frame(table(vec)). There is no need to update the counts as you go along, since calling this line will give you the updated counts:
vec <- c(1, 2, 4, 1)
as.data.frame(table(vec))
#> vec Freq
#> 1 1 2
#> 2 2 1
#> 3 4 1
Suppose I now update vec:
vec <- append(vec, c(1, 2, 4, 5))
We get the new counts the same way:
as.data.frame(table(vec))
#> vec Freq
#> 1 1 3
#> 2 2 2
#> 3 4 2
#> 4 5 1
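If the values and the counts are needed separately, in the spirit of dict.keys() and dict.values(), they can be pulled out of the table directly. A small sketch; note that table() stores the values as character names, so converting them back with as.numeric() assumes the printed representation round-trips, which may not hold for arbitrary doubles.

```r
vec <- c(1, 2, 4, 1)
tab <- table(vec)

# The distinct values are the names of the table (stored as strings).
values <- as.numeric(names(tab))
# The counts are the table entries; as.vector() drops the names.
counts <- as.vector(tab)
```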
CodePudding user response:
Maybe you can use assign and get0 on an environment to update the counts, like:
x <- c(1, 2, 1, 4)
y <- new.env()
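lapply(x, function(z) {
  # Increment the count stored under this value's name (0 if not seen yet)
  assign(as.character(z), get0(as.character(z), y, ifnotfound = 0) + 1, y)
  # Snapshot of the counter as a value/count data frame
  setNames(stack(mget(ls(y), y))[2:1], c("value", "count"))
})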
#[[1]]
# value count
#1 1 1
#
#[[2]]
# value count
#1 1 1
#2 2 1
#
#[[3]]
# value count
#1 1 2
#2 2 1
#
#[[4]]
# value count
#1 1 2
#2 2 1
#3 4 1
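The same idea can be wrapped in a small helper so that the data frame is not rebuilt at every step. This is only a sketch: add_count is a hypothetical name, and inherits = FALSE is added as a precaution so that get0 looks only inside the counter environment.

```r
counter <- new.env()

# Hypothetical helper: add 1 to the count stored under the name of z.
add_count <- function(env, z) {
  key <- as.character(z)
  old <- get0(key, envir = env, inherits = FALSE, ifnotfound = 0)
  assign(key, old + 1, envir = env)
  invisible(env)
}

for (z in c(1, 2, 1, 4)) add_count(counter, z)

# Keys and counts can then be read separately when needed.
keys <- ls(counter)
counts <- unlist(mget(keys, envir = counter))
```

Here keys holds the values as strings (sorted by ls) and counts is a numeric vector named by those keys.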