What is the difference between the two pieces of code that calculate the sum of log probabilities?

Time:01-06

I am a biologist and I am trying to learn R scripting. The first piece of code is from a DataCamp course and calculates the sum of log probabilities. I wrote the second, which is obviously wrong. I know the output is different. I just want to understand why we need to add the "[i]".

#code1
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n)
  total <- total + log(x[i])
total
#code2
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n)
  total <- total + log(x)
total

A better way to solve this is

log_sum <- sum(log(x)) 

I just want to understand the difference between the two pieces of code.

CodePudding user response:

Here x is a vector of 100 separate values, not just one value.

x[i] is a way of denoting "the ith element in x". x[1] is the first element, x[2] the second etc.
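
As a quick illustration of indexing, here is a small sketch using a made-up three-element vector (the values are only an example, not taken from your data):

```r
x <- c(0.2, 0.5, 0.9)   # hypothetical values, chosen just for illustration
first <- x[1]           # the first element
third <- x[3]           # the third element
one_log <- log(x[2])    # the log of a single element is a single number
length(x)               # how many elements x holds
length(one_log)         # 1
```

Note that R counts from 1, so x[1] really is the first element.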

This section of code:

for(i in 1:n)
  total <- total + log(x[i])

says "starting from one, and counting up by one each iteration until we reach the number defined by n, do the following, with the current count being represented by i: take the value of total, add to it the log of the ith element of the object x, and store that as the new value of total."

In other words, when you run that it goes through the loop 100 times and each iteration it takes the log of one of the values of x and adds it to the running total.

In some programming languages, this would be the main or most obvious way to do this. In R, there is a concept called "vectorisation" which is quite important.

In those programming languages, if you ran log(x) and x contains 100 values you may receive an error message - the log() function is expecting a single value, it doesn't know what to do with 100 of them. Thanks to vectorisation, when you pass R's log() function a vector of 100 values it happily returns a vector containing 100 logs - the equivalent of running log() on each of those values in turn and assembling them into a new vector.
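
A minimal sketch of vectorisation, again with made-up values, might look like this:

```r
x <- c(1, 10, 100)      # hypothetical values, chosen just for illustration
logs <- log(x)          # one call, applied to every element of x
length(logs)            # same length as x
same <- isTRUE(all.equal(logs[2], log(x[2])))  # element 2 matches log of element 2
same
```

The result of log(x) has one entry per element of x, in the same order.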

Let's look at your version:

for(i in 1:n)
  total <- total + log(x)

R treats this as though it says "starting from one, and counting up by one each iteration until we reach the number defined by n, do the following, with the current count being represented by i: take the value of total, add to it the logs of all the elements of x at once, and store that as the new value of total."

You can see that this loop will run 100 times, and each time it adds the whole vector of logs to total. Because total + log(x) is itself a vector, total stops being a single number after the first iteration. Instead of one sum, we end up with a vector of 100 values, each 100 times the log of the corresponding element of x.
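
You can check what the loop without the index leaves in total by running something like the sketch below. The set.seed() call is my addition so the run is repeatable; the seed value is arbitrary.

```r
set.seed(1)             # arbitrary seed, only so the example is repeatable
n <- 100
x <- runif(n)
total <- 0
for (i in 1:n) {
  total <- total + log(x)   # no [i]: the whole vector of logs is added each time
}
length(total)                                   # 100, not 1
matches <- isTRUE(all.equal(total, n * log(x))) # each entry is n times its log
matches
```

So total is no longer a single number: it holds one value per element of x.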

This is also the reason this works:

log_sum <- sum(log(x)) 

This says "take the log of all the elements of x in turn, sum them, and assign the result to the variable log_sum".
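
If you want to convince yourself that the indexed loop and the vectorised one-liner agree, a small check along these lines should do it (the seed is arbitrary and only there for repeatability):

```r
set.seed(1)             # arbitrary seed, only so the example is repeatable
n <- 100
x <- runif(n)

total <- 0
for (i in 1:n) {
  total <- total + log(x[i])   # add one log per iteration
}

log_sum <- sum(log(x))         # vectorised version
agree <- isTRUE(all.equal(total, log_sum))  # equal up to floating-point tolerance
agree
```

all.equal() is used rather than == because the two approaches add the numbers in a slightly different way, so tiny floating-point differences are possible.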

I hope that helps.
