I am a biologist trying to learn R scripting. The first piece of code is from a DataCamp course and calculates the sum of log probabilities. I wrote the second, which is obviously wrong. I know the output is different; I just want to understand why we need to add the "[i]".
#code1
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n)
  total <- total + log(x[i])
total
#code2
n <- 100
total <- 0
x <- runif(n)
for(i in 1:n)
  total <- total + log(x)
total
A better way to solve this is
log_sum <- sum(log(x))
I just want to understand the difference between the 2 codes.
CodePudding user response:
Here x is a vector of 100 separate values, not just one value. x[i] is a way of denoting "the i-th element in x": x[1] is the first element, x[2] the second, etc.
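As a quick illustration of indexing, here is a minimal sketch using a small made-up vector (the three values are arbitrary and only chosen to keep the output easy to read):

```r
# A small, made-up vector (values are arbitrary, for illustration only)
x <- c(0.2, 0.5, 0.9)

length(x)  # number of elements in x
x[1]       # first element
x[2]       # second element

# The index can also come from a variable, as it does inside a for loop
i <- 3
x[i]       # element at position i, here the third one
```

Inside the loop, i takes the values 1, 2, 3, ... in turn, so x[i] picks out a different single element on each pass.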
This section of code:
for(i in 1:n)
  total <- total + log(x[i])
says "starting from one, and counting up by one each iteration until we reach the number defined by n, do the following, with the current count being represented by i: take the value of total, add to it the log of the i-th element of the object x, and store that as the new value of total."
In other words, the loop runs 100 times, and on each iteration it takes the log of one of the values of x and adds it to the running total.
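To watch the running total build up, here is a small sketch of the same loop on a three-element vector. The values are an assumption chosen so that the logs come out as roughly 0, 1 and 2; the cat() line is only there to print each step.

```r
# Made-up values whose natural logs are (approximately) 0, 1 and 2
x <- c(1, exp(1), exp(2))
n <- length(x)

total <- 0
for (i in 1:n) {
  total <- total + log(x[i])  # add the log of ONE element per iteration
  cat("i =", i, " log(x[i]) =", log(x[i]), " running total =", total, "\n")
}

total          # a single number, about 0 + 1 + 2 = 3
length(total)  # still length 1
```

At every step total stays a single number, because only a single number is ever added to it.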
In some programming languages, this would be the main or most obvious way to do this. In R, there is a concept called "vectorisation" which is quite important.
In those programming languages, if you ran log(x) and x contains 100 values, you may receive an error message - the log() function is expecting a single value, and it doesn't know what to do with 100 of them. Thanks to vectorisation, when you pass R's log() function a vector of 100 values it happily returns a vector containing 100 logs - the equivalent of running log() on each of those values in turn and assembling them into a new vector.
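A short sketch of vectorisation, again with made-up values: one call to log() on a vector gives the same result as calling log() on each element separately and combining the pieces with c().

```r
x <- c(1, 10, 100)  # arbitrary example values

logs <- log(x)      # one call, one result per element of x
length(logs)        # same length as x

# The "by hand" equivalent: log each element, then assemble a new vector
by_hand <- c(log(x[1]), log(x[2]), log(x[3]))

all.equal(logs, by_hand)  # TRUE
```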
Let's look at your version:
for(i in 1:n)
  total <- total + log(x)
R treats this as though it says "starting from one, and counting up by one each iteration until we reach the number defined by n, do the following, with the current count being represented by i: take the value of total, add to it the log of every single element of the object x, and store that as the new value of total." Note that i is never actually used inside the loop body.
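You can see that this loop will run 100 times, and each time it adds ALL the logs of the values of x to total. Because log(x) is a vector of 100 values, total itself becomes a vector of 100 values rather than a single number, and after 100 iterations each of those values is 100 times the log of the corresponding element of x.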
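To check what the second version actually produces, here is a minimal sketch that runs it and inspects the result. The set.seed() call is an addition that only makes the random numbers reproducible; the seed value is arbitrary.

```r
set.seed(1)  # arbitrary seed, added only for reproducibility
n <- 100
x <- runif(n)

total <- 0
for (i in 1:n)
  total <- total + log(x)  # adds the WHOLE vector of logs on every pass

length(total)                 # 100: total is now a vector, not one number
all.equal(total, n * log(x))  # TRUE: each entry is n times log of that element
```

So total + log(x) is computed element by element, and the single starting value 0 is turned into a vector of the same length as x on the very first iteration.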
This is also the reason this works:
log_sum <- sum(log(x))
This says "take the log of all the elements of x in turn, sum them, and assign the result to the variable log_sum".
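As a sanity check, the loop with x[i] and the vectorised one-liner can be compared directly. This is a small sketch; the set.seed() call and its value are assumptions added so the comparison is reproducible.

```r
set.seed(42)  # arbitrary seed, added only for reproducibility
n <- 100
x <- runif(n)

# Loop version: add one log per iteration
total <- 0
for (i in 1:n)
  total <- total + log(x[i])

# Vectorised version: log every element, then sum
log_sum <- sum(log(x))

all.equal(total, log_sum)  # TRUE (equal up to floating-point rounding)
```

all.equal() is used rather than == because the two versions may add the numbers in a slightly different way, which can leave tiny rounding differences.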
I hope that helps.