I want to apply a custom function to every row of df and assign the value that function returns to a new column in that dataframe.
My function takes a vector of values from chosen columns (in my case values from columns 12:17 will be used), and returns a calculated value (diversity index).
The function is defined as:
shannon <- function(p){
    if (0 %in% p) {
        p = replace(p,p==0,0.0001)
    } else {
        p
    }
    H = -sum(p*log(p))
    return (H)
}
A random row from the dataset looks like this:
p <- df[3000,12:17]
       x1        x2        x3        x4         x5 x6
0.5777778 0.1777778 0.1555556 0.2888889 0.02222222  0
When I apply the custom function to this row, like this:
shannon(as.vector(t(p)))
It returns the correctly calculated value of 1.357692.
Now, I want to make this value into a new column of my dataset, by applying the custom function to the specific columns from my dataset. I try to do it using mutate and sapply by running:
df <- mutate(df, shannon = sapply(as.vector(t(census[,12:17])), shannon))
but it returns
Error in `mutate()`:
! Problem while computing `shannonVal = sapply(as.vector(t(census[, 12:17])), shannon)`.
✖ `shannonVal` must be size 9467 or 1, not 56802.
The number of rows in my dataset is 9467, so the sapply is returning something that's 6 times as long. But why, and how can I fix it?
CodePudding user response:
Building on Ric's comment, df <- mutate(df, shannon = apply(census[,12:17], 1, function(x) {shannon(t(x))}))
might just do the trick.
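As to why the original call fails: as.vector(t(census[,12:17])) flattens the six columns into one long vector of 9467 * 6 = 56802 numbers, and sapply then calls shannon once per number instead of once per row. apply with MARGIN = 1 passes each row to the function as a vector. A minimal sketch on a small made-up data frame (toy and its values are hypothetical, not the census data; the zero handling is written more compactly but should behave the same as in the question):

```r
shannon <- function(p) {
  p <- replace(p, p == 0, 0.0001)  # same zero replacement as in the question
  -sum(p * log(p))
}

# Hypothetical stand-in for columns 12:17 of the real data
toy <- data.frame(
  x1 = c(0.5, 0.25, 1),
  x2 = c(0.5, 0.25, 0),
  x3 = c(0,   0.50, 0)
)

# Flattening first yields one value per cell: 3 rows * 3 columns = 9
per_cell <- sapply(as.vector(t(toy)), shannon)
length(per_cell)

# MARGIN = 1 hands each row to shannon as a vector: one value per row
per_row <- apply(toy, 1, shannon)
length(per_row)

# A per-row result has the right length to become a new column
toy$shannon <- per_row
```

The same apply(...) expression can sit inside mutate(), since its result has one element per row.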
CodePudding user response:
Ric's answer works: df$shannon <- apply(df[,12:17], 1, shannon)
df and census are the same thing, sorry for the confusion.
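If the row-by-row apply ever becomes slow, the same index can probably be computed for all rows at once with rowSums. This is only a sketch, assuming every selected column is numeric; shannon_rows is a hypothetical helper name, not something from the thread:

```r
# Hypothetical helper: Shannon index for every row of a numeric block at once
shannon_rows <- function(cols) {
  m <- as.matrix(cols)
  m[m == 0] <- 0.0001        # same zero replacement as shannon()
  -rowSums(m * log(m))
}

# The example row from the question, rebuilt by hand
row3000 <- data.frame(x1 = 0.5777778, x2 = 0.1777778, x3 = 0.1555556,
                      x4 = 0.2888889, x5 = 0.02222222, x6 = 0)
result <- shannon_rows(row3000)
result  # about 1.357692, the value reported in the question
```

On the real data this would be used as df$shannon <- shannon_rows(df[,12:17]).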