I'd like to write a function with two inputs (x and y) to create some mutated variables in a very large dataframe. Specifically, if x=y then return x, and if x!=y then draw 1 sample from a sequence of x to y.
The function works fine when I test it outside of my data frame, but throws an error when I try to use it within mutate. I've tried both ifelse and if_else versions.
library(dplyr)
smx <- function(x,y){ #Function to allow sampling if length>1
if_else(x==y,x,sample(seq(x,y,1),1))}
#ifelse(x==y,x,sample(seq(x,y,1),1))} #Have also tried this with ifelse, doesn't work
smx(0,0) #This works
smx(0,5) #This works
#Create dummy data frame
df <- as.data.frame(cbind(c(rep(0,5)),c(seq(0,4,1))))
colnames(df) <- c("varA","varB")
df
#This doesn't work
df1 <- df %>% mutate(
VarC = smx(varA,varB)
)
Ideally, my output should include a third column (VarC) in which the first row is equal to 0 (because varA=varB) and the remaining rows are a random sample between a sequence from varA to varB.
I have set up my data frame so that varA should always be smaller than (or equal to) varB, but I'm not certain of that. I'd appreciate any help on a clean solution to this problem!
CodePudding user response:
The function is not working because it is not vectorized: mutate passes whole columns to smx, but seq(x, y, 1) only accepts a single from value and a single to value. To make it work inside mutate, you first need to vectorize your function.
You can do that as follows:
vectorized_fun <- Vectorize(your_fun)
Your code will look like this:
smx_v <- Vectorize(smx)
#This works
df1 <- df %>%
  mutate(VarC = smx_v(varA, varB))
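As a rough illustration of what Vectorize does in this case, the sketch below uses base R only. It assumes smx is written with ifelse (as in the question's commented-out variant) so that it runs without any packages. Vectorize wraps the function in mapply, so the two calls at the end should give the same draws when the seed is the same.

```r
# smx from the question, written with base ifelse() so no packages are needed
smx <- function(x, y) {
  ifelse(x == y, x, sample(seq(x, y, 1), 1))
}

df <- data.frame(varA = rep(0, 5), varB = seq(0, 4, 1))

# Passing whole columns fails because seq() needs a single 'from' and 'to'
res <- try(smx(df$varA, df$varB), silent = TRUE)
failed <- inherits(res, "try-error")

# Vectorize() builds a wrapper that calls smx once per pair of elements
smx_v <- Vectorize(smx)
set.seed(1)  # seed is only here to make the comparison reproducible
a <- smx_v(df$varA, df$varB)

# The same element-by-element call written out with mapply()
set.seed(1)
b <- mapply(smx, df$varA, df$varB)

same_draws <- identical(a, b)
```

Because the wrapper still calls smx once per row, this fixes the error but does not make the computation any faster than an explicit loop.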
CodePudding user response:
The issue here comes from seq: it requires its from and to arguments to each have length 1, but inside a dplyr verb such as mutate the function receives entire columns, so that is not the case here.
Using rowwise() solves the problem, because each row is then processed on its own:
smx <- function(x, y) {
  ifelse(x == y, x, sample(seq(x, y, 1), 1))
}
df <- as.data.frame(cbind(c(rep(0,5)),c(seq(0,4,1))))
colnames(df) <- c("varA","varB")
df %>%
  rowwise() %>%
  mutate(VarC = smx(varA, varB))
Output:
# A tibble: 5 x 3
# Rowwise:
   varA  varB  VarC
  <dbl> <dbl> <dbl>
1     0     0     0
2     0     1     1
3     0     2     1
4     0     3     2
5     0     4     0
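Since the question mentions a very large data frame, and both Vectorize and rowwise() call the function once per row, a fully vectorized draw may be worth considering. The sketch below is not from either answer: smx_vec is a hypothetical helper, and it assumes that varA and varB hold whole numbers with varA <= varB in every row. Under those assumptions it draws one value per row in a single call to runif.

```r
# Hypothetical helper (not from the original answers): for each pair it draws
# one whole number uniformly from x..y, assuming whole-number inputs with x <= y.
smx_vec <- function(x, y) {
  stopifnot(length(x) == length(y), all(x <= y))
  width <- y - x + 1                    # number of candidate values per row
  x + floor(runif(length(x)) * width)   # when x == y, width is 1 and x is returned
}

df <- data.frame(varA = rep(0, 5), varB = seq(0, 4, 1))
df$VarC <- smx_vec(df$varA, df$varB)
df
```

The same helper could be used inside mutate as VarC = smx_vec(varA, varB) without rowwise(). If the ordering assumption does not hold, the stopifnot check stops with an error rather than returning silently wrong values.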