I'm doing an assignment in R where I need to take a data frame with multiple variables and create a function() that resamples the absolute mean differences between the two categories in the data frame.
For the sake of my question I'll add an example data frame:
Variable 1 | Variable 2 | Variable 3 | Category |
---|---|---|---|
1 | 2 | 3 | 1 |
4 | 5 | 6 | 1 |
7 | 8 | 9 | 2 |
10 | 11 | 12 | 2 |
The function needs to accept three arguments: a numeric vector, the two categories within the data frame, and nsim (number of times to resample randomly). The output should be a vector of length nsim with the resampled absolute mean differences.
This is the function I've tried, but when I test it the output is always NaN.
setseed(12345)
test<-function(x, category1, category2, nsim){
resampled<-sample(df, size=length(nrow(df)), replace=F)
category1.mean<-sum(df$x[resampled=="category1"])/length(df$x[resampled=="category1"])
category2.mean<-sum(df$x[resampled=="category2"])/length(df$x[resampled=="category2"])
return(abs(category1.mean-category2.mean)}
I'm not sure whether I'm misunderstanding how function() works, the question itself, or the data, but I've tried a few things to fix the NaN output without success.
Can anyone help me out?
Answer:
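On the `NaN` itself: in the question's function, `df$x` looks for a column literally named `x` rather than the column whose name is stored in `x`. The example data frame has no such column, so `df$x` is `NULL`, `sum(NULL)` is 0, `length(NULL)` is 0, and `0/0` is `NaN`. A minimal sketch reproducing this, assuming a data frame shaped like the example (the name `df_demo` is made up for illustration):

```r
# Hypothetical stand-in for the question's data frame.
df_demo <- data.frame(
  Variable1 = c(1, 4, 7, 10),
  Category  = c(1, 1, 2, 2)
)

x <- "Variable1"

# `$` does not evaluate `x`; it looks for a column literally called "x".
by_dollar <- df_demo$x
is.null(by_dollar)

# sum(NULL) is 0 and length(NULL) is 0, so the hand-rolled mean is 0 / 0.
nan_mean <- sum(by_dollar) / length(by_dollar)
is.nan(nan_mean)

# `[[` evaluates `x`, so it returns the intended column.
by_brackets <- df_demo[[x]]
by_brackets
```

This is why a column name that arrives as a function argument is looked up with `[[` or `data[rows, x]` instead of `$`.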
The code below uses `replicate` to run the resampling-and-calculation function `f` `nsim` times.
x<-'Variable1 Variable2 Variable3 Category
1 2 3 1
4 5 6 1
7 8 9 2
10 11 12 2'
df1 <- read.table(textConnection(x), header = TRUE)
test <- function(data, x, category1, category2, nsim){
  # f: one resample and its absolute mean difference
  f <- function(data, x, category1, category2) {
    # resample row indices with replacement
    i <- sample(nrow(data), replace = TRUE)
    d <- data[i, ]
    # rows of the resample that fall in each category
    j1 <- which(d[["Category"]] == category1)
    j2 <- which(d[["Category"]] == category2)
    v1 <- d[j1, x, drop = TRUE]
    v2 <- d[j2, x, drop = TRUE]
    # a resample can miss one (or both) categories entirely
    diff_means <- if(length(v1) == 0 && length(v2) == 0) {
      NaN
    } else if(length(v1) == 0) {
      mean(v2)
    } else if(length(v2) == 0) {
      mean(v1)
    } else mean(v1) - mean(v2)
    abs(diff_means)
  }
  replicate(nsim, f(data, x, category1, category2))
}
set.seed(2022)
# amd: absolute mean differences
amd <- test(df1, "Variable1", 1, 2, nsim = 1e3)
hist(amd)
Created on 2022-05-26 by the reprex package (v2.0.1)
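Note that the answer's function resamples rows with replacement (a bootstrap), whereas the question's attempt used `replace=F`. If the assignment intends a permutation-style resample, where the category labels are shuffled and the values stay put, a minimal sketch could look like the following. This is an assumption about the assignment, not something the question states, and `perm_amd`, its argument names, and `df_demo` are made up for illustration:

```r
# Hypothetical helper: permutation-style absolute mean differences.
# x         : numeric vector of values
# category  : vector of group labels, same length as x
# category1 : label of the first group
# category2 : label of the second group
# nsim      : number of shuffles
perm_amd <- function(x, category, category1, category2, nsim) {
  stopifnot(length(x) == length(category))
  replicate(nsim, {
    # shuffle the labels without replacement; group sizes stay fixed
    shuffled <- category[sample.int(length(category))]
    abs(mean(x[shuffled == category1]) - mean(x[shuffled == category2]))
  })
}

# Hypothetical stand-in for the example data frame.
df_demo <- data.frame(
  Variable1 = c(1, 4, 7, 10),
  Category  = c(1, 1, 2, 2)
)

set.seed(2022)
amd_perm <- perm_amd(df_demo$Variable1, df_demo$Category, 1, 2, nsim = 1000)
length(amd_perm)
```

Because shuffling labels keeps both group sizes fixed, neither group can come up empty, so this variant needs no special handling for missing categories.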