I have written a function that outputs descriptive statistics and a histogram with a normal curve overlaid for one column of a dataframe. I now want to use a loop to run it for every column in the dataframe, but I get the warnings and error below:
Warning: numerical expression has 99 elements: only the first used
Warning: numerical expression has 99 elements: only the first used
Error in vec_as_location2_result():
! Can't extract columns past the end.
ℹ Location 849 doesn't exist.
ℹ There are only 25 columns.
Backtrace:
- global normality_test(kensington_data_plus_consumption, i)
- dplyr:::pull.data.frame(...)
- tidyselect::vars_pull(names(.data), !!enquo(var))
- tidyselect:::pull_as_location2(loc, n, vars)
- vctrs::num_as_location2(i, n = n, negative = "ignore", arg = "var")
- vctrs:::vec_as_location2_result(...)
Error in vec_as_location2_result(i, n = n, names = NULL, negative = negative, :
Here is the code for the function and the loop. Libraries used: tidyverse (which includes ggplot2) and pastecs.
test_data <- data.frame (a = c("E01002852", "E01002853", "E01002854", "E01002855", "E01002856", "E01002857", "E01002858"),
b = c(998, 715, 523, 755, 694, 510, 661),
c = c(2645303, 1844769, 1371527, 1853285, 2017993, 1492991, 1937841),
d = c(2659.604, 2580.096, 2622.423, 2907.771, 2927.434, 2931.681, 3357.934),
e = c(2004.55, 2121.30, 2100.10, 1942.30, 2285.55, 2103.50, 1999.20),
f = c(706, 319, 309, 644, 404, 443, 567)
)
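library(tidyverse)  # provides %>%, pull() and ggplot2
library(pastecs)    # provides stat.desc()

normality_test <- function(data, col) {
  # Extract the requested column as a plain vector
  col <- data %>% pull({{ col }})

  # Normality statistics only (skewness, kurtosis, Shapiro-Wilk test)
  col_stat <- stat.desc(col, basic = FALSE, desc = FALSE, norm = TRUE)
  print(col_stat)

  # Density histogram with a normal curve overlaid; layers are joined with `+`
  data %>%
    ggplot(aes(x = col)) +
    geom_histogram(aes(y = ..density..), binwidth = 15) +
    stat_function(
      fun = dnorm,
      args = list(mean = col %>% mean(), sd = col %>% sd()),
      colour = "red", size = 1
    )
}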
for (i in test_data$b:test_data$f) {
normality_test(test_data, i)
}
Answer:
You are setting up your loop incorrectly. If you just run:
test_data$b:test_data$f
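You will see that it produces warnings rather than a list of columns. The : operator uses only the first element of each vector, so the loop iterates over a sequence of integers, and pull() treats each integer as a column position. That is presumably where "Location 849 doesn't exist" comes from in your real data.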
Warning messages:
1: In test_data$b:test_data$f :
  numerical expression has 7 elements: only the first used
2: In test_data$b:test_data$f :
  numerical expression has 7 elements: only the first used
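To make the problem concrete, here is a small base R sketch of what the loop ends up iterating over. It assumes that `:` falls back to the first element of each vector, as the warning text says, so it writes that out explicitly with `[1]`. The standalone vectors `b` and `f` and the constant `n_cols` are copied from test_data for illustration only.

```r
# Values copied from test_data$b and test_data$f in the question
b <- c(998, 715, 523, 755, 694, 510, 661)
f <- c(706, 319, 309, 644, 404, 443, 567)

# What `b:f` effectively evaluates to: a sequence from the first
# element of b down to the first element of f
idx <- b[1]:f[1]

head(idx)    # 998 997 996 995 994 993
length(idx)  # 293

# test_data has 6 columns, so none of these values is a valid column position
n_cols <- 6
any(idx <= n_cols)  # FALSE
```

Every value of i in the original loop is therefore a number far larger than the number of columns, which matches the "Can't extract columns past the end" error.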
You can first define your columns, then run the loop:
wantcols <- c("b", "c", "d", "e", "f")
for (i in wantcols) {
normality_test(test_data, i)
}
In this case, i will take the value of each element in wantcols in turn.
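One thing to watch for with the for loop: if the function returns a ggplot object as its last value, that value is discarded inside a loop, because automatic printing only happens at the top level. A minimal sketch of the usual workaround follows. Here plot_hist() is a hypothetical stand-in for normality_test(), and df is a cut-down copy of test_data; neither comes from the question.

```r
library(ggplot2)

# Cut-down copy of two columns from test_data
df <- data.frame(
  b = c(998, 715, 523, 755, 694, 510, 661),
  f = c(706, 319, 309, 644, 404, 443, 567)
)

# Hypothetical helper standing in for normality_test():
# it builds and returns a histogram for the named column
plot_hist <- function(data, col_name) {
  ggplot(data, aes(x = .data[[col_name]])) +
    geom_histogram(bins = 5) +
    labs(x = col_name)
}

plots <- list()
for (i in names(df)) {
  p <- plot_hist(df, i)
  print(p)         # explicit print() is needed inside a loop
  plots[[i]] <- p  # keep the plot in case it is needed later
}
```

With lapply at the top level this is less of an issue, since the returned list of plots is printed automatically.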
Alternatively, as the comments mention, you could accomplish this more simply with lapply:
lapply(wantcols, function(x) normality_test(test_data, x))
Also, if you want every column in your data except the first, there is an easier way to define your columns:
wantcols <- names(test_data)[-1]
# [1] "b" "c" "d" "e" "f"
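If the position of the non-numeric column is not known in advance, another option is to pick columns by type instead of by position. This is only a sketch, on a shortened copy of test_data, and it assumes that "every numeric column" is what you want to test.

```r
# Shortened copy of test_data, for illustration
test_data <- data.frame(
  a = c("E01002852", "E01002853"),
  b = c(998, 715),
  f = c(706, 319)
)

# Keep only the names of the numeric columns
wantcols <- names(test_data)[sapply(test_data, is.numeric)]
wantcols
# [1] "b" "f"
```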