I would like to create for loop to repeat the same function for 150 variables. I am new to R and I am a bit stuck.
To give you an example of some commands I need to repeat:
N <- table(df$ var1 ==0)["TRUE"]
n <- table(df$ var1 ==1)["TRUE"]
PREV95 <- (svyciprop(~ var1 ==1, level=0.95, design= design, deff= "replace")*100)
I need to run the same functions for 150 columns. I know that I need to put all my cols in one vector = x but then I don't know how to write the loop to repeat the same command for all my variables.
Can anyone help me to write a loop?
CodePudding user response:
A word in advance: loops in R can in most cases be replaced with a faster, R-ish way (various flavours of apply
, map
ing, walk
ing ...)
applying a function to the columns of dataframe df:
a)
with base R, example dataset cars
my_function <- function(xs) max(xs)
lapply(cars, my_function)
b) tidyverse-style:
cars %>%
summarise_all(my_function)
An anecdotal example: I came across an R-script which took about half an hour to complete and made abundant use of for
-loops. Replacing the loops with vectorized functions and members of the apply
family cut the execution time down to about 3 minutes. So while for
-loops and related constructs might be more familiar when coming from another language, they might soon get in your way with R.
This chapter of Hadley Wickham's R for data science gives an introduction into iterating "the R-way".
CodePudding user response:
Here is an approach that doesn't use loops. I've created a data set called df with three factor variables to represent your dataset as you described it. I created a function eval() that does all the work. First, it filters out just the factors. Then it converts your factors to numeric variables so that the numbers can be summed as 0 and 1 otherwise if we sum the factors it would be based on 1 and 2. Within the function I create another function neg() to give you the number of negative values by subtracting the sum of the 1s from the total length of the vector. Then create the dataframes "n" (sum of the positives), "N" (sum of the negatives), and PREV95. Note I had to leave PREV95 out because I do not have a 'design' object to use as a parameter to run the function. I hashed it out but you can remove the hash to add back in. I then used rbind to combine these dataframes and return "results". Again, I've hashed out the version that you'd use to include PREV95. The function eval() takes your original dataframe as input. I think the logic for PREV95 should work, but I cannot check it without a 'design' parameter. It returns a dataframe, not a list, which you'll likely find easier to work with. Note I didn't use a seed so the results will change every time you rerun the code to make df.
df <- data.frame(Var1 = factor(sample(c(0,1), 10, TRUE)),
Var2 = factor(sample(c(0,1), 10, TRUE)),
Var3 = factor(sample(c(0,1), 10, TRUE)))
eval <- function(df){
df1 <- df %>%
select_if(is.factor) %>%
mutate_all(function(x) as.numeric(as.character(x)))
neg <- function(x){
length(x) - sum(x)
}
n<- df1 %>%
summarize(across(where(is.numeric), sum))
N <- df1 %>%
summarize(across(where(is.numeric), function(x) neg(x)))
#PREV95 <- df1 %>%
# summarize(across(where(is.numeric), function(x) survey::svyciprop(~x == 1, design = design, level = 0.95, deff = "replace")*100))
#results <- rbind(n = n, N = N, PREV95)
results <- (rbind(n = n, N = N))
return(results)
}
eval(df)
# Var1 Var2 Var3
#n 7 6 3
#N 3 4 7