How to run a set of functions over a given list of variable names and write the output to a table?

I want to compute a mean and standard deviation, and run a statistical test, for each of a set of variables. The natural approach seems to be a function that takes the list of column names as an argument.

One complicating factor is that this data frame holds survey data, so it needs survey-specific functions.

library(radiant.data) #for weighted.sd
library(survey) #survey functions
library(srvyr) #survey functions

#building a df
df <- data.frame("GroupingFactor" = c(1, 1, 0, 0),
                 "VarofInterest1" = c(1, 1, 1, 0),
                 "VarofInterest2" = c(1, 0, 0, 0),
                 "PSU" = c(1, 2, 1, 2),
                 "SAMPWEIGHT" = c(0, 23254, 343, 5652),
                 "STRATA" = c(6133, 6131, 6145, 6152))

options(survey.adjust.domain.lonely=TRUE) #adjusting for the one PSU
options(survey.lonely.psu="adjust")

svy <- svydesign(~PSU, weights = ~SAMPWEIGHT, strata = ~STRATA, data = df, nest = TRUE, check.strata = FALSE) #the design

#here is what i would like to iterate

df %>% 
  group_by(GroupingFactor) %>% 
  summarise(mean = weighted.mean(VarofInterest1, SAMPWEIGHT, na.rm = TRUE),
            sd = weighted.sd(VarofInterest1, SAMPWEIGHT, na.rm = TRUE)) #for mean and SD

svychisq(~GroupingFactor + VarofInterest1, svy, statistic = 'Chisq') #the test of interest

Everything AFTER creating the svy object is what I'd ideally automate across a list of variables, e.g., VarofInterest2, VarofInterest3, and so on.

The final product should be a table/tibble listing each variable name, its mean and standard deviation, and the output of the chi-squared test (e.g., the test statistic/X-squared and p-value).

I would also welcome a reference for doing this on data without survey weights (e.g., running a dozen t-tests by passing in a list of variables to test against a grouping factor).

Edit: Intended output

GroupingFactor	Mean	SD	Statistic	p	Variable
0	.25	.25	341.14	.014	VarofInterest1
1	.50	.00	N/A	N/A	VarofInterest1

Or two separate table-generating functions, one for just the means/SDs:

GroupingFactor	Mean	SD	Variable
0	.50	.25	VarofInterest1
1	.25	.00	VarofInterest1

and then a second with the test statistic and p-values:

Variable	Statistic	p
VarofInterest1	4131.11	.001
VarofInterest2	131.14	.131

Answer:

You can write a function f() that takes the data, the grouping variable, and the variable of interest, and returns the statistics. You would need to modify the example below for survey data, but this might give you a starting point.

library(dplyr) # for %>%, group_by(), summarize(), and the tidy-eval helpers

f <- function(df, g, v) {

  # capture both column names as strings for base-R indexing
  v_string = quo_name(enquo(v))
  g_string = quo_name(enquo(g))

  # chi-squared test of the variable of interest against the grouping variable
  chi_result = chisq.test(df[[v_string]], df[[g_string]])

  # group-wise mean and SD, with the test results repeated on each row
  df %>%
    group_by({{g}}) %>%
    summarize(Mean = mean({{v}}, na.rm = TRUE), SD = sd({{v}}, na.rm = TRUE)) %>%
    mutate(variable = v_string,
           statistic = chi_result$statistic,
           pvalue = chi_result$p.value)
}


bind_rows(
  lapply(c("VarofInterest1", "VarofInterest2"), \(i) f(df, GroupingFactor, !!sym(i)))
)

Output:

# A tibble: 4 × 6
  GroupingFactor  Mean    SD variable       statistic pvalue
           <dbl> <dbl> <dbl> <chr>              <dbl>  <dbl>
1              0   0.5 0.707 VarofInterest1         0      1
2              1   1   0     VarofInterest1         0      1
3              0   0   0     VarofInterest2         0      1
4              1   0.5 0.707 VarofInterest2         0      1
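
For the non-survey case mentioned in the question (a batch of t-tests against one grouping factor), the same loop-and-bind pattern works in base R. The following is only a minimal sketch: the data frame dat, its columns group, x1 and x2, and the helper t_row() are all made up for illustration, and it assumes the grouping column has exactly two levels.

```r
# Toy data (invented for this sketch): a two-level grouping column and
# two numeric columns to test.
dat <- data.frame(
  group = rep(c("a", "b"), each = 5),
  x1 = c(5.1, 4.9, 5.6, 5.8, 6.0, 6.3, 6.9, 7.1, 6.5, 7.4),
  x2 = c(2.0, 2.4, 1.9, 2.2, 2.1, 2.3, 2.0, 2.5, 1.8, 2.2)
)

# Hypothetical helper: Welch two-sample t-test of column `v` across the two
# levels of column `g` (both given as strings), returned as a one-row data frame.
t_row <- function(data, g, v) {
  # reformulate() builds the formula v ~ g from the two column names
  res <- t.test(reformulate(g, response = v), data = data)
  data.frame(Variable  = v,
             Statistic = unname(res$statistic),
             df        = unname(res$parameter),
             p         = res$p.value)
}

# Run the helper over every variable name and stack the rows into one table
vars <- c("x1", "x2")
t_table <- do.call(rbind, lapply(vars, function(v) t_row(dat, "group", v)))
t_table
```

Passing the column names as strings and building the formula with reformulate() avoids tidy evaluation entirely, which may be simpler when the variable list is already a character vector. With a dozen tests, you may also want to adjust the p-values, for example with p.adjust().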