How to run a set of functions over given list of variable names and write the output to a table?

Time:05-20

I want to compute a mean and standard deviation, and run a statistical test, across a set of variables. The natural approach seems to be a function that accepts a list of column names.

One complicating factor: this data frame holds survey data, so it needs survey-specific functions.

library(radiant.data) #for weighted.sd
library(survey) #survey functions
library(srvyr) #survey functions

#building a df
df <- data.frame("GroupingFactor" = c(1, 1, 0, 0),
                 "VarofInterest1" = c(1, 1, 1, 0),
                 "VarofInterest2" = c(1, 0, 0, 0),
                 "PSU" = c(1, 2, 1, 2),
                 "SAMPWEIGHT" = c(0, 23254, 343, 5652),
                 "STRATA" = c(6133, 6131, 6145, 6152))

options(survey.adjust.domain.lonely=TRUE) #adjusting for the one PSU
options(survey.lonely.psu="adjust")

svy <- svydesign(~PSU, weights = ~SAMPWEIGHT, strata = ~STRATA, data = df, nest = TRUE, check.strata = FALSE) #the design

#here is what i would like to iterate

df %>% 
  group_by(GroupingFactor) %>% 
  summarise(mean = weighted.mean(VarofInterest1, SAMPWEIGHT, na.rm =T), sd = weighted.sd(VarofInterest1, SAMPWEIGHT, na.rm =T)) #for mean and SD

svychisq(~GroupingFactor + VarofInterest1, svy, statistic = 'Chisq') #the test of interest

Everything AFTER creating the svy object is what I'd ideally automate across a list of variables, e.g., a list including VarofInterest2, VarofInterest3, and so on.

The final product is a table/tibble including all the variable names, each one's mean and standard deviation and the output of the Chi-squared test (e.g., test statistic/X-squared and p-value).

I would also take a reference for doing this on non-survey weighted data! (i.e., just running, say, a dozen t-tests using a similar premise of feeding a list of variables you'd like to run the t-test against with a grouping factor).

Edit: Intended output

GroupingFactor  Mean  SD   Statistic  p     Variable
0               .25   .25  341.14     .014  VarofInterest1
1               .50   .00  N/A        N/A   VarofInterest1

OR separate functions/table generating functions, one of just the means/SDs:

GroupingFactor  Mean  SD   Variable
0               .50   .25  VarofInterest1
1               .25   .00  VarofInterest1

and then a second with the test statistic and p-values:

Variable        Statistic  p
VarofInterest1  4131.11    .001
VarofInterest2  131.14     .131

Answer:

You can write a function f() that takes the data, the grouping variable, and the variable of interest, and returns the statistics. You would need to modify the example below for survey data, but it might give you a starting point.

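library(dplyr)  # group_by(), summarize(), mutate(), plus re-exported enquo(), quo_name(), sym()

f <- function(df, g, v) {

  # Column names as strings, for base-R indexing with [[ ]]
  v_string = quo_name(enquo(v))
  g_string = quo_name(enquo(g))

  # Unweighted chi-squared test of the variable against the grouping factor
  chi_result = chisq.test(df[[v_string]], df[[g_string]])

  # Group-wise mean and SD, with the test result repeated on every row
  df %>%
    group_by({{g}}) %>%
    summarize(Mean = mean({{v}}, na.rm = TRUE), SD = sd({{v}}, na.rm = TRUE)) %>%
    mutate(variable = v_string,
           statistic = chi_result$statistic,
           pvalue = chi_result$p.value)
}


# Apply f() to each variable name and stack the results
bind_rows(
  lapply(c("VarofInterest1", "VarofInterest2"), \(i) f(df, GroupingFactor, !!sym(i)))
)

Output (on a sample this small, chisq.test() also warns that the chi-squared approximation may be incorrect):

# A tibble: 4 × 6
  GroupingFactor  Mean    SD variable       statistic pvalue
           <dbl> <dbl> <dbl> <chr>              <dbl>  <dbl>
1              0   0.5 0.707 VarofInterest1         0      1
2              1   1   0     VarofInterest1         0      1
3              0   0   0     VarofInterest2         0      1
4              1   0.5 0.707 VarofInterest2         0      1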
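
To adapt the pattern above to survey data, one option is to build each formula from the variable name with reformulate() and pass it to svychisq(). The sketch below is only a starting point, not a tested solution for your design. The helper names weighted_summary() and survey_tests() are made up for illustration, the toy data is randomly generated (it is not your data), and the SD is a plain weighted SD with weights normalised within each group, which may not match radiant.data's weighted.sd() exactly. Missing values are not handled.

```r
library(survey)

# Hypothetical toy data: 2 strata x 5 PSUs x 20 respondents, random 0/1 variables
set.seed(1)
n <- 200
toy <- data.frame(
  GroupingFactor = rbinom(n, 1, 0.5),
  VarofInterest1 = rbinom(n, 1, 0.4),
  VarofInterest2 = rbinom(n, 1, 0.6),
  STRATA         = rep(1:2, each = n / 2),
  PSU            = rep(rep(1:5, each = 20), times = 2),
  SAMPWEIGHT     = runif(n, 100, 1000)
)
toy_svy <- svydesign(~PSU, weights = ~SAMPWEIGHT, strata = ~STRATA,
                     data = toy, nest = TRUE)

# Weighted mean and SD of each variable within each level of `group`
# (assumption: weights are normalised within the group; no NA handling)
weighted_summary <- function(data, group, vars, weight) {
  rows <- lapply(vars, function(v) {
    by_group <- split(data, data[[group]])
    do.call(rbind, lapply(names(by_group), function(level) {
      d <- by_group[[level]]
      w <- d[[weight]] / sum(d[[weight]])
      m <- sum(w * d[[v]])
      data.frame(Group = level, Mean = m,
                 SD = sqrt(sum(w * (d[[v]] - m)^2)), Variable = v)
    }))
  })
  do.call(rbind, rows)
}

# One design-based chi-squared test per variable, one row per variable
survey_tests <- function(design, group, vars) {
  rows <- lapply(vars, function(v) {
    # reformulate(c("g", "v")) builds the one-sided formula ~g + v
    res <- svychisq(reformulate(c(group, v)), design, statistic = "Chisq")
    data.frame(Variable = v,
               Statistic = as.numeric(res$statistic),
               p = as.numeric(res$p.value))
  })
  do.call(rbind, rows)
}

vars <- c("VarofInterest1", "VarofInterest2")
means_sds <- weighted_summary(toy, "GroupingFactor", vars, "SAMPWEIGHT")
tests <- survey_tests(toy_svy, "GroupingFactor", vars)
print(means_sds)
print(tests)
```

This produces the two separate tables described in the question: one row per group and variable for the means and SDs, and one row per variable for the test. Keeping them separate avoids repeating the test statistic on every group row.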
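
For the unweighted case raised in the question (a batch of t-tests against one grouping factor), the same idea works in base R. This is a minimal sketch with made-up data and a hypothetical helper name, run_t_tests(). It assumes the grouping column has exactly two levels and uses t.test()'s default Welch test.

```r
# Hypothetical toy data: two groups of 20, two numeric variables
set.seed(42)
toy_plain <- data.frame(
  GroupingFactor = rep(c(0, 1), each = 20),
  Var1 = rnorm(40),
  Var2 = rnorm(40, mean = rep(c(0, 1), each = 20))
)

# One t-test per variable name, collected into one row per variable
run_t_tests <- function(data, group, vars) {
  rows <- lapply(vars, function(v) {
    # reformulate("g", response = "v") builds the formula v ~ g
    res <- t.test(reformulate(group, response = v), data = data)
    data.frame(Variable = v,
               Statistic = unname(res$statistic),
               p = res$p.value)
  })
  do.call(rbind, rows)
}

t_results <- run_t_tests(toy_plain, "GroupingFactor", c("Var1", "Var2"))
print(t_results)
```

Passing variable names as strings keeps the function easy to call from lapply() and avoids tidy evaluation entirely. With a dozen tests you may also want to adjust the p-values, for example with p.adjust().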