Problem: I have two data frames, each with three columns but a different number of rows:
>view(archae_pro)
motif obs pred
AAB 1189 760.1757
CDD 1058 249.7147
DDE 771 415.1314
FBB 544 226.3529
>view(archae_end)
motif obs pred
ABG 1044 749.4967
GBC 634 564.5753
AGG 616 568.7375
CGG 504 192.5312
BTT 404 200.4589
I want to perform a chi-square goodness-of-fit test on each data frame, then calculate the standardised residuals and column-bind them to the corresponding data frame. Here is what I tried:
df.list <- list(
  df_archae_pro,
  df_archae_end
)
prop <- lapply(df.list, function (x) cbind(x$pred/sum(x$pred)))
chisquare <- lapply(df.list, function(x) chisq.test (x$obs, p=prop))
RStudio throws an error:
Error in chisq.test(x$obs, p = prop) :
'x' and 'p' must have the same number of elements
My two pence on the error: chisq.test somehow does not pick up the "prop" values that correspond to the current data frame?
I started learning R a few days ago, so please excuse any obvious mistakes.
I would also appreciate any help in calculating the standardized residuals and column-binding them to the data frames.
Answer:
Simply pass x as the first argument to cbind and name the new column, prop, in the second argument, assigning the result back to df.list since you are adding a new column to each data frame. Then, in the next call, qualify prop with x$ so that the test references the column of the current data frame:
df.list <- lapply(df.list, function(x)
cbind(x, prop=x$pred/sum(x$pred))
)
chisquare <- lapply(df.list, function(x)
chisq.test(x$obs, p=x$prop)
)
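To see why the original attempt failed, here is a minimal reproducible sketch. The data frames are rebuilt from the values printed in the question, and the list element names (archae_pro, archae_end) are my own assumption. In the original code, prop is a list with one element per data frame, so inside the second lapply its length (2) is compared against the number of motifs (4 or 5), which is what the error message complains about.

```r
# Data copied from the question; object and list names are assumed
archae_pro <- data.frame(
  motif = c("AAB", "CDD", "DDE", "FBB"),
  obs   = c(1189, 1058, 771, 544),
  pred  = c(760.1757, 249.7147, 415.1314, 226.3529)
)
archae_end <- data.frame(
  motif = c("ABG", "GBC", "AGG", "CGG", "BTT"),
  obs   = c(1044, 634, 616, 504, 404),
  pred  = c(749.4967, 564.5753, 568.7375, 192.5312, 200.4589)
)
df.list <- list(archae_pro = archae_pro, archae_end = archae_end)

# The original attempt: prop is a list, one element per data frame
prop <- lapply(df.list, function(x) cbind(x$pred / sum(x$pred)))
length(prop)              # 2: one entry per data frame
length(df.list[[1]]$obs)  # 4: one entry per motif

# Storing the proportions inside each data frame keeps the lengths matched
df.list <- lapply(df.list, function(x) cbind(x, prop = x$pred / sum(x$pred)))
chisquare <- lapply(df.list, function(x) chisq.test(x$obs, p = x$prop))
```

Because the proportions now travel with their own data frame, each call to chisq.test sees an obs vector and a p vector of the same length.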
To assign the results of the test, cbind the extracted values:
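df.list <- lapply(df.list, function(df) {
  results <- chisq.test(df$obs, p = df$prop)
  cbind(
    df,
    # stat and pval are single values, so they are repeated on every row;
    # unname() drops the "X-squared" name carried by the statistic
    stat = unname(results$statistic),
    pval = results$p.value,
    stdres = results$stdres
  )
})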
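If you want to check what stdres contains, the sketch below recomputes it by hand for one data frame (values taken from the first table in the question; the names df, expected, and manual are just illustrative). As far as I understand chisq.test in the goodness-of-fit case, the standardised residual is (observed - expected) / sqrt(n * p * (1 - p)), where the expected count is n * p. Note that pred is rescaled to proportions here, so the expected counts are n * prop rather than the pred values themselves.

```r
# One data frame from the question, rebuilt by hand for illustration
df <- data.frame(
  motif = c("AAB", "CDD", "DDE", "FBB"),
  obs   = c(1189, 1058, 771, 544),
  pred  = c(760.1757, 249.7147, 415.1314, 226.3529)
)
df$prop <- df$pred / sum(df$pred)

res <- chisq.test(df$obs, p = df$prop)

# Manual recomputation of the standardised residuals
n        <- sum(df$obs)        # total observed count
expected <- n * df$prop        # expected counts under the null
manual   <- (df$obs - expected) / sqrt(n * df$prop * (1 - df$prop))

# The two vectors should agree up to floating-point tolerance
all.equal(unname(res$stdres), manual)
```

The Pearson residuals, (observed - expected) / sqrt(expected), are also available as res$residuals if you prefer those.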