I have a dataframe with the following columns: Time, Drug and mobility for a full 24 hour period.
A Snapshot of the dataframe
Time Drug Mobility
18 A 1.2
19 A 1.3
20 A 1.3
21 A 1.2
18 B 3.2
19 B 3.2
20 B 3.3
21 B 3.3
I then perform a two-way anova to compare the effects of the drugs at each time point on mobility using this code:
mod = ols('Mobility~Time Drug Time*Drug', data = fdf).fit()
aov = anova_lm(mod, type=2)
Then i want to do a multicomparison test (post-hoc), particulary sidak but not sure how to do it. Struggling to find any resources online to learn from.
I know i can use tukey's test but it compares everything and I'm also only interest in the drug effects at the same timepoints:
18 A - 18 B
19 A - 19 B
20 A - 20 B
Not:
18 A - 19 B
18 A - 20 B
20 A - 18 A
Any assistance will help tremendously
CodePudding user response:
If you are only interested in the comparison within group, you already have them in your coefficients.
Using an example dataset:
import statsmodels.formula.api as sm
import pandas as pd
import numpy as np
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multitest import multipletests
fdf = pd.DataFrame({'Time':np.random.choice([18,19,20],50),
'Drug':np.random.choice(['A','B'],50),
'Mobility':np.random.uniform(0,1,50)})
fdf['Time'] = pd.Categorical(fdf['Time'])
mod = sm.ols('Mobility~Time Drug Time:Drug', data = fdf).fit()
aov = anova_lm(mod, type=2)
Results look like this:
mod.summary()
OLS Regression Results
==============================================================================
Dep. Variable: Mobility R-squared: 0.083
Model: OLS Adj. R-squared: -0.021
Method: Least Squares F-statistic: 0.7994
Date: Wed, 08 Dec 2021 Prob (F-statistic): 0.556
Time: 07:13:14 Log-Likelihood: -4.4485
No. Observations: 50 AIC: 20.90
Df Residuals: 44 BIC: 32.37
Df Model: 5
Covariance Type: nonrobust
========================================================================================
coef std err t P>|t| [0.025 0.975]
----------------------------------------------------------------------------------------
Intercept 0.4063 0.094 4.323 0.000 0.217 0.596
Time[T.19] 0.1622 0.149 1.091 0.281 -0.137 0.462
Time[T.20] -0.0854 0.133 -0.643 0.524 -0.353 0.182
Drug[T.B] 0.0046 0.149 0.031 0.975 -0.295 0.304
Time[T.19]:Drug[T.B] -0.1479 0.206 -0.717 0.477 -0.564 0.268
Time[T.20]:Drug[T.B] 0.2049 0.199 1.028 0.310 -0.197 0.607
In this case, Time=18 is the reference, so Drug[T.B]
would be the effect of B wrt to A, at Time=18, which is the result for 18 B - 18 A
, Time[T.19]:Drug[T.B]
would be 19 B - 19 A
and Time[T.20]:Drug[T.B]
would be 20 B - 20 A
.
For multiple comparisons adjustment, you can simply pull out these results and calculate the corrected pvalues:
res = pd.concat([mod.params,mod.pvalues],axis=1)
res.columns=['coefficient','pvalues']
res = res[res.index.str.contains('Drug')]
res['corrected_p'] = multipletests(res['pvalues'],method="sidak")[1]
res['comparison'] = ['18 B - 18 A','19 B - 19 A','20 B - 20 A']
coefficient pvalues corrected_p comparison
Drug[T.B] 0.004630 0.975284 0.999985 18 B - 18 A
Time[T.19]:Drug[T.B] -0.147928 0.477114 0.857038 19 B - 19 A
Time[T.20]:Drug[T.B] 0.204925 0.309616 0.670942 20 B - 20 A
See also help page for multipletests