I am a beginner in R and am trying to build my first multiple linear regression model. I want to know whether the values of the dependent variable meanf0 differ across VowelPosition (an independent variable), but only in disyllabic words (SyllableCount has two levels: disyllabic and trisyllabic) and only for the "open" syllable type (syllableType is another predictor with two levels: "open" and "closed"). I am stuck on how to fit a model using only some categories of a given independent variable, if that is possible. Here is my tentative model:
model_F0_disyll <- lm(data = QP1_subset_norm,
                      meanf0_norm ~ SyllableCount + syllableType + VowelPosition,
                      subset(SyllableCount == "2" & syllableType == "open"))
but it does not seem to work. Thank you in advance for your guidance!
CodePudding user response:
I think you would need something like:
model_F0_disyll <- lm(data = QP1_subset_norm,
                      meanf0_norm ~ SyllableCount + syllableType + VowelPosition,
                      subset = (SyllableCount == 2 & syllableType == "open"))
The `subset` argument is just the expression used to make the subset; you don't need to call the `subset()` function. Further, when the variable is numeric (presumably like `SyllableCount`), you could use either a numeric or a string value. That is, `SyllableCount == "2"` and `SyllableCount == 2` both work.
Here's an example with the `mtcars` data:
mod <- lm(mpg ~ hp + wt, data = mtcars, subset = (am == "1" & cyl == 4))
summary(mod)
#>
#> Call:
#> lm(formula = mpg ~ hp + wt, data = mtcars, subset = (am == "1" &
#>     cyl == 4))
#>
#> Residuals:
#>     Datsun 710       Fiat 128    Honda Civic Toyota Corolla      Fiat X1-9
#>       -2.66851        4.18787       -2.61455        3.25523       -2.62538
#>  Porsche 914-2   Lotus Europa     Volvo 142E
#>       -0.77799        1.17181        0.07154
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 47.24552    6.57304   7.188 0.000811 ***
#> hp          -0.07288    0.05695  -1.280 0.256814
#> wt          -6.46508    3.15205  -2.051 0.095512 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 3.193 on 5 degrees of freedom
#> Multiple R-squared:  0.6378, Adjusted R-squared:  0.493
#> F-statistic: 4.403 on 2 and 5 DF,  p-value: 0.07893
Created on 2022-06-16 by the reprex package (v2.0.1)
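As a quick check of the two points above, here is a minimal sketch on `mtcars`. It compares the `subset` argument against filtering the data frame beforehand, and a string comparison against a numeric one. The object names (`mod_subset`, `manual_cars`, `mod_filtered`) are made up for illustration.

```r
# Fit with the subset argument, comparing a numeric column with a string
mod_subset <- lm(mpg ~ hp + wt, data = mtcars,
                 subset = (am == "1" & cyl == 4))

# Fit on a data frame filtered beforehand, comparing with a number
manual_cars <- mtcars[mtcars$am == 1 & mtcars$cyl == 4, ]
mod_filtered <- lm(mpg ~ hp + wt, data = manual_cars)

# Both fits use the same rows, so the coefficients should agree
same_coefs <- isTRUE(all.equal(coef(mod_subset), coef(mod_filtered)))

# R coerces the number to a string before comparing, so both
# comparisons select the same rows
same_rows <- identical(mtcars$am == "1", mtcars$am == 1)

print(c(same_coefs = same_coefs, same_rows = same_rows))
```

Either style should work with `lm()`. The `subset` argument simply saves you from creating an intermediate data frame.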
CodePudding user response:
I tried the code based on your suggestion, but I got an error:
```r
model_F0_disyll <- lm(meanf0_norm ~ SyllableCount + VowelPosition, data = QP1_subset_norm_1, subset = (SyllableCount == "2"))
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels
```
When I then ran summary(), I got the output below, where I guess R ignored the subset:
```r
Call:
lm(formula = meanf0_norm ~ VowelPosition + SyllableCount, data = QP1_subset_norm)

Residuals:
    Min      1Q  Median      3Q     Max
-3.4119 -0.6489 -0.1001  0.6089 11.6656

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.46669    0.01973  23.659   <2e-16 ***
VowelPositionpen -0.40440    0.03285 -12.309   <2e-16 ***
VowelPositionfi  -0.99317    0.02347 -42.323   <2e-16 ***
SyllableCount3   -0.05940    0.02333  -2.546   0.0109 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8925 on 7033 degrees of freedom
  (19 observations deleted due to missingness)
Multiple R-squared:  0.2032, Adjusted R-squared:  0.2029
F-statistic: 597.9 on 3 and 7033 DF,  p-value: < 2.2e-16
```
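A likely cause of the contrasts error above is that the formula still contains the variable used for subsetting. Once only one level of a factor is left in the data, `lm()` cannot build contrasts for it. The sketch below reproduces this with `mtcars`, assuming `cyl` is stored as a factor in the way `SyllableCount` appears to be in the question; `cars_fct`, `res` and `mod_ok` are illustrative names.

```r
# Assumption: cyl is a factor, as SyllableCount seems to be in the question
cars_fct <- transform(mtcars, cyl = factor(cyl))

# After subsetting, cyl has a single observed level, so lm() cannot
# build contrasts for it and the call fails
res <- try(lm(mpg ~ cyl + wt, data = cars_fct, subset = (cyl == "4")),
           silent = TRUE)
failed <- inherits(res, "try-error")
print(failed)

# Dropping the now-constant predictor from the formula lets the fit run
mod_ok <- lm(mpg ~ wt, data = cars_fct, subset = (cyl == "4"))
print(coef(mod_ok))
```

In the question's terms, that would presumably mean keeping only `VowelPosition` on the right-hand side and putting both `SyllableCount` and `syllableType` in the `subset` condition. Also, because the failed `lm()` call never assigned a new value, `summary()` most likely printed an earlier fit stored under the same name, which would explain why the subset seems to have been ignored.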