Data
Here is the dput for my example data:
work <- structure(list(Mins_Work = c(435L, 350L, 145L, 135L, 15L, 60L,
60L, 390L, 395L, 395L, 315L, 80L, 580L, 175L, 545L, 230L, 435L,
370L, 255L, 515L, 330L, 65L, 115L, 550L, 420L, 45L, 266L, 196L,
198L, 220L, 17L, 382L, 0L, 180L, 343L, 207L, 263L, 332L, 0L,
0L, 259L, 417L, 282L, 685L, 517L, 111L, 64L, 466L, 499L, 460L,
269L, 300L, 427L, 301L, 436L, 342L, 229L, 379L, 102L, 146L, NA,
94L, 345L, 73L, 204L, 512L, 113L, 135L, 458L, 493L, 552L, 108L,
335L, 395L, 508L, 546L, 396L, 159L, 325L, 747L, 650L, 377L, 461L,
669L, 186L, 220L, 410L, 708L, 409L, 515L, 413L, 166L, 451L, 660L,
177L, 192L, 191L, 461L, 637L, 297L), Coffee_Cups = c(3L, 0L,
2L, 6L, 4L, 5L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 2L, 0L, 1L,
1L, 4L, 4L, 3L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 2L, 3L,
2L, 2L, 4L, 3L, 6L, 6L, 3L, 4L, 6L, 8L, 3L, 5L, 0L, 2L, 2L, 8L,
6L, 4L, 6L, 4L, 4L, 2L, 6L, 6L, 5L, 1L, 3L, 1L, 5L, 4L, 6L, 5L,
0L, 6L, 6L, 4L, 4L, 2L, 2L, 6L, 6L, 7L, 3L, 3L, 0L, 5L, 7L, 6L,
3L, 5L, 3L, 3L, 1L, 9L, 9L, 3L, 3L, 6L, 6L, 6L, 3L, 0L, 7L, 6L,
6L, 3L), Work_Environment = c("Office", "Office", "Office", "Home",
"Home", "Office", "Office", "Office", "Office", "Office", "Home",
"Home", "Office", "Office", "Office", "Home", "Office", "Home",
"Home", "Office", "Office", "Home", "Office", "Home", "Home",
"Home", "Office", "Office", "Office", "Office", "Home", "Home",
"Home", "Office", "Office", "Office", "Office", "Office", "Home",
"Home", "Office", "Office", "Home", "Home", "Office", "Home",
"Home", "Office", "Office", "Home", "Home", "Office", "Home",
"Home", "Office", "Office", "Home", "Office", "Home", "Home",
"Home", "Home", "Office", "Home", "Office", "Office", "Home",
"Home", "Office", "Office", "Home", "Home", "Office", "Office",
"Home", "Office", "Office", "Home", "Office", "Office", "Home",
"Home", "Office", "Office", "Home", "Home", "Office", "Home",
"Home", "Office", "Office", "Home", "Office", "Office", "Home",
"Home", "Office", "Home", "Home", "Home")), class = "data.frame", row.names = c(NA,
-100L))
Problem
When I run imputations on my normal dataset:
library(dplyr)  # for %>% and mutate()
library(mice)   # for mice()
imp.work <- work %>%
  mice(m = 5)
imp.work
There seems to be no problem generating the requested mids object:
Class: mids
Number of multiple imputations: 5
Imputation methods:
Mins_Work Coffee_Cups Work_Environment
"pmm" "" ""
PredictorMatrix:
Mins_Work Coffee_Cups Work_Environment
Mins_Work 0 1 0
Coffee_Cups 1 0 0
Work_Environment 1 1 0
Number of logged events: 1
it im dep meth out
1 0 0 constant Work_Environment
However, if I transform my data into scaled data and run the same imputations:
scale.work <- work %>%
  mutate(Scale_Cups = scale(Coffee_Cups))
imp.scale <- scale.work %>%
  mice(m = 5)
It gives me this error:
Error in check.dataform(data) :
Cannot handle columns with class matrix: Scale_Cups
I'm assuming this is because the scaled data cannot have missing data imputed (by nature of being scaled). However, I'm not sure what to do about this. Can anybody offer solutions?
Answer
As the error message says:
Cannot handle columns with class matrix: Scale_Cups
This happens because scale() returns a matrix. You can confirm this by calling class() on the Scale_Cups column. It is a matrix with only one column, but it is still a matrix.
class(scale.work$Scale_Cups)
#> [1] "matrix" "array"
Because there is only one column, you can easily convert the scaled data into a vector, and then mice() will work.
scale.work <- work %>%
  mutate(Scale_Cups = as.vector(scale(Coffee_Cups)))
class(scale.work$Scale_Cups)
#> [1] "numeric"
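As a quick check of the point above, here is a minimal base-R sketch showing that scale() returns a one-column matrix carrying centering and scaling attributes, and that as.vector() drops both the dimensions and those attributes. The vector x is made-up data for illustration, not taken from the question.

```r
# Hypothetical toy vector, used only for illustration
x <- c(3, 0, 2, 6, 4)

scaled <- scale(x)
is.matrix(scaled)          # TRUE: one column, but still a matrix
dim(scaled)                # 5 rows, 1 column
names(attributes(scaled))  # dim plus the centering and scaling attributes

flat <- as.vector(scaled)
is.matrix(flat)            # FALSE: now a plain numeric vector
is.null(attributes(flat))  # TRUE: dim and scaling attributes are gone

# The values themselves are the usual z-scores
all.equal(flat, (x - mean(x)) / sd(x))
```

The values are unchanged by the conversion; only the matrix structure that mice() rejects is removed.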
Note, however, that the new scaled vector is perfectly collinear with the existing Coffee_Cups column, so you will get a warning message. It is best to remove the unscaled column before running mice().
scale.work$Coffee_Cups <- NULL
imp.scale <- scale.work %>%
  mice(m = 5)
Because scaling is a linear transformation, it makes no meaningful difference whether you impute first and scale afterwards, as in your answer, or scale before imputing, as in this answer. For non-linear transformations there would be a difference.
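To illustrate why a linear rescaling of a predictor does not change the model fit, here is a small base-R sketch. The cups and mins vectors are invented for this example and are not the question's data. It only demonstrates the linearity argument, not the imputation step itself.

```r
# Made-up data for illustration only
set.seed(42)
cups <- c(3, 0, 2, 6, 4, 5, 3, 1, 8, 2)
mins <- 200 + 40 * cups + rnorm(length(cups), sd = 30)

fit_raw    <- lm(mins ~ cups)
fit_scaled <- lm(mins ~ scale(cups))

# The slope on the scaled predictor equals the raw slope times sd(cups)
slope_raw    <- unname(coef(fit_raw)[2])
slope_scaled <- unname(coef(fit_scaled)[2])
all.equal(slope_scaled, slope_raw * sd(cups))

# Fitted values, and therefore residuals, are identical in both models
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_scaled)))
```

Both comparisons should return TRUE: scaling changes the units of the coefficient, not the fit.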
Answer
It looks like I solved the problem by skipping the scaling step in the data frame and applying scale() inside the model formula instead, as shown below:
fit <- with(imp.work,
            lm(Mins_Work ~ scale(Coffee_Cups)))
summary(fit)
Which gives me the output I desire:
# A tibble: 10 × 6
term estimate std.error statistic p.value nobs
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 (Intercept) 315. 17.6 17.9 1.33e-32 100
2 scale(Coffee_Cups) 58.7 17.7 3.31 1.30e- 3 100
3 (Intercept) 319. 17.5 18.2 3.19e-33 100
4 scale(Coffee_Cups) 57.9 17.6 3.29 1.40e- 3 100
5 (Intercept) 319. 17.5 18.2 3.16e-33 100
6 scale(Coffee_Cups) 57.9 17.6 3.29 1.38e- 3 100
7 (Intercept) 320. 17.6 18.2 3.70e-33 100
8 scale(Coffee_Cups) 57.7 17.7 3.26 1.51e- 3 100
9 (Intercept) 316. 17.5 18.0 7.11e-33 100
10 scale(Coffee_Cups) 58.5 17.6 3.32 1.27e- 3 100