Why don't scaled variables work with imputations in mice package?

Data

Here is the dput for my example data:

work <- structure(list(Mins_Work = c(435L, 350L, 145L, 135L, 15L, 60L, 
60L, 390L, 395L, 395L, 315L, 80L, 580L, 175L, 545L, 230L, 435L, 
370L, 255L, 515L, 330L, 65L, 115L, 550L, 420L, 45L, 266L, 196L, 
198L, 220L, 17L, 382L, 0L, 180L, 343L, 207L, 263L, 332L, 0L, 
0L, 259L, 417L, 282L, 685L, 517L, 111L, 64L, 466L, 499L, 460L, 
269L, 300L, 427L, 301L, 436L, 342L, 229L, 379L, 102L, 146L, NA, 
94L, 345L, 73L, 204L, 512L, 113L, 135L, 458L, 493L, 552L, 108L, 
335L, 395L, 508L, 546L, 396L, 159L, 325L, 747L, 650L, 377L, 461L, 
669L, 186L, 220L, 410L, 708L, 409L, 515L, 413L, 166L, 451L, 660L, 
177L, 192L, 191L, 461L, 637L, 297L), Coffee_Cups = c(3L, 0L, 
2L, 6L, 4L, 5L, 3L, 3L, 2L, 2L, 3L, 1L, 1L, 3L, 2L, 2L, 0L, 1L, 
1L, 4L, 4L, 3L, 0L, 1L, 3L, 0L, 0L, 0L, 0L, 2L, 0L, 1L, 2L, 3L, 
2L, 2L, 4L, 3L, 6L, 6L, 3L, 4L, 6L, 8L, 3L, 5L, 0L, 2L, 2L, 8L, 
6L, 4L, 6L, 4L, 4L, 2L, 6L, 6L, 5L, 1L, 3L, 1L, 5L, 4L, 6L, 5L, 
0L, 6L, 6L, 4L, 4L, 2L, 2L, 6L, 6L, 7L, 3L, 3L, 0L, 5L, 7L, 6L, 
3L, 5L, 3L, 3L, 1L, 9L, 9L, 3L, 3L, 6L, 6L, 6L, 3L, 0L, 7L, 6L, 
6L, 3L), Work_Environment = c("Office", "Office", "Office", "Home", 
"Home", "Office", "Office", "Office", "Office", "Office", "Home", 
"Home", "Office", "Office", "Office", "Home", "Office", "Home", 
"Home", "Office", "Office", "Home", "Office", "Home", "Home", 
"Home", "Office", "Office", "Office", "Office", "Home", "Home", 
"Home", "Office", "Office", "Office", "Office", "Office", "Home", 
"Home", "Office", "Office", "Home", "Home", "Office", "Home", 
"Home", "Office", "Office", "Home", "Home", "Office", "Home", 
"Home", "Office", "Office", "Home", "Office", "Home", "Home", 
"Home", "Home", "Office", "Home", "Office", "Office", "Home", 
"Home", "Office", "Office", "Home", "Home", "Office", "Office", 
"Home", "Office", "Office", "Home", "Office", "Office", "Home", 
"Home", "Office", "Office", "Home", "Home", "Office", "Home", 
"Home", "Office", "Office", "Home", "Office", "Office", "Home", 
"Home", "Office", "Home", "Home", "Home")), class = "data.frame", row.names = c(NA, 
-100L))

Problem

When I run imputations on my original (unscaled) dataset:

library(dplyr)  # provides %>% and mutate()
library(mice)   # provides mice()

imp.work <- work %>% 
  mice(m=5)
imp.work

There seems to be no problem generating the mids object requested:

Class: mids
Number of multiple imputations:  5 
Imputation methods:
       Mins_Work      Coffee_Cups Work_Environment 
           "pmm"               ""               "" 
PredictorMatrix:
                 Mins_Work Coffee_Cups Work_Environment
Mins_Work                0           1                0
Coffee_Cups              1           0                0
Work_Environment         1           1                0
Number of logged events:  1 
  it im dep     meth              out
1  0  0     constant Work_Environment

However, if I transform my data into scaled data and run the same imputations:

scale.work <- work %>% 
  mutate(Scale_Cups = scale(Coffee_Cups))

imp.scale <- scale.work %>% 
  mice(m=5)

It gives me this error:

Error in check.dataform(data) : 
  Cannot handle columns with class matrix: Scale_Cups

I'm assuming this is because the scaled data cannot have missing data imputed (by nature of being scaled). However, I'm not sure what to do about this. Can anybody offer solutions?

CodePudding user response:

As the error message says:

Cannot handle columns with class matrix: Scale_Cups

This happens because scale() returns a matrix, and mice() does not accept matrix columns. You can confirm this by calling class() on the Scale_Cups variable: it has only one column, but it is still a matrix.

class(scale.work$Scale_Cups)
#> [1] "matrix" "array"

Because there is only one column, you can convert the scaled data to a plain vector, and then mice() will work.

scale.work <- work %>% 
  mutate(Scale_Cups = as.vector(scale(Coffee_Cups)))
class(scale.work$Scale_Cups)
#> [1] "numeric"
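
To see why as.vector() resolves the error, it can help to look at what scale() actually returns. The following is a minimal base R sketch; the vector x is a made-up toy example, not the data from the question.

```r
# Toy input (hypothetical values, not taken from the `work` data)
x <- c(3, 0, 2, 6, 4)

# scale() returns a one-column matrix carrying centering/scaling attributes
z <- scale(x)
is.matrix(z)                 # TRUE
dim(z)                       # 5 1
attr(z, "scaled:center")     # 3 (the mean of x)

# as.vector() drops the dim and scaling attributes, leaving a plain numeric vector
z_vec <- as.vector(z)
is.matrix(z_vec)             # FALSE

# The same standardisation written out by hand
z_manual <- (x - mean(x)) / sd(x)
all.equal(z_vec, z_manual)   # TRUE
```

Writing the standardisation by hand, as in z_manual, is another way to avoid the matrix column altogether.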

Note, however, that the new scaled vector is perfectly collinear with the existing Coffee_Cups vector, so you will get a warning message. It is best to also remove the unscaled vector before running mice().

scale.work$Coffee_Cups <- NULL
imp.scale <- scale.work %>% 
  mice(m=5)

Because scaling is a linear transformation, there is no meaningful difference between running the imputations first and scaling afterwards, as in your answer, and scaling before imputation, as in this answer. For non-linear transformations there would be a difference.
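
The point about linear versus non-linear transformations can be illustrated with ordinary regression, which is also what the default "pmm" method uses to form its predictions. This is only a sketch on simulated data: the variables x and y and the log1p() transformation are assumptions chosen for illustration, not part of the original question.

```r
set.seed(1)

# Simulated predictor and outcome (hypothetical, loosely shaped like the example data)
x <- rpois(50, lambda = 3)
y <- 300 + 30 * x + rnorm(50, sd = 100)

# Linear rescaling of the predictor leaves the fitted values unchanged
fit_raw    <- lm(y ~ x)
fit_scaled <- lm(y ~ as.vector(scale(x)))
all.equal(unname(fitted(fit_raw)), unname(fitted(fit_scaled)))   # TRUE

# A non-linear transformation changes the fitted values
fit_log <- lm(y ~ log1p(x))
isTRUE(all.equal(unname(fitted(fit_raw)), unname(fitted(fit_log))))   # FALSE
```

Since the predictions are identical under scaling, an imputation model built on the scaled predictor should behave like one built on the raw predictor, apart from random draws.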

CodePudding user response:

It looks like I solved the problem by skipping the scaling step in the data frame and applying scale() inside the model formula instead, as shown below:

fit <- with(imp.work,
            lm(Mins_Work
               ~ scale(Coffee_Cups)))
summary(fit)

Which gives me the output I desire:

# A tibble: 10 × 6
   term               estimate std.error statistic  p.value  nobs
   <chr>                 <dbl>     <dbl>     <dbl>    <dbl> <int>
 1 (Intercept)           315.       17.6     17.9  1.33e-32   100
 2 scale(Coffee_Cups)     58.7      17.7      3.31 1.30e- 3   100
 3 (Intercept)           319.       17.5     18.2  3.19e-33   100
 4 scale(Coffee_Cups)     57.9      17.6      3.29 1.40e- 3   100
 5 (Intercept)           319.       17.5     18.2  3.16e-33   100
 6 scale(Coffee_Cups)     57.9      17.6      3.29 1.38e- 3   100
 7 (Intercept)           320.       17.6     18.2  3.70e-33   100
 8 scale(Coffee_Cups)     57.7      17.7      3.26 1.51e- 3   100
 9 (Intercept)           316.       17.5     18.0  7.11e-33   100
10 scale(Coffee_Cups)     58.5      17.6      3.32 1.27e- 3   100
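
The table above lists one fit per imputed dataset. If a single set of estimates is wanted, mice provides pool() to combine them with Rubin's rules. Here is a hedged sketch of that step; to keep it self-contained it uses the nhanes data bundled with mice rather than the work data, and the seed value is arbitrary.

```r
library(mice)

# nhanes ships with mice; `age` is complete and `bmi` has missing values
imp <- mice(nhanes, m = 5, seed = 123, printFlag = FALSE)

# One linear model per imputed dataset, scaling the predictor inside the formula
fit <- with(imp, lm(bmi ~ scale(age)))

# Combine the five sets of estimates with Rubin's rules
pooled <- pool(fit)
summary(pooled)
```

With the objects from this thread, the equivalent call would presumably be pool(fit) on the fit created from imp.work.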