Using dplyr to rewrite a loop that shifts a value in a vector

Here's a simple loop in base R:

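# prep: 10 random values, scaled by 10, rounded and sorted
x <- sort(round(10 * rnorm(10)))
res.sd <- NULL
res.var <- NULL
res.mad <- NULL
# loop: replace the 10th value of x with each i in -20:20
# and record the sd, variance and MAD of the modified vector
for (i in -20:20) {
  x[10] <- i
  res.sd <- c(res.sd, sd(x))
  res.var <- c(res.var, var(x))
  res.mad <- c(res.mad, mad(x))
}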

I would like to rewrite this using dplyr.

Answer:

We can use a combination of purrr and tibble.

library(tidyverse)

set.seed(0)

x <- sort(round(10 * rnorm(10)))

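# For each value in -20:20, drop x[10] and append the value instead
# (same values as replacing x[10], and these statistics ignore order);
# map_dfr() row-binds the 41 one-row tibbles.
map_dfr(-20:20, ~ tibble(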
  res.sd = sd(c(x[-10], .)),
  res.var = var(c(x[-10], .)),
  res.mad = mad(c(x[-10], .))
))
#> # A tibble: 41 × 3
#>    res.sd res.var res.mad
#>     <dbl>   <dbl>   <dbl>
#>  1   11.7    138.    15.6
#>  2   11.6    134.    15.6
#>  3   11.4    130.    15.6
#>  4   11.2    126.    15.6
#>  5   11.1    122.    15.6
#>  6   10.9    119.    15.6
#>  7   10.8    116.    14.8
#>  8   10.6    113.    14.1
#>  9   10.5    110.    13.3
#> 10   10.4    108.    12.6
#> # … with 31 more rows

Created on 2022-01-09 by the reprex package (v2.0.1)
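In purrr 1.0.0 and later, map_dfr() is superseded, and the purrr documentation points to map() followed by list_rbind() instead. A sketch of the same computation in that style, assuming purrr >= 1.0.0 is installed (the variable name res is just for illustration):

```r
# Sketch assuming purrr >= 1.0.0, where map() + list_rbind()
# replaces the superseded map_dfr().
library(purrr)
library(tibble)

set.seed(0)
x <- sort(round(10 * rnorm(10)))

res <- list_rbind(map(-20:20, function(i) {
  x_i <- c(x[-10], i)  # x with its 10th value swapped for i
  tibble(res.sd = sd(x_i), res.var = var(x_i), res.mad = mad(x_i))
}))

res
```

Row-binding is now a separate, explicit step, which is the main design change compared with map_dfr().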

Answer:

If I am understanding your code correctly, it does the following:

1. Generate 10 random numbers, scale them by 10, round and sort them, and store them in x.
2. In a loop that iterates over -20:20, replace the 10th value of x with the current value.
3. Calculate the SD, variance, and median absolute deviation (MAD) of the modified vector, and store these results.
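Steps 2 and 3 can be folded into one small function that, for a single replacement value, returns all three statistics. This is a minimal base-R sketch; the helper name spread_stats is hypothetical and not part of any package. Using vapply() also avoids growing the result vectors with c() on every iteration.

```r
# Hypothetical helper: replace the 10th value of x with i (step 2)
# and return the three statistics (step 3) as a named vector.
spread_stats <- function(x, i) {
  x[10] <- i  # modifies a local copy; the caller's x is untouched
  c(sd = sd(x), var = var(x), mad = mad(x))
}

set.seed(0)
x <- sort(round(10 * rnorm(10)))  # step 1

# vapply() preallocates a 3 x 41 matrix; t() gives one row per value of i
res <- t(vapply(-20:20, function(i) spread_stats(x, i),
                FUN.VALUE = c(sd = 0, var = 0, mad = 0)))
head(res)
```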

As ekoam points out, this type of operation is ill-suited to dplyr's intended purpose. That said, list-columns make it possible (albeit inefficiently, since the x vector is repeated in every row). The following produces results equivalent to your code, provided you add set.seed(0) before its first line so that the random draws are reproducible.

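library(dplyr)

set.seed(0)
df <- tibble(
  x = list(sort(round(10 * rnorm(10)))),  # one list cell, recycled to all 41 rows
  y = -20:20                              # the replacement values
) %>% 
  rowwise() %>%                           # inside rowwise(), x is the vector itself
  mutate(
    res.sd = sd(c(x[-10], y)),
    res.var = var(c(x[-10], y)),
    res.mad = mad(c(x[-10], y))
  )

df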

# A tibble: 41 × 5
# Rowwise: 
   x              y res.sd res.var res.mad
   <list>     <int>  <dbl>   <dbl>   <dbl>
 1 <dbl [10]>   -20   11.7    138.    15.6
 2 <dbl [10]>   -19   11.6    134.    15.6
 3 <dbl [10]>   -18   11.4    130.    15.6
 4 <dbl [10]>   -17   11.2    126.    15.6
 5 <dbl [10]>   -16   11.1    122.    15.6
 6 <dbl [10]>   -15   10.9    119.    15.6
 7 <dbl [10]>   -14   10.8    116.    14.8
 8 <dbl [10]>   -13   10.6    113.    14.1
 9 <dbl [10]>   -12   10.5    110.    13.3
10 <dbl [10]>   -11   10.4    108.    12.6
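If repeating x in every row is a concern, one variant keeps x as an ordinary vector outside the tibble. It relies on dplyr's data masking, which looks up names that are not columns (here x) in the calling environment. A minimal sketch, not a benchmarked improvement; df2 is an illustrative name:

```r
library(dplyr)
library(tibble)

set.seed(0)
x <- sort(round(10 * rnorm(10)))  # kept outside the tibble

df2 <- tibble(y = -20:20) %>%
  rowwise() %>%                   # one group per row, so y is a scalar
  mutate(
    res.sd  = sd(c(x[-10], y)),   # x is found in the calling environment
    res.var = var(c(x[-10], y)),
    res.mad = mad(c(x[-10], y))
  ) %>%
  ungroup()                       # drop the rowwise grouping afterwards

df2
```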

Alternatively, we can get a little clever with lapply() and sapply() and then convert the result to a tibble. Note that there is almost no repeated code here:

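library(dplyr)   # for %>%
library(tibble)  # for as_tibble()

set.seed(0)
x <- sort(round(10 * rnorm(10)))
y <- -20:20
# apply each statistic in turn...
lapply(list(sd = sd, var = var, mad = mad), function(func) {
  # ...to x with its 10th value replaced by each j in y
  sapply(y, function(j) {
    func(c(x[-10], j))
  })
}) %>% 
  as_tibble()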

# A tibble: 41 × 3
      sd   var   mad
   <dbl> <dbl> <dbl>
 1  11.7  138.  15.6
 2  11.6  134.  15.6
 3  11.4  130.  15.6
 4  11.2  126.  15.6
 5  11.1  122.  15.6
 6  10.9  119.  15.6
 7  10.8  116.  14.8
 8  10.6  113.  14.1
 9  10.5  110.  13.3
10  10.4  108.  12.6
# … with 31 more rows
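Because the statistics live in a named list, adding another one is a single extra entry, and the replacement values can be kept as a column. A sketch under the same setup; IQR() (base R's interquartile range) is included only as an illustration, and mutate(.before = 1) assumes dplyr >= 1.0.0.

```r
library(dplyr)
library(tibble)

set.seed(0)
x <- sort(round(10 * rnorm(10)))
y <- -20:20

# Named list of summary functions; add or remove entries freely.
funcs <- list(sd = sd, var = var, mad = mad, iqr = IQR)

res <- lapply(funcs, function(func) {
  # apply func to x with its 10th value swapped for each j in y
  vapply(y, function(j) func(c(x[-10], j)), numeric(1))
}) %>%
  as_tibble() %>%
  mutate(y = y, .before = 1)  # record which value replaced x[10]

res
```

vapply() with numeric(1) is used instead of sapply() so that a statistic returning anything other than a single number fails loudly.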