Using dplyr rewrite a loop that shifts a value in a vector

Time:01-09

Here's a simple loop in base R:

# prep
x <- sort(round(10 * rnorm(10)))
res.sd <- NULL
res.var <- NULL
res.mad <- NULL
# loop: overwrite the 10th value, then record each statistic
for (i in -20:20) {
  x[10] <- i
  res.sd <- c(res.sd, sd(x))
  res.var <- c(res.var, var(x))
  res.mad <- c(res.mad, mad(x))
}

I would like to rewrite this using dplyr.

CodePudding user response:

We can use a combination of purrr and tibble.

library(tidyverse)

set.seed(0)

x <- sort(round(10 * rnorm(10)))

map_dfr(-20:20, ~ tibble(
  res.sd = sd(c(x[-10], .)),
  res.var = var(c(x[-10], .)),
  res.mad = mad(c(x[-10], .))
))
#> # A tibble: 41 × 3
#>    res.sd res.var res.mad
#>     <dbl>   <dbl>   <dbl>
#>  1   11.7    138.    15.6
#>  2   11.6    134.    15.6
#>  3   11.4    130.    15.6
#>  4   11.2    126.    15.6
#>  5   11.1    122.    15.6
#>  6   10.9    119.    15.6
#>  7   10.8    116.    14.8
#>  8   10.6    113.    14.1
#>  9   10.5    110.    13.3
#> 10   10.4    108.    12.6
#> # … with 31 more rows

Created on 2022-01-09 by the reprex package (v2.0.1)

CodePudding user response:

If I am understanding your code correctly, it does the following:

  1. Generate 10 random numbers stored in x.
  2. In a loop that iterates over -20:20, replace the 10th value of x with the iterated value.
  3. Calculate the SD, variance, and median absolute deviation of the modified vector, and store these calculations.
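
The three steps above can be sketched as a small base R helper. The names `spread_stats`, `candidates`, and `res` are hypothetical, introduced only for illustration; they do not appear in the original question.

```r
# Hypothetical helper: replace the last element of `x` with `value`
# and return the three spread statistics as a named numeric vector.
spread_stats <- function(x, value) {
  x[length(x)] <- value
  c(sd = sd(x), var = var(x), mad = mad(x))
}

set.seed(0)
x <- sort(round(10 * rnorm(10)))   # step 1: ten sorted, rounded random values
candidates <- -20:20               # step 2: values swapped into position 10

# step 3: one row of statistics per candidate value
res <- t(vapply(
  candidates,
  function(i) spread_stats(x, i),
  FUN.VALUE = c(sd = 0, var = 0, mad = 0)
))
head(res)
```

Because `x` is passed as an argument, the helper modifies only its own local copy, so the original vector is left unchanged between iterations.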

As ekoam points out, this type of operation is ill-suited to dplyr's intended purpose. That said, the ability to store list-columns makes this possible (albeit inefficient, since it requires storing multiple copies of the x vector). The following will produce equivalent results to your code, if you add set.seed(0) before the first line to control randomization.

library(dplyr)  # provides tibble(), rowwise(), mutate(), and %>%

set.seed(0)
df <- tibble(
  x = list(sort(round(10 * rnorm(10)))),
  y = -20:20
) %>% 
  rowwise() %>% 
  mutate(
    res.sd = sd(c(x[-10], y)),
    res.var = var(c(x[-10], y)),
    res.mad = mad(c(x[-10], y))
  )
df

# A tibble: 41 × 5
# Rowwise: 
   x              y res.sd res.var res.mad
   <list>     <int>  <dbl>   <dbl>   <dbl>
 1 <dbl [10]>   -20   11.7    138.    15.6
 2 <dbl [10]>   -19   11.6    134.    15.6
 3 <dbl [10]>   -18   11.4    130.    15.6
 4 <dbl [10]>   -17   11.2    126.    15.6
 5 <dbl [10]>   -16   11.1    122.    15.6
 6 <dbl [10]>   -15   10.9    119.    15.6
 7 <dbl [10]>   -14   10.8    116.    14.8
 8 <dbl [10]>   -13   10.6    113.    14.1
 9 <dbl [10]>   -12   10.5    110.    13.3
10 <dbl [10]>   -11   10.4    108.    12.6
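
If keeping `x` in a list-column is a concern, one possible variant leaves `x` as an ordinary vector outside the tibble and lets `mutate()` look it up from the surrounding environment. This is only a sketch under that assumption; `df2` is a hypothetical name, and the statistics should match the table above.

```r
library(dplyr)

set.seed(0)
x <- sort(round(10 * rnorm(10)))   # kept outside the tibble

df2 <- tibble(y = -20:20) %>%
  rowwise() %>%                    # evaluate the mutate() expressions per row
  mutate(
    res.sd  = sd(c(x[-10], y)),
    res.var = var(c(x[-10], y)),
    res.mad = mad(c(x[-10], y))
  ) %>%
  ungroup()                        # drop the rowwise grouping afterwards
df2
```

The trade-off is that the result now depends on a variable that is not stored in the data frame, so `x` has to stay in scope wherever the pipeline is run.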

Alternatively, we could get a little clever with lapply and sapply, and then store the result in a tibble. Note that there is almost no repeated code here:

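library(dplyr)  # provides %>% and as_tibble()

set.seed(0)
x <- sort(round(10 * rnorm(10)))
y <- -20:20
# outer lapply: one statistic per list element; inner sapply: one value per y
lapply(list(sd = sd, var = var, mad = mad), function(func) {
  sapply(y, function(j) {
    func(c(x[-10], j))
  })
}) %>% 
  as_tibble()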

# A tibble: 41 × 3
      sd   var   mad
   <dbl> <dbl> <dbl>
 1  11.7  138.  15.6
 2  11.6  134.  15.6
 3  11.4  130.  15.6
 4  11.2  126.  15.6
 5  11.1  122.  15.6
 6  10.9  119.  15.6
 7  10.8  116.  14.8
 8  10.6  113.  14.1
 9  10.5  110.  13.3
10  10.4  108.  12.6
# … with 31 more rows
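
To check that a rewrite like this matches the original loop, one option is to run both and compare them with all.equal(). The sketch below is an illustration only: `loop_res`, `apply_res`, and `x_loop` are hypothetical names, and the loop here preallocates a result matrix instead of growing three vectors with c().

```r
set.seed(0)
x <- sort(round(10 * rnorm(10)))
y <- -20:20

# Original loop logic, writing into a preallocated result matrix
loop_res <- matrix(
  NA_real_, nrow = length(y), ncol = 3,
  dimnames = list(NULL, c("sd", "var", "mad"))
)
x_loop <- x
for (k in seq_along(y)) {
  x_loop[10] <- y[k]
  loop_res[k, ] <- c(sd(x_loop), var(x_loop), mad(x_loop))
}

# lapply/sapply logic, collected into a matrix for comparison
apply_res <- sapply(list(sd = sd, var = var, mad = mad), function(func) {
  sapply(y, function(j) func(c(x[-10], j)))
})

all.equal(loop_res, apply_res)
```

all.equal() compares with a small numeric tolerance, so harmless floating-point differences caused by the changed element order are ignored.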