Home > Blockchain >  Create a loop that appends columns with leading variables
Create a loop that appends columns with leading variables

Time:09-22

My data frame contains the following columns: date, id, category. I want to create a loop which appends additional columns (category1, category2, etc.) , containing leading categories for each id by date.

My code without a loop looks like this:

df <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(category1 = dplyr::lead(category, n = 1L, order_by = date))

df <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(category2 = dplyr::lead(category, n = 2L, order_by = date))

df <- df %>%
  dplyr::group_by(id) %>%
  dplyr::mutate(category3 = dplyr::lead(category, n = 3L, order_by = date))

#and so on.

What would be the best way to create this loop?

Thank you!

CodePudding user response:

Using data.table:

# example data
dt <- mtcars[1:10, c("cyl", "disp", "mpg")]

setDT(dt)[ order(mpg), c("cat1", "cat2") := lapply(1:2, function(i) shift(disp, i, type = "lead")), by = cyl ]

dt
#     cyl  disp  mpg  cat1  cat2
#  1:   6 160.0 21.0 160.0 258.0
#  2:   6 160.0 21.0 258.0    NA
#  3:   4 108.0 22.8 140.8 146.7
#  4:   6 258.0 21.4    NA    NA
#  5:   8 360.0 18.7    NA    NA
#  6:   6 225.0 18.1 167.6 160.0
#  7:   8 360.0 14.3 360.0    NA
#  8:   4 146.7 24.4    NA    NA
#  9:   4 140.8 22.8 146.7    NA
# 10:   6 167.6 19.2 160.0 160.0

CodePudding user response:

We can create the desired lead functions with map and partial from purrr, and apply them all at once with the mutate(across()) pattern.

library(purrr)
library(dplyr)

df %>% 
  group_by(id) %>% 
  mutate(across(
    category, 
    map(1:7, ~partial(lead, n = .x, order_by = date)),
    .names = "{.col}{.fn}"
  ))
# A tibble: 100 × 10
# Groups:   id [4]
      id category date       category1 category2 category3 category4 category5
   <int> <chr>    <date>     <chr>     <chr>     <chr>     <chr>     <chr>    
 1     1 B        2022-01-02 B         B         C         B         C        
 2     1 B        2022-01-03 B         C         B         C         C        
 3     1 B        2022-01-05 C         B         C         C         C        
 4     1 C        2022-01-06 B         C         C         C         B        
 5     1 B        2022-01-10 C         C         C         B         C        
 6     1 C        2022-01-10 C         C         B         C         A        
 7     1 C        2022-01-10 C         B         C         A         C        
 8     1 C        2022-01-11 B         C         A         C         B        
 9     1 B        2022-01-12 C         A         C         B         B        
10     1 C        2022-01-17 A         C         B         B         A        
11     1 A        2022-01-18 C         B         B         A         B        
12     1 C        2022-01-20 B         B         A         B         A        
13     1 B        2022-01-25 B         A         B         A         C        
14     1 B        2022-01-25 A         B         A         C         C        
15     1 A        2022-01-25 B         A         C         C         C        
16     1 B        2022-01-25 A         C         C         C         C        
17     1 A        2022-01-26 C         C         C         C         B        
18     1 C        2022-01-27 C         C         C         B         A        
19     1 C        2022-01-30 C         C         B         A         B        
20     1 C        2022-02-01 C         B         A         B         C        
21     1 C        2022-02-04 B         A         B         C         C        
22     1 B        2022-02-06 A         B         C         C         A        
23     1 A        2022-02-06 B         C         C         A         C        
24     1 B        2022-02-06 C         C         A         C         NA       
25     1 C        2022-02-11 C         A         C         NA        NA       
26     1 C        2022-02-12 A         C         NA        NA        NA       
27     1 A        2022-02-12 C         NA        NA        NA        NA       
28     1 C        2022-02-14 NA        NA        NA        NA        NA       
29     2 B        2022-01-06 B         C         C         A         A        
# … with 71 more rows, and 2 more variables: category6 <chr>, category7 <chr>

(Where df is sample data defined as:

n <- 100
set.seed(123) ; df <- data.frame(
  id = sample(1:4, n, replace = TRUE),
  category = sample(LETTERS[1:3], n, replace = TRUE),
  date = sample(seq(as.Date("2022-01-01"), by = "1 day", length.out = 45), n,
                replace = TRUE)
)
df <- df %>% arrange(id, date)
  • Related