How to perform lag in R when there are multiple repeating rows for a group

Time:02-01

Suppose I have a data frame as follows:

date price company
2000-10-01 18 A
2001-10-01 20 A
2001-10-01 20 A
2001-10-01 20 A

I want to create a new variable lagged_price as follows:

date price company lagged_price
2000-10-01 18 A NA
2001-10-01 20 A 18
2001-10-01 20 A 18
2001-10-01 20 A 18

The new variable, lagged_price, should hold the price for the same company on the previous date, not the price in the previous row. Grouping by company and calling lag() is problematic because lag() returns the value from the preceding row within the group, so the repeated 2001-10-01 rows would pick up 20 rather than 18. I also do not want to call distinct() on the original dataset: although that would work for this example, I need to keep the other rows.

My failed attempt:

library(dplyr)

out <- data %>%
  group_by(company) %>%
  mutate(lagged_price = lag(price))  # lags by row within company, not by date
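
To make the problem reproducible, here is a minimal sketch that rebuilds the sample data and runs the grouped lag. The column types are an assumption (date as character, price as numeric), since the table above does not state them.

```r
library(dplyr)

# The sample data from the question, rebuilt as a tibble (types are assumed).
data <- tibble(
  date    = c("2000-10-01", "2001-10-01", "2001-10-01", "2001-10-01"),
  price   = c(18, 20, 20, 20),
  company = c("A", "A", "A", "A")
)

# Row-wise lag within each company.
out <- data %>%
  group_by(company) %>%
  mutate(lagged_price = lag(price)) %>%
  ungroup()

out$lagged_price
# NA 18 20 20  (rows 3 and 4 get 20, not the previous date's 18)
```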

Any help is appreciated.

CodePudding user response:

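Lagging before grouping, then filling each date group with its first lagged value, gives the desired result. Note that this relies on the rows being sorted by date and on the data containing a single company: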

df %>% 
  mutate(lagged_price = lag(price)) %>% 
  group_by(date) %>% 
  mutate(lagged_price = lagged_price[1]) %>% 
  ungroup()
# A tibble: 4 × 4
  date       price company lagged_price
  <chr>      <int> <chr>          <int>
1 2000-10-01    18 A                 NA
2 2001-10-01    20 A                 18
3 2001-10-01    20 A                 18
4 2001-10-01    20 A                 18
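
If the data contains several companies, or is not already sorted by date, one possible alternative is to compute the lag on one row per company and date and then join it back to the full data. This is a minimal sketch, not the answer's method. It assumes dplyr, adds a hypothetical second company B for illustration, and assumes price is constant within each company/date pair (otherwise the first price found for that date is used).

```r
library(dplyr)

# Hypothetical example data: two companies, with repeated rows per date.
df <- tibble(
  date    = as.Date(c("2000-10-01", "2001-10-01", "2001-10-01", "2001-10-01",
                      "2000-10-01", "2001-10-01", "2001-10-01")),
  price   = c(18, 20, 20, 20, 5, 7, 7),
  company = c("A", "A", "A", "A", "B", "B", "B")
)

# Lookup table: one row per company/date, lagged by date within company.
lagged <- df %>%
  distinct(company, date, .keep_all = TRUE) %>%  # keeps the first row per pair
  arrange(company, date) %>%
  group_by(company) %>%
  mutate(lagged_price = lag(price)) %>%
  ungroup() %>%
  select(company, date, lagged_price)

# Join back so that every original row is kept, in its original order.
out <- df %>%
  left_join(lagged, by = c("company", "date"))
```

Here distinct() is applied only to the lookup table, so the original rows are never dropped; each row simply picks up the lagged price for its company and date.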