Suppose I have a data frame as follows:
date | price | company |
---|---|---|
2000-10-01 | 18 | A |
2001-10-01 | 20 | A |
2001-10-01 | 20 | A |
2001-10-01 | 20 | A |
I want to create a new variable lagged_price
as follows:
date | price | company | lagged_price |
---|---|---|---|
2000-10-01 | 18 | A | NA |
2001-10-01 | 20 | A | 18 |
2001-10-01 | 20 | A | 18 |
2001-10-01 | 20 | A | 18 |
The new variable, lagged_price
, takes the lagged value of price
for group company
. That is, lagged_price
captures the price for the company on a previous date
. Using group_by
is problematic since it captures the value in the preceding row of the group company
. Instead, I want to capture the lagged price on the previous date for that company. I also do not want to perform distinct()
on the original dataset. Although that does the job in this example, I still want to keep other rows.
my failed solution:
out <- data %>%
group_by(company) %>%
mutate(lagged_price = lag(price))
Any help is appreciated.
CodePudding user response:
Lagging before grouping gives
df %>%
mutate(lagged_price = lag(price)) %>%
group_by(date) %>%
mutate(lagged_price = lagged_price[1]) %>%
ungroup()
# A tibble: 4 × 4
date price company lagged_price
<chr> <int> <chr> <int>
1 2000-10-01 18 A NA
2 2001-10-01 20 A 18
3 2001-10-01 20 A 18
4 2001-10-01 20 A 18