Considering the following dataset:
Company name | Year | Customers |
---|---|---|
Company A | 2018 | 100 |
Company B | 2018 | 120 |
Company C | 2018 | 150 |
Company A | 2019 | 120 |
Company B | 2019 | 180 |
Company C | 2019 | 80 |
Company A | 2020 | 200 |
Company B | 2020 | 500 |
Company C | 2020 | 140 |
What I want to do is to measure the future return of the customers. So, I need to have the customer amount of next year in a new column. Something like this:
Company name | Year | Customers | Customers next year |
---|---|---|---|
Company A | 2018 | 100 | 120 |
Company B | 2018 | 120 | 180 |
Company C | 2018 | 150 | 80 |
Company A | 2019 | 120 | 200 |
Company B | 2019 | 180 | 500 |
Company C | 2019 | 80 | 140 |
Company A | 2020 | 200 | NA |
Company B | 2020 | 500 | NA |
Company C | 2020 | 140 | NA |
Does anybody have any idea how to do this?
CodePudding user response:
You can use lead
like in the following code:
library(dplyr)
df %>%
group_by(Company) %>%
mutate(customers_next_year = lead(Customers)) %>%
ungroup()
Output:
# A tibble: 9 × 4
# Groups: Company [3]
Company Year Customers customers_next_year
<chr> <dbl> <dbl> <dbl>
1 A 2018 100 120
2 B 2018 120 180
3 C 2018 150 80
4 A 2019 120 200
5 B 2019 180 500
6 C 2019 80 140
7 A 2020 200 NA
8 B 2020 500 NA
9 C 2020 140 NA
Data
df <- data.frame(Company = rep(c("A", "B", "C"), 3),
Year = c(2018, 2018, 2018, 2019, 2019, 2019, 2020, 2020, 2020),
Customers = c(100,120,150,120,180,80,200,500,140))
CodePudding user response:
A method without external pacakges:
within(df, {
customers_next_year <- ave(Customers, Company, FUN = \(x) c(x[-1], NA))
})
# Company Year Customers customers_next_year
# 1 A 2018 100 120
# 2 B 2018 120 180
# 3 C 2018 150 80
# 4 A 2019 120 200
# 5 B 2019 180 500
# 6 C 2019 80 140
# 7 A 2020 200 NA
# 8 B 2020 500 NA
# 9 C 2020 140 NA