I’m attempting to write a script that compares labels between periods for a group of customers by creating a new column, “Last_month” that shows last period’s label. If it’s the first time a customer shows up in the dataset then it should either say NA or “New” in the “Last_month” column. I have tried a variety of things based on replies I received to a related question (Comparing rows to see if a customer has switched products) a couple of years ago but I’m not able to set the label to NA or “New” in the first month.
If I only had to compare the last two periods I’d just use a left join comparing the two periods but here I have a large dataset covering many periods. Is there an easy way to do this? My best attempt plus desired outcome:
library(tidyverse)
# Example data
df <- tibble(Period = c(202201, 202202, 202202, 202202, 202201, 202203,202203),
CustomerID = c(1, 1, 2, 3, 2, 2, 1),
Tag = c("A", "A", "B", "C", "D", "C", "D"))
# My best result
df %>%
arrange(CustomerID, Period) %>%
group_by(CustomerID) %>%
mutate(Last_month = lag(Tag, default = Tag[1]))
# Desired outcome 1
df2 <- tibble(Period = c(202201, 202202, 202203, 202201, 202202, 202203,202202),
Customer = c(1, 1, 1, 2, 2, 2, 3),
Tag = c("A", "A", "D", "D", "B", "C", "C"),
Last_month = c("New", "A", "A", "New", "D", "B", "New"))
# Desired outcome 2
df3 <- tibble(Period = c(202201, 202202, 202203, 202201, 202202, 202203,202202),
Customer = c(1, 1, 1, 2, 2, 2, 3),
Tag = c("A", "A", "D", "D", "B", "C", "C"),
Last_month = c(NA, "A", "A", NA, "D", "B", NA))
CodePudding user response:
like this?
library(dplyr)
df %>%
arrange(CustomerID, Period) %>%
group_by(CustomerID) %>%
mutate(last_month = lag(Tag, 1))