Alright so lets say we have a data frame that looks like this:
Month Product sales sales_lag YoY_change chart_color
2021-12-01 Tshirt 82525 108748 -0.2411 Negative
2022-01-01 Tshirt 109411 138472 -0.2099 Negative
2022-02-01 Tshirt 106934 115497 -0.0741 Negative
2021-12-01 Pants 97863 78419 0.2480 Positive
2022-01-01 Pants 103296 100614 0.0267 Positive
2022-02-01 Pants 82306 76913 0.0701 Positive
I'm trying to figure out a way to count the Product that has the longest run of months with negative YoY_change. So in this case, I'm looking for something that would return a table like this:
Negative_run
Tshirt 3
Pants 0
Telling us that Pants had positive YoY growth in the most recent month, but T-shirt is on a 3 Month negative streak. And while its not shown in this dataset, it would need to account for a Product being positive one month and negative the next. I'd like to do this within the tidyverse if possible, but open to other options in the R universe.
CodePudding user response:
df %>%
# calculate by product
group_by(Product) %>%
# find runs of numbers
mutate(g = data.table::rleid(chart_color)) %>%
# get the last run, and make sure that the last observation was negative
filter(g == max(g), last(YoY_change < 0), .preserve = TRUE) %>%
# count the length of the run
count()
# A tibble: 2 × 2 # Groups: Product [2] Product n <chr> <int> 1 Pants 0 2 Tshirt 3
CodePudding user response:
Since you want to count the number of consecutive Negative
, we will need some sort of rle
function.
library(dplyr)
df %>%
group_by(Product,
grp = with(rle(chart_color), rep(seq_along(lengths), lengths))) %>%
mutate(Negative_run = ifelse(chart_color == "Positive", 0, seq_along(grp))) %>%
group_by(Product) %>%
summarize(Negative_run = max(Negative_run))
# A tibble: 2 × 2
Product Negative_run
<chr> <dbl>
1 Pants 0
2 Tshirt 3
Data
df <- structure(list(Month = c("2021-12-01", "2022-01-01", "2022-02-01",
"2021-12-01", "2022-01-01", "2022-02-01"), Product = c("Tshirt",
"Tshirt", "Tshirt", "Pants", "Pants", "Pants"), sales = c(82525L,
109411L, 106934L, 97863L, 103296L, 82306L), sales_lag = c(108748L,
138472L, 115497L, 78419L, 100614L, 76913L), YoY_change = c(-0.2411,
-0.2099, -0.0741, 0.248, 0.0267, 0.0701), chart_color = c("Negative",
"Negative", "Negative", "Positive", "Positive", "Positive")), class = "data.frame", row.names = c(NA,
-6L))
CodePudding user response:
I would first create a column which would identify if the YOY change is negative or not and then aggregate.
df1$NegativeY = as.numeric(ifelse(df1$YoY_change>0, 0,1))
df1 %>%
group_by(Product) %>%
summarise(Negative_run = sum(NegativeY))