Home > Mobile >  Counting negative values, stopping when there is a positive among most recent observations, all done
Counting negative values, stopping when there is a positive among most recent observations, all done

Time:05-05

Alright so lets say we have a data frame that looks like this:

Month       Product         sales   sales_lag YoY_change  chart_color
2021-12-01  Tshirt          82525   108748    -0.2411     Negative
2022-01-01  Tshirt          109411  138472    -0.2099     Negative
2022-02-01  Tshirt          106934  115497    -0.0741     Negative
2021-12-01  Pants           97863   78419      0.2480     Positive
2022-01-01  Pants           103296  100614     0.0267     Positive
2022-02-01  Pants           82306   76913      0.0701     Positive

I'm trying to figure out a way to count the Product that has the longest run of months with negative YoY_change. So in this case, I'm looking for something that would return a table like this:

          Negative_run
Tshirt    3
Pants     0

Telling us that Pants had positive YoY growth in the most recent month, but T-shirt is on a 3 Month negative streak. And while its not shown in this dataset, it would need to account for a Product being positive one month and negative the next. I'd like to do this within the tidyverse if possible, but open to other options in the R universe.

CodePudding user response:

df %>% 
  # calculate by product
  group_by(Product) %>% 
  # find runs of numbers
  mutate(g = data.table::rleid(chart_color)) %>% 
  # get the last run, and make sure that the last observation was negative
  filter(g == max(g), last(YoY_change < 0), .preserve = TRUE) %>% 
  # count the length of the run
  count()
# A tibble: 2 × 2
# Groups:   Product [2]
  Product     n
  <chr>   <int>
1 Pants       0
2 Tshirt      3

CodePudding user response:

Since you want to count the number of consecutive Negative, we will need some sort of rle function.

library(dplyr)

df %>% 
  group_by(Product, 
           grp = with(rle(chart_color), rep(seq_along(lengths), lengths))) %>%
  mutate(Negative_run = ifelse(chart_color == "Positive", 0, seq_along(grp))) %>% 
  group_by(Product) %>% 
  summarize(Negative_run = max(Negative_run))

# A tibble: 2 × 2
  Product Negative_run
  <chr>          <dbl>
1 Pants              0
2 Tshirt             3

Data

df <- structure(list(Month = c("2021-12-01", "2022-01-01", "2022-02-01", 
"2021-12-01", "2022-01-01", "2022-02-01"), Product = c("Tshirt", 
"Tshirt", "Tshirt", "Pants", "Pants", "Pants"), sales = c(82525L, 
109411L, 106934L, 97863L, 103296L, 82306L), sales_lag = c(108748L, 
138472L, 115497L, 78419L, 100614L, 76913L), YoY_change = c(-0.2411, 
-0.2099, -0.0741, 0.248, 0.0267, 0.0701), chart_color = c("Negative", 
"Negative", "Negative", "Positive", "Positive", "Positive")), class = "data.frame", row.names = c(NA, 
-6L))

CodePudding user response:

I would first create a column which would identify if the YOY change is negative or not and then aggregate.

df1$NegativeY = as.numeric(ifelse(df1$YoY_change>0, 0,1))

df1 %>%
  group_by(Product) %>%
  summarise(Negative_run = sum(NegativeY))

enter image description here

  • Related