Suppose that I have this dataset that shows how many items (i.e., Item A, B, and C) are sold in three stores (i.e., Store 1, 2, and 3) between year 2020 to 2021. The dataset looks similar to the table below:
Initial_Data = data.frame(Year= c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021, 2021,2020, 2020, 2020, 2021, 2021, 2021),
Stores = c('Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 3', 'Store 3', 'Store 3', 'Store 3', 'Store 3', 'Store 3'),
Items = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'),
`Units Sold` = c(67, 32, 11, 0, 0, 0, 54, 0, 9, 71, 36, 12, 0, 0, 0, 45, 28, 7)
)
Let's say that there has been a glitch in the system that caused some stores to report 0 units sold for ALL items (for example, store 1 in 2021 and store 2 in 2020 reported 0 units sold for all three items). So I want to "flag/identify" these stores to investigate them further. Note that I am only interested in stores that reported zero units sold for ALL THREE ITEMS. For example, store 2 in 2020 reports 0 units sold for item B, but I do not care about that.
So the resulting table should look something like:
Final_Data = data.frame(Year= c(2020, 2020, 2020, 2021, 2021, 2021, 2020, 2020, 2020, 2021, 2021, 2021,2020, 2020, 2020, 2021, 2021, 2021),
Stores = c('Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 1', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 2', 'Store 3', 'Store 3', 'Store 3', 'Store 3', 'Store 3', 'Store 3'),
Items = c('A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C'),
`Units Sold` = c(67, 32, 11, 0, 0, 0, 54, 0, 9, 71, 36, 12, 0, 0, 0, 45, 28, 7),
Check = c('Fine', 'Fine', 'Fine', 'Check', 'Check', 'Check', 'Fine', 'Fine', 'Fine', 'Fine', 'Fine', 'Fine', 'Check', 'Check', 'Check', 'Fine', 'Fine', 'Fine')
)
The actual dataset that I'm using represents dozens of stores with more than 10 items and covers multiple years. So it's impossible to check them manually like the example above. That being said, it would be great if your solution could be as "flexible" or "general" as possible. That is, it shouldn't be too "rigid" that it cannot be modified at all and can only be used to address the table above.
Thank you!
CodePudding user response:
Seems like you just need you need to group the df by the year and store and then run some logic. Try:
library(dplyr)
Initial_Data |>
group_by(Stores, Year) |>
mutate(Check = if(all(Units.Sold == 0)) "Check" else "Fine")