I have sales data by years and models share column which represents the share of each model's sale within the specific year
model Year sales Share
1 A 2015 266 6.81%
2 B 2015 177 4.53%
3 C 2015 888 22.7%
...................
59 I 2020 5 0.84%
60 J 2020 1789 94.5%
Next, I calculate the cumulative share of each model within the specific year and apply these flowing rules:
I take the last year (2020) and order models by cumulative share (sum of shares), when this cumulative share exceeds the 90% threshold, the rest of the models are grouped as a new category called "insignificant". However, if the share of a single model is already more than 90%, all the data (models) are categorised as 'insignificant and the output is like this below, because share of model J in 2020 is more than 90%.
Year model sales Share pos Sumshare
<dbl> <chr> <dbl> <dbl> <int> <dbl>
1 2015 insignificant 3906 100 1 100
2 2016 insignificant 6434 100 1 100
3 2017 insignificant 3967 100 1 100
4 2018 insignificant 5658 100 1 100
5 2019 insignificant 35316 100 1 100
6 2020 insignificant 1892 100 1 100
Now I want to update my IF statement and add one more rule:
leave the condition as it is, but in a special case if a single model's share is more than 90%, leave this model and all the rest aggregate and categorise as "insignificant".
In that case, output gonna be only 2 categories: model 'J' and 'insignificant'
(I need to make minor changes into my code)
Code:
df <- data.frame (model = c("A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J","A","B","C","D","E","F","G","H","I","J"),
Year = c(2015,2015,2015,2015,2015,2015,2015,2015,2015,2015,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2018,2018,2018,2018,2018,2018,2018,2018,2018,2018,2019,2019,2019,2019,2019,2019,2019,2019,2019,2019,2020,2020,2020,2020,2020,2020,2020,2020,2020,2020),
sales = c(266,177,888,233,255,677,411,344,122,533,234,567,234,567,232,900,1005,1900,450,345,567,235,456,345,144,333,555,777,111,444,222,223,445,776,331,788,980,1003,456,434,345,2222,3456,456,678,8911,4560,4567,4566,5555,4,18,16,4,17,11,24,4,5,1789))
df2 <- df %>%
group_by(Year) %>% mutate(Share = 100 * sales/ sum(sales),
order = order(order(-Share))) %>% arrange(Year, order, by_group = TRUE) %>%
mutate(Sumshare= cumsum(Share)) %>%ungroup() %>%
mutate(threshold.90 = model %in% model[Year == max(Year) & Sumshare < 90]) %>%
mutate(model = ifelse(threshold.90, model, 'insignificant')) %>%
group_by(Year, model) %>% summarize(sales = sum(sales), Share = sum(Share), .groups = 'keep') %>%
group_by(Year) %>% mutate(pseudoShare = ifelse(model == 'insignificant', 0, Share)) %>%
arrange(Year, -pseudoShare, by_group = TRUE) %>%
ungroup() %>%
mutate(pos = match(model, model[Year == max(Year)])) %>%
select(-pseudoShare) %>%
group_by(Year) %>% mutate(Share = 100 * sales/ sum(sales)) %>% arrange(Year, -pos, by_group = TRUE) %>%
mutate(Sumshare = cumsum(Share)) %>%ungroup()
CodePudding user response:
I wrote a condensed version of your script
df |>
group_by(Year) |>
mutate(share = sales/sum(sales),
model = ifelse(Year == 2020 & row_number() >= min(which(cumsum(share) > .9)), model, "Insignificant")) |>
group_by(Year, model) |>
summarise(share = sum(share) * 100, model = model, sales = sum(sales), .groups = "drop" ) |>
distinct()
Year model share sales
<dbl> <chr> <dbl> <dbl>
1 2015 Insignificant 100 3906
2 2016 Insignificant 100 6434
3 2017 Insignificant 100 3967
4 2018 Insignificant 100 5658
5 2019 Insignificant 100 35316
6 2020 Insignificant 5.44 103
7 2020 J 94.6 1789
Your code can be cleaned up a bit, a lot of mutate calls and group by calls.
You can condense your mutate calls into one parenthesis such as: mutate(a = ..., b = ...)
CodePudding user response:
You can link two (or more) conditions together with the logical &&
-operator.
F.e.
(1 == 1) && (2 == 2)
[1] TRUE
(1 == 1) && (1 == 2)
[1] FALSE
If you want to apply more complex conditions (or combinations of them), you can use brackets, similar to mathematics. Fyi, the logical or would be ||
.
Hope that helps.
On a said note, I would suggest to split your code into multiple variables / lines as this is a very complex way of writing a relatively easy code.