I would like to order my dataset according to the production levels in 2018-19. So company_code 3 goes first (5000), then company_code 1 (2000) and then company_code 2 (1000).
I have a dataset like this:
company_code | financial_year | production |
---|---|---|
1 | 2018-19 | 2000 |
1 | 2019-20 | 2500 |
1 | 2020-21 | 3000 |
1 | 2018-21 | 7500 |
2 | 2018-19 | 1000 |
2 | 2019-20 | 1500 |
2 | 2020-21 | 1000 |
2 | 2020-21 | 3500 |
3 | 2018-19 | 5000 |
3 | 2019-20 | 5500 |
3 | 2020-21 | 4000 |
3 | 2018-21 | 14500 |
I would like to end up with:
company_code | financial_year | production |
---|---|---|
3 | 2018-19 | 5000 |
3 | 2019-20 | 5500 |
3 | 2020-21 | 4000 |
3 | 2018-21 | 14500 |
1 | 2018-19 | 2000 |
1 | 2019-20 | 2500 |
1 | 2020-21 | 3000 |
1 | 2018-21 | 7500 |
2 | 2018-19 | 1000 |
2 | 2019-20 | 1500 |
2 | 2020-21 | 1000 |
2 | 2020-21 | 3500 |
I tried:
dataset <- dataset %>% mutate(COMPANY_CODE = reorder(COMPANY_CODE, -production[financial_year=="2018/19"]))
But this does not work, could anyone help? Many thanks
CodePudding user response:
One option would be to use a helper "column" where you keep only the values for year 2018/19 and set all other values to 0 and finally use FUN=sum
in reorder
:
library(dplyr)
dataset %>%
mutate(company_code = reorder(company_code, -ifelse(financial_year == "2018-19", production, 0), FUN = sum)) |>
arrange(company_code)
#> company_code financial_year production
#> 1 3 2018-19 5000
#> 2 3 2019-20 5500
#> 3 3 2020-21 4000
#> 4 3 2018-21 14500
#> 5 1 2018-19 2000
#> 6 1 2019-20 2500
#> 7 1 2020-21 3000
#> 8 1 2018-21 7500
#> 9 2 2018-19 1000
#> 10 2 2019-20 1500
#> 11 2 2020-21 1000
#> 12 2 2020-21 3500
CodePudding user response:
One approach could be to sort the way you want and then use forcats::fct_inorder
, which is nice because your sorting conditions could be arbitrarily complicated.
library(dplyr); library(forcats)
df1 %>%
arrange(financial_year != "2018-19", -production) %>% # FALSE sorts before TRUE
mutate(company_code = fct_inorder(as.factor(company_code))) %>%
arrange(company_code, financial_year)
company_code financial_year production
1 3 2018-19 5000
2 3 2018-21 14500
3 3 2019-20 5500
4 3 2020-21 4000
5 1 2018-19 2000
6 1 2018-21 7500
7 1 2019-20 2500
8 1 2020-21 3000
9 2 2018-19 1000
10 2 2019-20 1500
11 2 2020-21 3500
12 2 2020-21 1000
CodePudding user response:
Another approach in base R would be:
df <- data.frame("group" = c(1,2,3,2,1,3), "value" = c(9, 2, 4, 7, 5, 1))
ag <- aggregate(value ~ group, df, sum)
df[order(match(df$group, ag$group)), ]
group value
1 1 9
5 1 5
2 2 2
4 2 7
3 3 4
6 3 1
CodePudding user response:
You could create a temporary column recording production
where financial_year
is '2018-19'
for each company, and arrange the data by this column.
library(dplyr)
df %>%
group_by(company_code) %>%
mutate(tmp = production[financial_year == '2018-19']) %>%
ungroup() %>%
arrange(desc(tmp)) %>%
select(-tmp)
# # A tibble: 12 × 3
# company_code financial_year production
# <dbl> <chr> <int>
# 1 3 2018-19 5000
# 2 3 2019-20 5500
# 3 3 2020-21 4000
# 4 3 2018-21 14500
# 5 1 2018-19 2000
# 6 1 2019-20 2500
# 7 1 2020-21 3000
# 8 1 2018-21 7500
# 9 2 2018-19 1000
# 10 2 2019-20 1500
# 11 2 2020-21 1000
# 12 2 2020-21 3500
Refer to @stefan's reorder()
solution, there is also a flexible variant fct_reorder2()
from forcats
to reorder a factor depending on other 2 vectors.
library(forcats)
df %>%
arrange(fct_reorder2(as.factor(company_code), production, financial_year,
.fun = \(x, y) x[y == '2018-19']))
CodePudding user response:
We could use separate
to separate 2018 and 20 to arrange later with these two.
After that using group_by twice with a helping column x to order for first production of each company.
library(dplyr)
library(tidyr)
df %>%
separate(fiancial_year, c("a", "b"), convert = TRUE, remove = FALSE) %>%
group_by(company_code) %>%
mutate(x = first(production)) %>%
ungroup() %>%
arrange(-x, b, -a) %>%
select(-c(a, b, x))
company_code fiancial_year production
<int> <chr> <int>
1 3 2018-19 5000
2 3 2019-20 5500
3 3 2020-21 4000
4 3 2018-21 14500
5 1 2018-19 2000
6 1 2019-20 2500
7 1 2020-21 3000
8 1 2018-21 7500
9 2 2018-19 1000
10 2 2019-20 1500
11 2 2020-21 1000
12 2 2020-21 3500