Reorder variable according to conditions-CodePudding

I would like to order my dataset according to the production levels in 2018-19. So company_code 3 goes first (5000), then company_code 1 (2000) and then company_code 2 (1000).

I have a dataset like this:

company_code	financial_year	production
1	2018-19	2000
1	2019-20	2500
1	2020-21	3000
1	2018-21	7500
2	2018-19	1000
2	2019-20	1500
2	2020-21	1000
2	2020-21	3500
3	2018-19	5000
3	2019-20	5500
3	2020-21	4000
3	2018-21	14500

I would like to end up with:

company_code	financial_year	production
3	2018-19	5000
3	2019-20	5500
3	2020-21	4000
3	2018-21	14500
1	2018-19	2000
1	2019-20	2500
1	2020-21	3000
1	2018-21	7500
2	2018-19	1000
2	2019-20	1500
2	2020-21	1000
2	2020-21	3500

I tried:

dataset <- dataset %>% mutate(COMPANY_CODE = reorder(COMPANY_CODE, -production[financial_year=="2018/19"]))

But this does not work, could anyone help? Many thanks

CodePudding user response：

One option would be to use a helper "column" where you keep only the values for year 2018/19 and set all other values to 0 and finally use FUN=sum in reorder:

library(dplyr)

dataset %>% 
  mutate(company_code = reorder(company_code, -ifelse(financial_year == "2018-19", production, 0), FUN = sum)) |> 
  arrange(company_code)
#>    company_code financial_year production
#> 1             3        2018-19       5000
#> 2             3        2019-20       5500
#> 3             3        2020-21       4000
#> 4             3        2018-21      14500
#> 5             1        2018-19       2000
#> 6             1        2019-20       2500
#> 7             1        2020-21       3000
#> 8             1        2018-21       7500
#> 9             2        2018-19       1000
#> 10            2        2019-20       1500
#> 11            2        2020-21       1000
#> 12            2        2020-21       3500

CodePudding user response：

One approach could be to sort the way you want and then use forcats::fct_inorder, which is nice because your sorting conditions could be arbitrarily complicated.

library(dplyr); library(forcats)
df1 %>% 
  arrange(financial_year != "2018-19", -production) %>% # FALSE sorts before TRUE
  mutate(company_code = fct_inorder(as.factor(company_code))) %>%
  arrange(company_code, financial_year)


   company_code financial_year production
1             3        2018-19       5000
2             3        2018-21      14500
3             3        2019-20       5500
4             3        2020-21       4000
5             1        2018-19       2000
6             1        2018-21       7500
7             1        2019-20       2500
8             1        2020-21       3000
9             2        2018-19       1000
10            2        2019-20       1500
11            2        2020-21       3500
12            2        2020-21       1000

CodePudding user response：

Another approach in base R would be:

df <- data.frame("group" = c(1,2,3,2,1,3), "value" = c(9, 2, 4, 7, 5, 1))
ag <- aggregate(value ~ group, df, sum)
df[order(match(df$group, ag$group)), ]

  group value
1     1     9
5     1     5
2     2     2
4     2     7
3     3     4
6     3     1

CodePudding user response：

You could create a temporary column recording production where financial_year is '2018-19' for each company, and arrange the data by this column.

library(dplyr)

df %>%
  group_by(company_code) %>%
  mutate(tmp = production[financial_year == '2018-19']) %>%
  ungroup() %>%
  arrange(desc(tmp)) %>%
  select(-tmp)

# # A tibble: 12 × 3
#    company_code financial_year production
#           <dbl> <chr>               <int>
#  1            3 2018-19              5000
#  2            3 2019-20              5500
#  3            3 2020-21              4000
#  4            3 2018-21             14500
#  5            1 2018-19              2000
#  6            1 2019-20              2500
#  7            1 2020-21              3000
#  8            1 2018-21              7500
#  9            2 2018-19              1000
# 10            2 2019-20              1500
# 11            2 2020-21              1000
# 12            2 2020-21              3500

Refer to @stefan's reorder() solution, there is also a flexible variant fct_reorder2() from forcats to reorder a factor depending on other 2 vectors.

library(forcats)

df %>%
  arrange(fct_reorder2(as.factor(company_code), production, financial_year,
                       .fun = \(x, y) x[y == '2018-19']))

CodePudding user response：

We could use separate to separate 2018 and 20 to arrange later with these two. After that using group_by twice with a helping column x to order for first production of each company.

library(dplyr)
library(tidyr)

df %>% 
  separate(fiancial_year, c("a", "b"), convert = TRUE, remove = FALSE) %>% 
  group_by(company_code) %>% 
  mutate(x = first(production)) %>% 
  ungroup() %>%
  arrange(-x, b, -a) %>% 
  select(-c(a, b, x))

    company_code fiancial_year production
          <int> <chr>              <int>
 1            3 2018-19             5000
 2            3 2019-20             5500
 3            3 2020-21             4000
 4            3 2018-21            14500
 5            1 2018-19             2000
 6            1 2019-20             2500
 7            1 2020-21             3000
 8            1 2018-21             7500
 9            2 2018-19             1000
10            2 2019-20             1500
11            2 2020-21             1000
12            2 2020-21             3500