Home > Software engineering >  How to do ranking/factoring in R?
How to do ranking/factoring in R?

Time:07-31

In the following dataframe:

Year = c("2015", "2015", "2015", "2015", "2015",  "2015", "2015", "2016", "2016", "2016", "2016", "2016", "2016",  "2016", "2017", "2017", "2017", "2017", "2017", "2017", "2017",  "2018", "2018", "2018", "2018", "2018", "2018", "2019", "2019",  "2019", "2019", "2019", "2019")


Region = c("World", "Africa", "Middle East", "Europe", "America", "Asia", "Other", 
"World", "Africa", "Middle East", "Europe", "America", "Asia", "Other", "World", "Africa", "Middle East", "Europe", "America", "Asia", "Other", "World", "Africa", "Middle East", "Europe", "America", "Asia", "World", "Africa", "Middle East", "Europe", "America", "Asia")

Sales = c(914141282, 1944898318,  4364067461, 11300678829, 16278485843, 132612907539, 6014323190,  1423729376, 2594259389, 3953198670, 12218027505, 17074884991,  140098809735, 6476302180, 1824592711, 4001928375, 4487385252,  13386187005, 19482623313, 173213703884, 8678149745, 2515250712,  3025785028, 5264852268, 15280791380, 20221401985, 187302084150,  1970503832, 2900497180, 5776217486, 16459966376, 22412702838,  200321533591)

df <- data.frame(Year, Region, Sales)

   Year      Region        Sales
1  2015       World    914141282
2  2015      Africa   1944898318
3  2015 Middle East   4364067461
..............................
30 2019 Middle East   5776217486
31 2019      Europe  16459966376
32 2019     America  22412702838
33 2019        Asia 200321533591

I need to order regions according to "Sales" in the last year (2019 in this case), so I use the code below:

order <- subset(df, Year == max(Year), select = c("Region","Sales")) %>% arrange(-desc(Sales)) %>% select(Region)
df$Region <- factor(df$Region, levels = order$Region)

However, problem is that Regions "Other" is not given in 2019, but it is in 2015, 2016 and 2017. that's why in the output there is NA:

   Year      Region        Sales
1  2015       World    914141282
2  2015      Africa   1944898318
3  2015 Middle East   4364067461
4  2015      Europe  11300678829
5  2015     America  16278485843
6  2015        Asia 132612907539
7  2015        <NA>   6014323190 <---------------
8  2016       World   1423729376
9  2016      Africa   2594259389
10 2016 Middle East   3953198670
11 2016      Europe  12218027505
12 2016     America  17074884991
13 2016        Asia 140098809735
14 2016        <NA>   6476302180 <---------------
15 2017       World   1824592711
16 2017      Africa   4001928375
17 2017 Middle East   4487385252
18 2017      Europe  13386187005
19 2017     America  19482623313
20 2017        Asia 173213703884
21 2017        <NA>   8678149745 <---------------
22 2018       World   2515250712
23 2018      Africa   3025785028
24 2018 Middle East   5264852268
25 2018      Europe  15280791380
26 2018     America  20221401985
27 2018        Asia 187302084150
28 2019       World   1970503832
29 2019      Africa   2900497180
30 2019 Middle East   5776217486
31 2019      Europe  16459966376
32 2019     America  22412702838
33 2019        Asia 200321533591

So I want the the following procedure: If one of the region is not given in 2019, they have to be ranked as last in the ranking order during factoring

CodePudding user response:

For this we could use fct_inorder from forcats package.

library(dplyr)
library(forcats)

df %>% 
  mutate(Region = fct_reorder(Region, Sales))
   Year      Region        Sales
1  2015       World    914141282
2  2015      Africa   1944898318
3  2015 Middle East   4364067461
4  2015      Europe  11300678829
5  2015     America  16278485843
6  2015        Asia 132612907539
7  2015       Other   6014323190
8  2016       World   1423729376
9  2016      Africa   2594259389
10 2016 Middle East   3953198670
11 2016      Europe  12218027505
12 2016     America  17074884991
13 2016        Asia 140098809735
14 2016       Other   6476302180
15 2017       World   1824592711
16 2017      Africa   4001928375
17 2017 Middle East   4487385252
18 2017      Europe  13386187005
19 2017     America  19482623313
20 2017        Asia 173213703884
21 2017       Other   8678149745
22 2018       World   2515250712
23 2018      Africa   3025785028
24 2018 Middle East   5264852268
25 2018      Europe  15280791380
26 2018     America  20221401985
27 2018        Asia 187302084150
28 2019       World   1970503832
29 2019      Africa   2900497180
30 2019 Middle East   5776217486
31 2019      Europe  16459966376
32 2019     America  22412702838
33 2019        Asia 200321533591

CodePudding user response:

Using base R

lvls <- with(subset(df, Year == max(Year)), union(Region[order(-Sales)], df$Region))
df$Region <- factor(df$Region, levels = lvls)
levels(df$Region)
[1] "Asia"        "America"     "Europe"      "Middle East" "Africa"      "World"       "Other" 

Then do the order on the Region

> df[order(df$Year, df$Region),]
   Year      Region        Sales
6  2015        Asia 132612907539
5  2015     America  16278485843
4  2015      Europe  11300678829
3  2015 Middle East   4364067461
2  2015      Africa   1944898318
1  2015       World    914141282
7  2015       Other   6014323190
13 2016        Asia 140098809735
12 2016     America  17074884991
11 2016      Europe  12218027505
10 2016 Middle East   3953198670
9  2016      Africa   2594259389
8  2016       World   1423729376
14 2016       Other   6476302180
20 2017        Asia 173213703884
19 2017     America  19482623313
18 2017      Europe  13386187005
17 2017 Middle East   4487385252
16 2017      Africa   4001928375
15 2017       World   1824592711
21 2017       Other   8678149745
27 2018        Asia 187302084150
26 2018     America  20221401985
25 2018      Europe  15280791380
24 2018 Middle East   5264852268
23 2018      Africa   3025785028
22 2018       World   2515250712
33 2019        Asia 200321533591
32 2019     America  22412702838
31 2019      Europe  16459966376
30 2019 Middle East   5776217486
29 2019      Africa   2900497180
28 2019       World   1970503832
  • Related