In the following dataframe:
Year = c("2015", "2015", "2015", "2015", "2015", "2015", "2015", "2016", "2016", "2016", "2016", "2016", "2016", "2016", "2017", "2017", "2017", "2017", "2017", "2017", "2017", "2018", "2018", "2018", "2018", "2018", "2018", "2019", "2019", "2019", "2019", "2019", "2019")
Region = c("World", "Africa", "Middle East", "Europe", "America", "Asia", "Other",
"World", "Africa", "Middle East", "Europe", "America", "Asia", "Other", "World", "Africa", "Middle East", "Europe", "America", "Asia", "Other", "World", "Africa", "Middle East", "Europe", "America", "Asia", "World", "Africa", "Middle East", "Europe", "America", "Asia")
Sales = c(914141282, 1944898318, 4364067461, 11300678829, 16278485843, 132612907539, 6014323190, 1423729376, 2594259389, 3953198670, 12218027505, 17074884991, 140098809735, 6476302180, 1824592711, 4001928375, 4487385252, 13386187005, 19482623313, 173213703884, 8678149745, 2515250712, 3025785028, 5264852268, 15280791380, 20221401985, 187302084150, 1970503832, 2900497180, 5776217486, 16459966376, 22412702838, 200321533591)
df <- data.frame(Year, Region, Sales)
Year Region Sales
1 2015 World 914141282
2 2015 Africa 1944898318
3 2015 Middle East 4364067461
..............................
30 2019 Middle East 5776217486
31 2019 Europe 16459966376
32 2019 America 22412702838
33 2019 Asia 200321533591
I need to order regions according to "Sales" in the last year (2019 in this case), so I use the code below:
order <- subset(df, Year == max(Year), select = c("Region","Sales")) %>% arrange(-desc(Sales)) %>% select(Region)
df$Region <- factor(df$Region, levels = order$Region)
However, problem is that Regions "Other" is not given in 2019, but it is in 2015, 2016 and 2017. that's why in the output there is NA:
Year Region Sales
1 2015 World 914141282
2 2015 Africa 1944898318
3 2015 Middle East 4364067461
4 2015 Europe 11300678829
5 2015 America 16278485843
6 2015 Asia 132612907539
7 2015 <NA> 6014323190 <---------------
8 2016 World 1423729376
9 2016 Africa 2594259389
10 2016 Middle East 3953198670
11 2016 Europe 12218027505
12 2016 America 17074884991
13 2016 Asia 140098809735
14 2016 <NA> 6476302180 <---------------
15 2017 World 1824592711
16 2017 Africa 4001928375
17 2017 Middle East 4487385252
18 2017 Europe 13386187005
19 2017 America 19482623313
20 2017 Asia 173213703884
21 2017 <NA> 8678149745 <---------------
22 2018 World 2515250712
23 2018 Africa 3025785028
24 2018 Middle East 5264852268
25 2018 Europe 15280791380
26 2018 America 20221401985
27 2018 Asia 187302084150
28 2019 World 1970503832
29 2019 Africa 2900497180
30 2019 Middle East 5776217486
31 2019 Europe 16459966376
32 2019 America 22412702838
33 2019 Asia 200321533591
So I want the the following procedure: If one of the region is not given in 2019, they have to be ranked as last in the ranking order during factoring
CodePudding user response:
For this we could use fct_inorder
from forcats
package.
library(dplyr)
library(forcats)
df %>%
mutate(Region = fct_reorder(Region, Sales))
Year Region Sales
1 2015 World 914141282
2 2015 Africa 1944898318
3 2015 Middle East 4364067461
4 2015 Europe 11300678829
5 2015 America 16278485843
6 2015 Asia 132612907539
7 2015 Other 6014323190
8 2016 World 1423729376
9 2016 Africa 2594259389
10 2016 Middle East 3953198670
11 2016 Europe 12218027505
12 2016 America 17074884991
13 2016 Asia 140098809735
14 2016 Other 6476302180
15 2017 World 1824592711
16 2017 Africa 4001928375
17 2017 Middle East 4487385252
18 2017 Europe 13386187005
19 2017 America 19482623313
20 2017 Asia 173213703884
21 2017 Other 8678149745
22 2018 World 2515250712
23 2018 Africa 3025785028
24 2018 Middle East 5264852268
25 2018 Europe 15280791380
26 2018 America 20221401985
27 2018 Asia 187302084150
28 2019 World 1970503832
29 2019 Africa 2900497180
30 2019 Middle East 5776217486
31 2019 Europe 16459966376
32 2019 America 22412702838
33 2019 Asia 200321533591
CodePudding user response:
Using base R
lvls <- with(subset(df, Year == max(Year)), union(Region[order(-Sales)], df$Region))
df$Region <- factor(df$Region, levels = lvls)
levels(df$Region)
[1] "Asia" "America" "Europe" "Middle East" "Africa" "World" "Other"
Then do the order on the Region
> df[order(df$Year, df$Region),]
Year Region Sales
6 2015 Asia 132612907539
5 2015 America 16278485843
4 2015 Europe 11300678829
3 2015 Middle East 4364067461
2 2015 Africa 1944898318
1 2015 World 914141282
7 2015 Other 6014323190
13 2016 Asia 140098809735
12 2016 America 17074884991
11 2016 Europe 12218027505
10 2016 Middle East 3953198670
9 2016 Africa 2594259389
8 2016 World 1423729376
14 2016 Other 6476302180
20 2017 Asia 173213703884
19 2017 America 19482623313
18 2017 Europe 13386187005
17 2017 Middle East 4487385252
16 2017 Africa 4001928375
15 2017 World 1824592711
21 2017 Other 8678149745
27 2018 Asia 187302084150
26 2018 America 20221401985
25 2018 Europe 15280791380
24 2018 Middle East 5264852268
23 2018 Africa 3025785028
22 2018 World 2515250712
33 2019 Asia 200321533591
32 2019 America 22412702838
31 2019 Europe 16459966376
30 2019 Middle East 5776217486
29 2019 Africa 2900497180
28 2019 World 1970503832