I'm trying to see the change of occupations in the industry over time, so I have the occupations IDs like an engineer, teacher, and lawyer, and several IDs that correspond to the sector like construction, education, mineral extraction, fishing... I would like to extract, from each occupation -and the largest and smallest change. A sample of the data follows. In this example, I would like to extract the top 3 largest positive changes, and the top 3 largest negative changes Could you guys help me?
x <- data.frame("occ_id" = c(1010, 1010, 1010, 1010, 1010, 1010, 1010,1234,1234,1234,1234, 4321, 4321,4321,4321,4321),
"Ind_id" = c(52418,52417,28339,27138,31224,33103,1112,27138,31224,1112,52418,33103,31224,1112,52417,26301),
"Change_occ_2000_2022" = c(1, -5 , 8 ,9 , - 11 ,15 ,16 ,-50,10,30,-5,20,10,50,30,-50))
then I tried this
x %>%
count(Change_occ_2000_2022) %>%
arrange(Change_occ_2000_2022) %>%
slice(c(head(row_number(), 3), tail(row_number(), 3)))
But doing this I could not capture the pair occ-ind that this change belongs. I would like to appear the pair to these changes like this:
x <- data.frame("occ_id" = c(4321, 4321, 1234, 1234, 4321, 1010),
"Ind_id" = c(1112,52417,1112,27138,26301, 31224 ),
"Change_occ_2000_2022" = c(50,30,30, -50, -50, -11))
CodePudding user response:
library(dplyr)
x %>%
group_by(occ_id) %>%
arrange(-Change_occ_2000_2022) %>%
ungroup() %>%
slice(c(head(row_number(), 3), tail(row_number(), 3)))
Output:
occ_id Ind_id Change_occ_2000_2022
<dbl> <dbl> <dbl>
1 4321 1112 50
2 1234 1112 30
3 4321 52417 30
4 1010 31224 -11
5 1234 27138 -50
6 4321 26301 -50
CodePudding user response:
library(dplyr)
x %>%
arrange(desc(Change_occ_2000_2022)) %>%
slice(c(1:3, (nrow(.) - 2):nrow(.)))
Output
occ_id Ind_id Change_occ_2000_2022
1 4321 1112 50
2 1234 1112 30
3 4321 52417 30
4 1010 31224 -11
5 1234 27138 -50
6 4321 26301 -50
CodePudding user response:
based on the solution LMc
df <- data.frame("occ_id" = c(1010, 1010, 1010, 1010, 1010, 1010, 1010,1234,1234,1234,1234, 4321, 4321,4321,4321,4321),
"Ind_id" = c(52418,52417,28339,27138,31224,33103,1112,27138,31224,1112,52418,33103,31224,1112,52417,26301),
"Change_occ_2000_2022" = c(1, -5 , 8 ,9 , - 11 ,15 ,16 ,-50,10,30,-5,20,10,50,30,-50))
library(data.table)
setDT(df)[order(Change_occ_2000_2022), .SD[c(1:3, (.N-2):.N)]]
#> occ_id Ind_id Change_occ_2000_2022
#> 1: 1234 27138 -50
#> 2: 4321 26301 -50
#> 3: 1010 31224 -11
#> 4: 1234 1112 30
#> 5: 4321 52417 30
#> 6: 4321 1112 50
Created on 2022-05-19 by the reprex package (v2.0.1)
or
setDT(df)[frankv(Change_occ_2000_2022, ties.method = "dense") <= 2 |
frankv(-Change_occ_2000_2022, ties.method = "dense") <= 2][order(Change_occ_2000_2022)]
if you need to take into account duplicate values
CodePudding user response:
x<-x %>%
arrange(Change_occ_2000_2022)
x<-rbind(head(x, 3), tail(x, 3))