I have a dataframe looks like below:
person year Office Job rank
Harry 2002 Los Angeles CEO -1
Harry 2006 Boston CEO -1
Harry 2006 Los Angeles Advisor 0
Harry 2006 Chicago Chairman 2
Peter 2001 New York Director -1
Peter 2001 Chicago CFO 3
Peter 2001 Chicago CEO 1
I want to sort the data within the person-year groups. The sorting should be based on the information in the column "rank". For each person-year group with multiple (>1) records, I want to sort the data based on the numeric values in the column "rank" (from small to large) and create a new column called "rank2" to capture the sorted order after sorting (starting from 0).
Thus, the output should look like:
person year Office Job rank rank2
Harry 2002 Los Angeles CEO -1 0
Harry 2006 Boston CEO -1 0
Harry 2006 Los Angeles Advisor 0 1
Harry 2006 Chicago Chairman 2 2
Peter 2001 New York Director -1 0
Peter 2001 Chicago CFO 3 2
Peter 2001 Chicago CEO 1 1
CodePudding user response:
df %>%
arrange(person,year,rank) %>%
group_by(person,year) %>%
mutate(rank2 = row_number()-1)
data.table option
df[order(person,year,rank),rank2:=0:(.N-1), by=.(person,year)]
CodePudding user response:
Using dense_rank
-
library(dplyr)
df %>%
group_by(person, year) %>%
mutate(rank2 = dense_rank(rank) - 1) %>%
ungroup
# person year Office Job rank rank2
# <chr> <int> <chr> <chr> <int> <dbl>
#1 Harry 2002 LosAngeles CEO -1 0
#2 Harry 2006 Boston CEO -1 0
#3 Harry 2006 LosAngeles Advisor 0 1
#4 Harry 2006 Chicago Chairman 2 2
#5 Peter 2001 NewYork Director -1 0
#6 Peter 2001 Chicago CFO 3 2
#7 Peter 2001 Chicago CEO 1 1
Similarly, in base R we can use rank
-
transform(df, rank2 = ave(rank, person, year, FUN = base::rank) - 1)