How to create an order/ranking for majors declared by each student-CodePudding

I have a dataset that has student ids and name of major along with semester. The main complicating factor, is that if a student returns to a previous major, I want it to count as another major

StudID   Major       Term
12345    History     202101
12345    History     202102
12345    Management  202201
12345    History     202202

Desired result

StudID   Major       Term    MajorNumber
12345    History     202101  1
12345    History     202102  1
12345    Management  202201  2
12345    History     202202  3

The only thing I have tried is pulling in the earliest term for each major then doing a ranking within StudentID based on each majors earliest term, but that means that final "history" becomes a 1 instead of 3

CodePudding user response：

The data.table::rleid function does this. Specifically, it will give all consecutive values that are the same the same ID, and will increment whenever there is a change.

With data.table,

library(data.table)
your_dt = as.data.table(your_data)
your_dt[, MajorNumber := rleid(Major), by = .(StudID)]
your_dt
#    StudID      Major   Term MajorNumber
# 1:  12345    History 202101           1
# 2:  12345    History 202102           1
# 3:  12345 Management 202201           2
# 4:  12345    History 202202           3

Or with dplyr:

library(dplyr)
your_data %>%
  group_by(StudID) %>%
  mutate(MajorNumber = data.table::rleid(Major)) %>%
  ungroup()

CodePudding user response：

In base R, use ave rle

dat$MajorNumber <-  with(dat, as.integer(ave(Major, StudID, 
  FUN = \(x) with(rle(x), rep(seq_along(values), lengths)))))
dat$MajorNumber
[1] 1 1 2 3

CodePudding user response：

Here is a dplyr solution:

library(dplyr)

df %>% 
  mutate(x = cumsum(Major != lag(Major, default = Major[1])) 1)

  StudID      Major   Term x
1  12345    History 202101 1
2  12345    History 202102 1
3  12345 Management 202201 2
4  12345    History 202202 3

CodePudding user response：

It may not be the best of the approach, but i tried the below code

data

data <- tribble(
~StudID,   ~Major,  ~Term,
12345,    'History',     202101,
12345,    'History' ,    202102,
12345,    'Management',  202201,
12345,    'History',     202202
)

code

data2 <- data %>% 
mutate(term2=substr(as.character(Term),1,4)) %>% 
group_by(StudID,Major,term2) %>%  slice(1) %>% 
ungroup() %>% 
arrange(StudID,Term) %>% mutate(row=row_number())

data3 <- data %>% full_join(data2 %>% select(-term2), by=c('StudID','Term','Major')) %>% 
fill(row)

data

code

output