Home > Software engineering >  Count changes for each ID
Count changes for each ID

Time:05-18

I have a set of data of 2000 people with repeated measurements over multiple years between 2000-2022 (some people have data for the full time period whereas others only for a subset of these years). Within a single year, each person can only fall into one of four groups: 0, 1, 2, or 3. I want to quantify how many times each person changes grouping within their individual sampling period. The data is laid out like this (image attached):

enter image description here

So for this example dataset, the code would tell me that person 1 has had 3 changes (from group 1 to group 2 in 2011, then from group 2 back to group 1 and lastly back to group 2 in 2015) whereas for group 1 the code should tell me there have been 0 changes.

I don't need the loop to tell me which years have the changes for now, only the total amount of changes in grouping per person. I was thinking about a loop that goes something like "if X does not equal previous number then add 1 to a count" but I've never done something like this before and not quite sure how to lay this code out.

Would appreciate any help :)

Edit: This is the R syntax version of the data as suggested by @GregorThomas

structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4), Year = c(2010, 
2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 
2001, 2002, 2003, 2004, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 
2010, 2011, 2012, 2013, 2014, 2015, 2001, 2002, 2003, 2004, 2005, 
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 
2017, 2018, 2019, 2020, 2021), Culture_Grouping = c(1, 1, 1, 
1, 1, 1, 3, 3, 3, 1, 3, 3, 0, 1, 3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 
0, 0, 2, 0, 0, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1)), row.names = c(NA, -50L), class = c("tbl_df", 
"tbl", "data.frame"))

CodePudding user response:

You can achieve it without any loop:

aggregate(Group ~ ID, df, \(x) sum(diff(x) != 0))

#   ID Group
# 1  1     3
# 2  2     0

Its dplyr equivalent is:

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(Group = sum(diff(Group) != 0)) %>%
  ungroup()

Or more simplified with count():

df %>%
  count(ID, wt = diff(Group) != 0)

Data
df <- data.frame(ID = c(rep(1, 6), rep(2, 2)),
                 Year = c(2010:2015, 2012:2013),
                 Group = c(1,2,1,1,1,2,1,1))

CodePudding user response:

data.table option:

library(data.table)
setDT(df)[, uniqueN(rleid(Culture_Grouping)) - 1, ID]

Output:

   ID V1
1:  1  3
2:  2  2
3:  3  3
4:  4  2
  • Related