Create a vector giving the position of every change of factor level

I am currently running code that generates a heatmap of a list of specific genes across different cell types. Each gene is classified into a category (A, B, C, etc.). In my heatmap function (pheatmap package), I can insert "breaks" between rows by passing a vector of numbers specifying the rows after which a break is made.

However, I want the code to be flexible enough to reuse with a modified gene list/table. So I would like to create a vector giving the "position" at which each change of factor level occurs. Here is a dummy example:

df <- data.frame("Gene ID" = rep(paste0("Gene",1:10),1),
           "Category" = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H" ))
df

# which gives:
#Gene.ID Category
#1    Gene1        A
#2    Gene2        B
#3    Gene3        B
#4    Gene4        D
#5    Gene5        D
#6    Gene6        D
#7    Gene7        D
#8    Gene8        E
#9    Gene9        E
#10  Gene10        H

My idea was to order/arrange everything alphabetically (which is already done in my example) and count the occurrences of each category with the table() function:

table(factor(df$Category))
# which gives:
#A B D E H 
#1 2 4 2 1

What I would like to do now is to create a vector that adds each number to the sum of the previous ones (a running total), so that the vector indicates where each change of factor occurs. The output would be:

# 1 3 7 9 10

This would indicate that a break should occur after rows 1, 3, 7 and 9, and after row 10 (which is the end of the heatmap). How can I achieve that?

Also, is there a better approach to do this?

Thanks in advance

Answer:

I think you need cumsum:

cumsum(table(df$Category))
#  A  B  D  E  H 
#  1  3  7  9 10

This assumes that Category is already sorted, so that the order of the names returned by table() (A, B, etc. above, which come out in sorted level order) is the same as the order of the categories in the raw data.
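A minimal sketch of how the cumsum() result could feed the heatmap, assuming the breaks are passed to pheatmap's `gaps_row` argument (which is only used when rows are not clustered, hence `cluster_rows = FALSE`). Sorting with order() first makes the sortedness assumption explicit, and the last value is dropped because it is simply the number of rows. The matrix `mat` and the `Cell1`..`Cell3` column names are hypothetical placeholders, and the plotting step only runs if pheatmap is installed.

```r
# Dummy data from the question (data.frame() turns "Gene ID" into Gene.ID)
df <- data.frame("Gene ID" = paste0("Gene", 1:10),
                 "Category" = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H"))

# Sort rows by category so that table()'s level order matches the row order
df <- df[order(df$Category), ]

# Running total of rows per category = last row index of each category
ends <- cumsum(table(df$Category))

# The last value is just nrow(df), so drop it to keep only the internal breaks
gaps <- unname(head(ends, -1))
gaps
# [1] 1 3 7 9

# Optional: draw the heatmap with a gap after each category block
if (requireNamespace("pheatmap", quietly = TRUE)) {
  set.seed(1)
  # Hypothetical expression values for three hypothetical cell types
  mat <- matrix(rnorm(nrow(df) * 3), nrow = nrow(df),
                dimnames = list(df$Gene.ID, paste0("Cell", 1:3)))
  pheatmap::pheatmap(mat, cluster_rows = FALSE, gaps_row = gaps)
}
```

Because the rows are re-sorted before counting, the same code keeps working when a modified gene table arrives in a different order.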

Answer:

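Another solution, which may be more flexible because it does not require the values to be sorted, is to use rle(). It measures runs of consecutive identical values, so the running total of the run lengths gives the end of each run as it appears in the data: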

# as.character() guards against Category being a factor, which rle() does not accept
cumsum(rle(as.character(df$Category))$lengths)
#[1]  1  3  7  9 10
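A small sketch contrasting the two answers on a hypothetical vector in which "B" occurs in two separate blocks. rle() reports a position at every change between consecutive rows. So does the base-R comparison of each value with the next one, shown here as an alternative not taken from the answers. table() instead pools all occurrences of a category, so its running total no longer matches the rows.

```r
# Hypothetical category vector in row order; "B" appears in two separate runs
x <- c("A", "B", "B", "D", "D", "B", "E")

# rle(): one entry per run of consecutive equal values, no sorting needed
ends_rle <- cumsum(rle(x)$lengths)
ends_rle
# [1] 1 3 5 6 7

# Alternative without rle(): rows i where x[i] differs from x[i + 1]
change_after <- which(x[-1] != x[-length(x)])
change_after
# [1] 1 3 5 6

# table() counts all "B"s together, so these positions do not match the rows
cumsum(table(x))
# A B D E
# 1 4 6 7
```

The which() version stops before the last row, so it already returns only the internal breaks, without the trailing "end of heatmap" value.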