Create a vector giving position for any change of levels

Time:11-08

I am currently running code that generates a heatmap from a list of specific genes for different cell types. Each gene is assigned to a category (A, B, C, etc.). In my heatmap function (pheatmap package), I can add "breaks" by passing a vector of numbers specifying the rows after which each break should be made.

However, I want the code to be flexible so that it works with a modified gene list/table. So I would like to create a vector specifying the positions where the category changes. Here is a dummy example:

df <- data.frame("Gene ID" = rep(paste0("Gene",1:10),1),
           "Category" = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H" ))
df

# which gives
#Gene.ID Category
#1    Gene1        A
#2    Gene2        B
#3    Gene3        B
#4    Gene4        D
#5    Gene5        D
#6    Gene6        D
#7    Gene7        D
#8    Gene8        E
#9    Gene9        E
#10  Gene10        H


My idea was to sort everything alphabetically (which is already done in my example) and extract the number of occurrences with the table() function:

table(factor(df$Category))
# which gives:
#A B D E H 
#1 2 4 2 1 

What I would like to do now is create a vector that adds each count to the ones before it (a running total), so that I get a vector indicating where the category changes. The output would be:

# "1", "3", "7", "9", "10"

This indicates that a break should occur after row 1, row 3, row 7, row 9 and "row 10" (which is the end of the heatmap). How can I achieve that?

Also, is there a better approach?

Thanks in advance

CodePudding user response:

I think you need cumsum:

cumsum(table(df$Category))
#  A  B  D  E  H 
#  1  3  7  9 10 

This assumes that the rows are already sorted by Category. table() reports its counts in sorted level order (A, B, etc., above), so the cumulative sums only match row positions when the raw data follows that same order.
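
To make that assumption concrete, here is a small sketch with hypothetical data (the names df_unsorted and df_sorted are made up for illustration). The categories are grouped but not in alphabetical order, so the cumulative counts from table() do not line up with the rows until the data frame is sorted first:

```r
# Hypothetical data: categories are grouped, but NOT in alphabetical order.
df_unsorted <- data.frame(
  Gene.ID  = paste0("Gene", 1:5),
  Category = c("B", "B", "A", "D", "D")
)

# table() counts in sorted level order (A, B, D), not in row order,
# so these cumulative sums describe a sorted table, not the rows above.
unsorted_breaks <- cumsum(table(df_unsorted$Category))
unsorted_breaks
# A B D
# 1 3 5
# The actual changes in df_unsorted happen after rows 2, 3 and 5.

# Sorting the rows by Category first makes the two orders agree.
df_sorted <- df_unsorted[order(df_unsorted$Category), ]
sorted_breaks <- cumsum(table(df_sorted$Category))
sorted_breaks
# A B D
# 1 3 5
# These positions now match df_sorted: A ends at row 1, B at 3, D at 5.
```

If the heatmap matrix is built from the same data frame, it would need to be reordered in the same way so that the break positions refer to the right rows.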

CodePudding user response:

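Another solution, maybe more flexible because it does not require the values to be sorted, is to use rle. It counts runs of consecutive identical values, so the positions follow the actual row order. Note that a category appearing in two separate runs gets a break after each run, and that rle() needs a character vector rather than a factor: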

cumsum(rle(df$Category)$lengths)
#[1]  1  3  7  9 10
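
As a rough sketch of how this could feed into the heatmap: the last position is simply the final row, so it can be dropped before passing the vector on. The snippet below assumes the pheatmap argument for row gaps is gaps_row (used when rows are not clustered), and `mat` is a hypothetical numeric matrix whose rows follow df, so that call is left commented out:

```r
# Same dummy data as in the question. Category is kept as character,
# because rle() does not accept factors.
df <- data.frame(
  Gene.ID  = paste0("Gene", 1:10),
  Category = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H"),
  stringsAsFactors = FALSE
)

# End position of each run of identical categories.
run_ends <- cumsum(rle(as.character(df$Category))$lengths)

# Drop the last position: it is the final row, so no gap is needed after it.
gap_rows <- head(run_ends, -1)
gap_rows
# [1] 1 3 7 9

# Equivalent without rle(): rows whose category differs from the next row.
gap_rows_alt <- which(head(df$Category, -1) != tail(df$Category, -1))
gap_rows_alt
# [1] 1 3 7 9

# Hypothetical usage, assuming `mat` is a numeric matrix with one row per gene:
# pheatmap::pheatmap(mat, cluster_rows = FALSE, gaps_row = gap_rows)
```

Both variants only look at neighbouring rows, so they work on any row order as long as the heatmap rows are in the same order as df.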