I am currently running code that generates a heatmap from a list of specific genes for different cell types. Each gene is assigned to a category (A, B, C, etc.). In my heatmap function (pheatmap package), I can pass "breaks" as a vector of numbers specifying the rows where a break has to be made.
However, I want the code to be flexible so that it also works with a modified gene list/table. So I would like to create a vector specifying the positions where the category changes. Here is a dummy example:
df <- data.frame("Gene ID" = rep(paste0("Gene",1:10),1),
"Category" = c("A", "B", "B", "D", "D", "D", "D", "E", "E", "H" ))
df
# which gives
#Gene.ID Category
#1 Gene1 A
#2 Gene2 B
#3 Gene3 B
#4 Gene4 D
#5 Gene5 D
#6 Gene6 D
#7 Gene7 D
#8 Gene8 E
#9 Gene9 E
#10 Gene10 H
My idea was to sort everything alphabetically (which is already done in my example) and extract the number of occurrences with the table() function:
table(factor(df$Category))
# Which gives:
#A B D E H
#1 2 4 2 1
What I would like to do now is create a vector of cumulative sums, where each count is added to the ones before it, so that the vector indicates where the category changes. The output would be:
# "1", "3", "7", "9", "10"
This indicates that a break should occur after row 1, row 3, row 7, row 9 and "row 10" (which is the end of the heatmap). How can I achieve that?
Also, is there a better approach for this?
Thanks in advance
CodePudding user response:
I think you need cumsum:
cumsum(table(df$Category))
# A B D E H
# 1 3 7 9 10
This assumes that Category is already sorted, so that the order of the names (A, B, etc., above) is the same as the order in the raw data.
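If the rows are not sorted yet, one option is to sort them by category before counting. The following is only a minimal sketch of that idea; `df2` and its contents are made-up illustration data, not part of the question:

```r
# Hypothetical data in which the categories are NOT sorted alphabetically
df2 <- data.frame(Gene.ID  = paste0("Gene", 1:6),
                  Category = c("D", "D", "A", "B", "B", "A"))

# Sort the rows by category so the row order matches the order used by table()
df2_sorted <- df2[order(df2$Category), ]

# Cumulative counts give the last row index of each category in the sorted data
breaks_sorted <- cumsum(table(df2_sorted$Category))
breaks_sorted
# A B D
# 2 4 6
```

Note that the heatmap then has to be drawn from the sorted data (`df2_sorted` here), otherwise the positions do not match the plotted rows.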
CodePudding user response:
Another solution, maybe more flexible because it does not require the values to be sorted in the data, is to use rle:
cumsum(rle(df$Category)$lengths)
#[1] 1 3 7 9 10
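To illustrate the difference, here is a small sketch on made-up data where the categories are grouped in blocks but not in alphabetical order. The vector `category` and the names `block_ends` and `gap_positions` are hypothetical. The sketch assumes that the last position (the end of the heatmap) is not wanted as a break, and that the result would be passed to something like the `gaps_row` argument of pheatmap:

```r
# Hypothetical data: categories appear in contiguous blocks, but not sorted.
# rle() needs a plain vector, so a factor column is converted with as.character().
category <- as.character(c("D", "D", "A", "B", "B", "B", "C"))

runs <- rle(category)

# Last row index of each block, labelled with the block's category
block_ends <- cumsum(runs$lengths)
names(block_ends) <- runs$values
block_ends
# D A B C
# 2 3 6 7

# Drop the final position, which is just the last row of the heatmap
gap_positions <- head(unname(block_ends), -1)
gap_positions
# [1] 2 3 6
```

Because rle() counts runs of identical consecutive values, a category that appears in two separate blocks produces two separate breaks, which is usually what a heatmap that keeps the original row order needs.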