I have a dataset similar to the following one:
data1 <- data.frame(Symbol=c("APEX1","APOC3","CCNA2","CDC42","CDK1","BRCA2","BSCL2","BUB1B","EEF2","EFEMP1","EGF","ATP5O","ATR"), Total_read=c(32546,32426,31854,31745,25879,25465,24759,24574,8769,8458,2546,875,850))
I'm looking for a tidy approach to split this data frame into subsets (preferably as a list) by grouping values that are within 10% of each other. So, the dataset above would be split into 5 subsets, as below:
[1]
Symbol Total_read
APEX1 32546
APOC3 32426
CCNA2 31854
CDC42 31745
[2]
Symbol Total_read
CDK1 25879
BRCA2 25465
BSCL2 24759
BUB1B 24574
[3]
Symbol Total_read
EEF2 8769
EFEMP1 8458
[4]
Symbol Total_read
EGF 2546
[5]
Symbol Total_read
ATP5O 875
ATR 850
I would appreciate any suggestions.
Answer:
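library(dplyr)
# Assign a group id to each value: a new group starts whenever a value
# differs from the previous one by more than 10%.
# Assumes x is already sorted (here, in decreasing order).
var10 <- function(x){
  n <- length(x)
  g <- 1
  out <- numeric(n)
  out[1] <- g
  # seq_len(n)[-1] is empty when n == 1, so the loop is skipped safely
  for(i in seq_len(n)[-1]){
    # percentage change relative to the previous value
    diff <- abs(100*(x[i-1] - x[i])/x[i-1])
    if(diff > 10){
      g <- g + 1
    }
    out[i] <- g
  }
  return(out)
}
# Add the group id as a helper column, then split into a list of tibbles
data1 %>%
  mutate(aux = var10(Total_read)) %>%
  group_split(aux)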
[[1]]
# A tibble: 4 x 3
Symbol Total_read aux
<chr> <dbl> <dbl>
1 APEX1 32546 1
2 APOC3 32426 1
3 CCNA2 31854 1
4 CDC42 31745 1
[[2]]
# A tibble: 4 x 3
Symbol Total_read aux
<chr> <dbl> <dbl>
1 CDK1 25879 2
2 BRCA2 25465 2
3 BSCL2 24759 2
4 BUB1B 24574 2
[[3]]
# A tibble: 2 x 3
Symbol Total_read aux
<chr> <dbl> <dbl>
1 EEF2 8769 3
2 EFEMP1 8458 3
[[4]]
# A tibble: 1 x 3
Symbol Total_read aux
<chr> <dbl> <dbl>
1 EGF 2546 4
[[5]]
# A tibble: 2 x 3
Symbol Total_read aux
<chr> <dbl> <dbl>
1 ATP5O 875 5
2 ATR 850 5
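The loop in var10() can also be written without explicit iteration. The following is a minimal sketch, not a drop-in replacement: group_by_step and its tol argument are hypothetical names introduced here for illustration, and base split() is used only to keep the example free of dependencies. It relies on the same assumptions as var10(): the values are already sorted, and each value is compared only with the one immediately before it.

```r
data1 <- data.frame(
  Symbol = c("APEX1", "APOC3", "CCNA2", "CDC42", "CDK1", "BRCA2", "BSCL2",
             "BUB1B", "EEF2", "EFEMP1", "EGF", "ATP5O", "ATR"),
  Total_read = c(32546, 32426, 31854, 31745, 25879, 25465, 24759, 24574,
                 8769, 8458, 2546, 875, 850)
)

# Hypothetical helper (not part of the original answer):
# returns a group id that increases by one whenever the relative change
# from the previous value exceeds `tol` (0.10 = 10%).
group_by_step <- function(x, tol = 0.10) {
  if (length(x) == 0) return(integer(0))
  # relative change between each value and its predecessor
  rel_change <- abs(diff(x)) / abs(head(x, -1))
  # the first element always opens group 1; each TRUE opens a new group
  cumsum(c(TRUE, rel_change > tol))
}

group_id <- group_by_step(data1$Total_read)

# Base R split() returns a named list of data frames, one per group
groups <- split(data1, group_id)
length(groups)
```

The same helper could be used in place of var10() inside the mutate() call, followed by group_split(aux). Note that because only neighbouring values are compared, a long run of slowly changing values can end up in one group even if its first and last values differ by more than 10%.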