Split a dataframe by groups of values in R

Time:10-02

I have a dataset similar to the following one:

data1 <- data.frame(Symbol=c("APEX1","APOC3","CCNA2","CDC42","CDK1","BRCA2","BSCL2","BUB1B","EEF2","EFEMP1","EGF","ATP5O","ATR"), Total_read=c(32546,32426,31854,31745,25879,25465,24759,24574,8769,8458,2546,875,850))

I'm looking for a tidy approach to split this dataframe into subsets (preferably in a list) by grouping values that are within 10% of each other. So, the above dataset would be split into 5 subsets as below:

[1]
Symbol Total_read
APEX1      32546
APOC3      32426
CCNA2      31854
CDC42      31745

[2]
Symbol Total_read
CDK1       25879
BRCA2      25465
BSCL2      24759
BUB1B      24574

[3]
Symbol Total_read
EEF2       8769
EFEMP1     8458

[4]
Symbol Total_read
EGF        2546

[5]
Symbol Total_read
ATP5O      875
ATR        850

I would appreciate any suggestions.

CodePudding user response:

library(dplyr)

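# Assign a group id to each value: a new group starts whenever a value
# differs from the previous one by more than 10%.
var10 <- function(x){
  n <- length(x)
  g <- 1

  out <- numeric(n)
  out[1] <- g

  # seq_len(n)[-1] is empty when n == 1, unlike 2:n
  for(i in seq_len(n)[-1]){
    # percentage change relative to the previous value
    pct <- abs(100*(x[i-1] - x[i])/x[i-1])

    if(pct > 10){
      g <- g + 1
    }
    out[i] <- g
  }

  return(out)
}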


data1 %>% 
  mutate(aux = var10(Total_read)) %>% 
  group_split(aux)


[[1]]
# A tibble: 4 x 3
  Symbol Total_read   aux
  <chr>       <dbl> <dbl>
1 APEX1       32546     1
2 APOC3       32426     1
3 CCNA2       31854     1
4 CDC42       31745     1

[[2]]
# A tibble: 4 x 3
  Symbol Total_read   aux
  <chr>       <dbl> <dbl>
1 CDK1        25879     2
2 BRCA2       25465     2
3 BSCL2       24759     2
4 BUB1B       24574     2

[[3]]
# A tibble: 2 x 3
  Symbol Total_read   aux
  <chr>       <dbl> <dbl>
1 EEF2         8769     3
2 EFEMP1       8458     3

[[4]]
# A tibble: 1 x 3
  Symbol Total_read   aux
  <chr>       <dbl> <dbl>
1 EGF          2546     4

[[5]]
# A tibble: 2 x 3
  Symbol Total_read   aux
  <chr>       <dbl> <dbl>
1 ATP5O         875     5
2 ATR           850     5
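
As a possible alternative to the loop, the same consecutive-change rule can be sketched in vectorised base R with diff() and cumsum(). The helper name group_by_change and its tol argument are made up for illustration and are not part of the answer above. The sketch assumes a non-empty numeric vector without zeros, already sorted as in the question.

```r
# Same example data as in the question.
data1 <- data.frame(
  Symbol = c("APEX1", "APOC3", "CCNA2", "CDC42", "CDK1", "BRCA2", "BSCL2",
             "BUB1B", "EEF2", "EFEMP1", "EGF", "ATP5O", "ATR"),
  Total_read = c(32546, 32426, 31854, 31745, 25879, 25465, 24759, 24574,
                 8769, 8458, 2546, 875, 850)
)

# Hypothetical helper (not from the original answer).
# Assumes x is non-empty and contains no zeros.
group_by_change <- function(x, tol = 0.10) {
  # relative change of each value with respect to the previous one
  rel_change <- abs(diff(x)) / abs(head(x, -1))
  # start a new group wherever the change exceeds the tolerance
  cumsum(c(TRUE, rel_change > tol))
}

# split() returns a named list of data frames, one per group id
groups <- split(data1, group_by_change(data1$Total_read))
length(groups)
```

Like the loop, this only compares each value with its immediate predecessor, so a long run of small steps can still place values that differ by more than 10% overall in the same group.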