Which groups meet the criterion a < b < c, depending on a condition

Time:10-09

My title might not be very informative, but here is an example that illustrates my problem:

I have this data frame:

df=data.frame(cond1=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
              group=c("F","V","M","F","V","M","F","V","M","F","V","M","F","V","M","F","V","M"),
              gene=c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B"),
              value=c(1,2,3,4,5,6,7,8,9,1,3,2,4,3,2,2,3,4))
df
       cond1 group gene value
    1      1     F    A     1
    2      1     V    A     2
    3      1     M    A     3
    4      2     F    A     4
    5      2     V    A     5
    6      2     M    A     6
    7      3     F    A     7
    8      3     V    A     8
    9      3     M    A     9
    10     1     F    B     1
    11     1     V    B     3
    12     1     M    B     2
    13     2     F    B     4
    14     2     V    B     3
    15     2     M    B     2
    16     3     F    B     2
    17     3     V    B     3
    18     3     M    B     4

What I would like to obtain is, for each gene, the number of cond1 levels for which the value of group F is smaller than the value of group V, which in turn is smaller than the value of group M (F < V < M).

In the first three rows we are in gene A with cond1 = 1, and the values corresponding to groups F, V and M are 1, 2 and 3. So F < V < M holds for gene A when cond1 = 1.
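R has no chained comparison operator, so the check for gene A with cond1 = 1 has to be written as two comparisons joined with `&&`. The minimal sketch below hard-codes the three values in a named vector `vals` (a name introduced here for illustration) and also shows the equivalent `diff()` formulation, which is only valid because the values are stored in F, V, M order.

```r
# Values of gene A for cond1 = 1, copied from the first three rows of df
vals <- c(F = 1, V = 2, M = 3)

# F < V < M written out as two comparisons
chained <- vals[["F"]] < vals[["V"]] && vals[["V"]] < vals[["M"]]

# Equivalent test: every consecutive difference is positive.
# This only matches F < V < M because vals is stored in F, V, M order.
via_diff <- all(diff(vals) > 0)

chained   # TRUE
via_diff  # TRUE
```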

My expected output for gene A is 3, because every cond1 level satisfies F < V < M. My expected output for gene B is 1, because only cond1 = 3 satisfies F < V < M.

Ideally, the output would be a data frame with each gene and the number of cond1 levels that meet my criterion:

  gene count
1    A     3
2    B     1

I would be very grateful for any tips on how I should proceed.

CodePudding user response:

Check whether the values of each gene/cond1 pair are in strictly increasing order, then count how many such pairs each gene has.

library(dplyr)

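df %>%
  # If the rows are not already in F, V, M order within each gene/cond1 pair, sort them first:
  #arrange(gene, cond1, match(group, c('F', 'V', 'M'))) %>%
  group_by(gene, cond1) %>%
  # TRUE when the values strictly increase from row to row (F < V < M)
  summarise(cond = all(diff(value) > 0), .groups = "drop_last") %>%
  # The result is still grouped by gene, so this counts the TRUE values per gene
  summarise(count = sum(cond))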

#  gene  count
#  <chr> <int>
#1 A         3
#2 B         1
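For readers without dplyr, the same two-stage computation can be sketched in base R with `aggregate()`: the first call yields one TRUE/FALSE per gene/cond1 pair and the second sums them per gene. Like the dplyr pipeline, this sketch assumes the rows of each pair are already in F, V, M order; `per_cond` and `counts` are names introduced here for illustration.

```r
df <- data.frame(cond1 = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
                 group = c("F","V","M","F","V","M","F","V","M","F","V","M","F","V","M","F","V","M"),
                 gene  = c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B"),
                 value = c(1,2,3,4,5,6,7,8,9,1,3,2,4,3,2,2,3,4))

# Stage 1 (like group_by(gene, cond1) + summarise()): one TRUE/FALSE per pair,
# assuming each pair's rows are already in F, V, M order
per_cond <- aggregate(value ~ gene + cond1, data = df,
                      FUN = function(v) all(diff(v) > 0))

# Stage 2 (like the second summarise()): count the TRUE values per gene
counts <- aggregate(value ~ gene, data = per_cond, FUN = sum)
names(counts)[names(counts) == "value"] <- "count"
counts  # gene A -> 3, gene B -> 1
```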

CodePudding user response:

Using data.table

library(data.table)
setDT(df)[, .(cond = all(diff(value) > 0)), .(gene, cond1)][, .(count = sum(cond)), gene]
   gene count
1:    A     3
2:    B     1
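Both answers compare consecutive rows with `diff()`, so they assume that the rows of each gene/cond1 pair are stored in F, V, M order, as they happen to be in the example. The base R sketch below illustrates this under stated assumptions: `count_increasing()` is a hypothetical helper applying the same `all(diff(value) > 0)` test per pair, the row swap is a made-up reordering, and sorting with `order()` plus `match()` (the base R counterpart of the commented-out `arrange()` line) restores the expected counts.

```r
df <- data.frame(cond1 = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
                 group = c("F","V","M","F","V","M","F","V","M","F","V","M","F","V","M","F","V","M"),
                 gene  = c("A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B"),
                 value = c(1,2,3,4,5,6,7,8,9,1,3,2,4,3,2,2,3,4))

# Hypothetical helper: per gene, count the cond1 levels whose values
# strictly increase in row order (tapply keeps the original row order)
count_increasing <- function(d) {
  inc <- tapply(d$value, list(d$gene, d$cond1), function(v) all(diff(v) > 0))
  rowSums(inc)  # logical gene x cond1 matrix -> counts per gene
}

count_increasing(df)        # A = 3, B = 1

# Made-up reordering: swap the F and V rows of gene A, cond1 = 1
shuffled <- df[c(2, 1, 3:18), ]
count_increasing(shuffled)  # A drops to 2: those rows now read V, F, M (2, 1, 3)

# Sort by gene, cond1 and the F, V, M group order before counting
fixed <- shuffled[order(shuffled$gene, shuffled$cond1,
                        match(shuffled$group, c("F", "V", "M"))), ]
count_increasing(fixed)     # back to A = 3, B = 1
```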