How to write a function in R which will identify peaks within a dataframe

Time:07-26

I have a set of biological count data in an R data frame with 200,000 entries. I would like to write a function that identifies the peaks in the count data, where by "peaks" I mean the 50 highest counts. I expect there to be multiple peaks in this dataset, as the median value is 0. When I run:

> summary(df$V3)

My output looks like this:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    0.00     0.00     0.00     1.82     1.00 94746.00 

I want to write a function that lists the peaks and then looks at the counts on either side of each peak (positions -1 and +1) to produce a ratio. Can anyone help with this?

My data frame is called df and looks like this:

V1    V2    V3   
gene  1     6
gene  2     0
gene  3     0
gene  4     10
....

My expected output is a data frame that identifies the peaks and their positions (V2) in the dataset, so that I can examine the counts on either side of each peak and compute a ratio for analysis.

Answer:

This is a crude approach, but it gives you the values on either side of each peak, from which you can compute a ratio.

Here I treat any value higher than the mean as a peak.

library(tidyverse)

"V1    V2    V3
gene  1     6
gene  2     0
gene  3     0
gene  4     10
gene  5     1" %>% 
  read_table() -> df

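# mean of V3 as reported by summary(df$V3) on the full data;
# named threshold so it does not mask the base mean() function
threshold <- 1.82

df %>% 
  filter(V3 > threshold) %>% 
  pull(V2) -> ids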


df %>% 
  arrange(V2) %>%                    # lag()/lead() rely on row order
  mutate(before_peak = lag(V3),      # count at V2 - 1
         after_peak = lead(V3)) %>%  # count at V2 + 1
  filter(V2 %in% ids)
# A tibble: 2 × 5
  V1       V2    V3 before_peak after_peak
  <chr> <dbl> <dbl>       <dbl>      <dbl>
1 gene      1     6          NA          0
2 gene      4    10           0          1
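The question defines a peak as one of the 50 highest counts rather than a value above the mean. A minimal sketch of the same lag()/lead() approach wrapped in a function with that definition is below; the function name `top_peaks()`, its `n` argument, and taking the ratio as before/after are assumptions for illustration. It also assumes one row per position, so that adjacent rows are adjacent positions.

```r
library(dplyr)

# Hypothetical helper: the n largest counts in V3, plus the counts at the
# neighbouring positions (V2 - 1 and V2 + 1) and their ratio.
# Assumes one row per position, so adjacent rows are adjacent positions.
top_peaks <- function(df, n = 50) {
  df %>%
    arrange(V2) %>%                          # lag()/lead() depend on row order
    mutate(before_peak = lag(V3),            # count at V2 - 1 (NA for the first row)
           after_peak  = lead(V3),           # count at V2 + 1 (NA for the last row)
           ratio       = before_peak / after_peak) %>%
    slice_max(V3, n = n, with_ties = FALSE)  # keep only the n highest counts
}

# Toy data from the question
df <- data.frame(V1 = "gene", V2 = 1:5, V3 = c(6, 0, 0, 10, 1))
top_peaks(df, n = 2)
```

Computing lag() and lead() before slice_max() matters: if the rows were filtered down to the peaks first, the neighbouring values would come from other peaks rather than from positions V2 - 1 and V2 + 1.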

Answer:

This should give you the positions (V2) of the 50 highest peaks, stored in vec. You can then use map_dfr() from the purrr package to compute the ratio of the counts on either side of each peak:

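library(purrr)
library(dplyr)

# sample data
set.seed(1)
df <- data.frame(V1 = "gene", 
                 V2 = 1:10000,
                 V3 = rbeta(10000, 1, 5))

# positions (V2) of the 50 largest counts
vec <- df %>% 
  arrange(desc(V3)) %>% 
  slice(1:50) %>% 
  pull(V2)

# for each peak, keep the rows at V2 - 1 and V2 + 1; filter() preserves row
# order, so V3[1] is the count at V2 - 1 and V3[2] the count at V2 + 1
map_dfr(vec, ~df %>% 
      filter(V2 == .x - 1 | V2 == .x + 1) %>% 
      mutate(peak = .x,
             ratio = V3[1] / V3[2])
      )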
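map_dfr() runs filter() over the whole data frame once per peak, which means 50 full passes over 200,000 rows. A base-R sketch that finds the neighbours by row index instead is below; the helper name `peak_ratios()` is hypothetical, and it assumes one row per position (no gaps in V2), so the rows next to a peak are positions V2 - 1 and V2 + 1.

```r
# Hypothetical helper: neighbours found by row index, in a single pass.
# Assumes one row per position (no gaps in V2).
peak_ratios <- function(df, n = 50) {
  df <- df[order(df$V2), ]                          # neighbours must be adjacent rows
  counts <- df$V3
  idx <- head(order(counts, decreasing = TRUE), n)  # row indices of the n largest counts
  padded <- c(NA, counts, NA)                       # NA stands in for missing edge neighbours
  before <- padded[idx]                             # padded[i] == counts[i - 1]
  after  <- padded[idx + 2]                         # padded[i + 2] == counts[i + 1]
  data.frame(V2 = df$V2[idx], V3 = counts[idx],
             before = before, after = after, ratio = before / after)
}

set.seed(1)
df <- data.frame(V1 = "gene", V2 = 1:10000, V3 = rbeta(10000, 1, 5))
head(peak_ratios(df, n = 50))
```

Peaks at the first or last position get NA for the missing neighbour. Because many counts in the real data are 0, expect some ratios of Inf (x / 0) or NaN (0 / 0); how to treat those is a choice for the analysis.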