How to write a function in R which will identify peaks within a dataframe

I have a set of biological count data in a data frame in R with 200,000 entries. I want to write a function that identifies the peaks within the count data. By peaks, I mean the 50 highest counts. I expect there to be multiple peaks within this dataset, as the median value is 0. When I run:

> summary(df$V3)

My output looks like this:

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
    0.00     0.00     0.00     1.82     1.00 94746.00

I want to write a function that lists the peaks and then looks at the values on either side of each peak (positions +1 and -1) to produce a ratio. Can anyone help with this?

My dataframe looks like this and is labelled df:

V1    V2    V3   
gene  1     6
gene  2     0
gene  3     0
gene  4     10
....

My expected output is a data frame identifying the peaks and their positions (V2) within the dataset, so that I can examine the numbers on either side of each peak to produce a ratio for analysis.

CodePudding user response：

This is a crude way of doing it, but it will give you the values on either side of each peak, from which you can compute a ratio.

Here I treat any value higher than the mean as a peak.

library(tidyverse)

"V1    V2    V3
gene  1     6
gene  2     0
gene  3     0
gene  4     10
gene  5     1" %>% 
  read_table() -> df

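# Threshold: the mean of V3 reported by summary(df$V3) on the full data.
# Named `threshold` to avoid masking base R's mean() function.
threshold <- 1.82

# Positions (V2) of all rows whose count exceeds the threshold
df %>% 
  filter(V3 > threshold) %>% 
  pull(V2) -> ids

# lag() is the previous row (position - 1), lead() is the next row (position + 1)
df %>% 
  mutate(minus_peaks = lag(V3),
         plus_peaks = lead(V3)) %>% 
  filter(V2 %in% ids)

# A tibble: 2 × 5
  V1       V2    V3 minus_peaks plus_peaks
  <chr> <dbl> <dbl>       <dbl>      <dbl>
1 gene      1     6          NA          0
2 gene      4    10           0          1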

CodePudding user response：

This should give you the positions (V2) of the peaks (vec). Using map_dfr() from the purrr package, you can then get the relevant ratio, e.g.

library(purrr)
library(dplyr)
# sample data
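df <- data.frame(V1 = "gene",
                 V2 = 1:10000,
                 V3 = rbeta(10000, 1, 5))

# positions (V2) of the 50 highest counts
vec <- df %>% 
  arrange(desc(V3)) %>% 
  slice(1:50) %>% 
  pull(V2)

# for each peak, keep the rows at positions -1 and +1 and
# divide the count before the peak by the count after it
map_dfr(vec, ~df %>% 
      filter(V2 == .x - 1 | V2 == .x + 1) %>% 
      mutate(ratio = V3[1] / V3[2])
      )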
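
Since the question asks for a function, the two steps above (pick the top counts, then look at the neighbours) could be wrapped up roughly as follows. This is only a sketch: find_peaks() and its n_peaks argument are names made up for illustration, not functions from any package, and it assumes the rows are already ordered by position (V2) with no gaps, so that the previous and next rows really are positions -1 and +1.

```r
library(dplyr)

# Hypothetical helper (not from any package): returns the n_peaks highest
# counts together with the counts on either side and their ratio.
# Assumes `data` has columns V2 (position) and V3 (count), sorted by V2.
find_peaks <- function(data, n_peaks = 50) {
  data %>%
    mutate(before = lag(V3),          # count at position - 1
           after  = lead(V3),         # count at position + 1
           ratio  = before / after) %>%
    # keep the n_peaks rows with the largest V3, highest first
    slice_max(V3, n = n_peaks, with_ties = FALSE)
}

# Small example using the sample rows from the question
df <- data.frame(V1 = "gene", V2 = 1:5, V3 = c(6, 0, 0, 10, 1))
peaks <- find_peaks(df, n_peaks = 2)
peaks
```

Because most counts in the real data are 0, some ratios will come out as Inf or NaN (division by zero), and peaks on the first or last row will give NA, so those cases probably need handling before any analysis.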