How to find the maximum y value for each column and report the corresponding x value?

Time:07-28

In RStudio, I have a table of raw data from a DNA size distribution plot for hundreds of samples. The RFU readings (y values) are arranged in one column per sample, and the shared sizes (x values) are in a separate column, as shown below.

[Figure: example size distribution graph, for visualisation]

Example data (made-up values, just to show the format of the table):

sample001_rfu sample002_rfu sample003_rfu size_bp
5678 4567 3456 1000
8901 7890 6789 5000
10234 10123 10010 10000
12356 12345 11234 15000
15678 14567 13445 20000
13890 16589 15624 25000
10987 13425 17245 30000
8902 11323 15428 35000
6513 8919 12879 40000
4178 6528 10256 45000
3213 4380 8621 50000

I am trying to find the maximum y value (RFU) for each sample (i.e. the max value in each column) and report the corresponding x value (size), which will be used for planning downstream automated sample processing.

So, in the table above:

  • sample001 = 20000bp (max rfu = 15678)
  • sample002 = 25000bp (max rfu = 16589)
  • sample003 = 30000bp (max rfu = 17245)

I have used the following to do this for one sample:

df$size_bp[which.max(df$sample001_rfu)] 

However, I cannot seem to find a solution to repeat this for each sample_rfu (column) in the table without manually replacing the sample id in the code above. I would then like to store these values and their sample IDs (column header) as a list which will later be compared against different processing thresholds.

Any suggestions would be greatly appreciated!

CodePudding user response:

base R

dat$size_bp[ sapply(dat[,-4], which.max) ]
# [1] 20000 25000 30000

## named
setNames(dat$size_bp[ sapply(dat[,-4], which.max) ], names(dat[,-4]))
# sample001_rfu sample002_rfu sample003_rfu 
#         20000         25000         30000 
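The question also asks to keep the peak sizes with their sample IDs as a list for later threshold checks. A minimal sketch of one way to do that with the named vector above; the 22000 bp cutoff and the names `peak_bp`, `peak_list`, `min_bp`, and `below` are hypothetical, chosen only for illustration:

```r
# Example data from the question
dat <- data.frame(
  sample001_rfu = c(5678, 8901, 10234, 12356, 15678, 13890, 10987, 8902, 6513, 4178, 3213),
  sample002_rfu = c(4567, 7890, 10123, 12345, 14567, 16589, 13425, 11323, 8919, 6528, 4380),
  sample003_rfu = c(3456, 6789, 10010, 11234, 13445, 15624, 17245, 15428, 12879, 10256, 8621),
  size_bp       = c(1000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000, 50000)
)

# Select the sample columns by name rather than by position
rfu_cols <- grep("_rfu$", names(dat), value = TRUE)

# Named vector: one peak size per sample
peak_bp <- setNames(dat$size_bp[sapply(dat[rfu_cols], which.max)], rfu_cols)

# The same values as a named list, if a list is required downstream
peak_list <- as.list(peak_bp)

# Hypothetical threshold: which samples peak below 22000 bp?
min_bp <- 22000
below <- names(peak_bp)[peak_bp < min_bp]
below
# [1] "sample001_rfu"
```

Selecting columns by the `_rfu` suffix instead of `dat[,-4]` keeps the code working if the position of `size_bp` changes.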

dplyr

library(dplyr)
dat %>%
  summarize(across(-size_bp, ~ size_bp[which.max(.)]))
#   sample001_rfu sample002_rfu sample003_rfu
# 1         20000         25000         30000

data.table

library(data.table)
DT <- as.data.table(dat) # setDT is the preferred/canonical method
DT[, lapply(.SD, function(z) size_bp[which.max(z)]), .SDcols = patterns("^sample")]
#    sample001_rfu sample002_rfu sample003_rfu
#            <int>         <int>         <int>
# 1:         20000         25000         30000

Data

dat <- structure(list(sample001_rfu = c(5678L, 8901L, 10234L, 12356L, 15678L, 13890L, 10987L, 8902L, 6513L, 4178L, 3213L), sample002_rfu = c(4567L, 7890L, 10123L, 12345L, 14567L, 16589L, 13425L, 11323L, 8919L, 6528L, 4380L), sample003_rfu = c(3456L, 6789L, 10010L, 11234L, 13445L, 15624L, 17245L, 15428L, 12879L, 10256L, 8621L), size_bp = c(1000L, 5000L, 10000L, 15000L, 20000L, 25000L, 30000L, 35000L, 40000L, 45000L, 50000L)), class = "data.frame", row.names = c(NA, -11L))

CodePudding user response:

Here's another base R method:

samp_cols = names(df)[startsWith(names(df), "sample")]

result = lapply(samp_cols, function(x){
       mx = which.max(df[[x]])
       list(sample = x, max_rfu = df[mx, x], bp = df[mx, "size_bp"])
})

do.call(rbind, result)
#      sample          max_rfu bp   
# [1,] "sample001_rfu" 15678   20000
# [2,] "sample002_rfu" 16589   25000
# [3,] "sample003_rfu" 17245   30000
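Note that `rbind`-ing plain lists yields a matrix whose cells are list elements, which can be awkward to filter or compare numerically. A possible variant, sketched here on a subset of the question's rows, returns a one-row data frame per sample so the combined result has ordinary typed columns; the names `rows` and `peaks` are illustrative only:

```r
# Subset of the example rows from the question (15000 to 35000 bp)
df <- read.table(text = "sample001_rfu sample002_rfu sample003_rfu size_bp
12356 12345 11234 15000
15678 14567 13445 20000
13890 16589 15624 25000
10987 13425 17245 30000
8902 11323 15428 35000", header = TRUE)

samp_cols <- names(df)[startsWith(names(df), "sample")]

# One single-row data frame per sample column
rows <- lapply(samp_cols, function(x) {
  mx <- which.max(df[[x]])
  data.frame(sample = x, max_rfu = df[[x]][mx], bp = df$size_bp[mx])
})

# Stack the rows into one data frame with atomic columns
peaks <- do.call(rbind, rows)
peaks
```

With atomic columns, a later comparison such as `peaks$bp >= some_threshold` works directly.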

Using this data:

df = read.table(text = 'sample001_rfu   sample002_rfu   sample003_rfu   size_bp
5678    4567    3456    1000
8901    7890    6789    5000
10234   10123   10010   10000
12356   12345   11234   15000
15678   14567   13445   20000
13890   16589   15624   25000
10987   13425   17245   30000
8902    11323   15428   35000
6513    8919    12879   40000
4178    6528    10256   45000
3213    4380    8621    50000', header = TRUE)

CodePudding user response:

Here is another tidyverse approach:

library(dplyr)
library(tidyr)

df %>% 
  pivot_longer(-size_bp) %>% 
  group_by(name) %>% 
  slice_max(value, n = 1)
#   size_bp name          value
#     <int> <chr>         <int>
# 1   20000 sample001_rfu 15678
# 2   25000 sample002_rfu 16589
# 3   30000 sample003_rfu 17245
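One difference between the approaches is worth keeping in mind if a sample can have a tied maximum: `which.max()` returns only the first position of the maximum, while `slice_max()` keeps all tied rows by default (`with_ties = TRUE`). A small base R sketch of the two behaviours, using made-up values that are not from the question:

```r
# Made-up RFU values with a tied maximum, for illustration only
rfu  <- c(10, 50, 50, 20)
size <- c(1000, 5000, 10000, 15000)

# First maximum only (what which.max-based answers report)
first_peak <- size[which.max(rfu)]

# Every position that shares the maximum
all_peaks <- size[which(rfu == max(rfu))]
```

If exactly one row per sample is needed from the tidyverse version, passing `with_ties = FALSE` to `slice_max()` should give that.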