How to find max and min values regardless of being positive or negative using R

Time:08-31

I have a data frame that looks like this:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -423
GeneA   P53 chr19   chr8    -   -   -3467567
GeneA   P53 chr19   chr8    -   -   10452
GeneB   P53 chr19   chr8    -   -   -2884
GeneB   P53 chr19   chr8    -   -   -40

I want to group by the columns Genes and intA and then keep only the rows with the maximum or minimum value, regardless of sign (that is, by absolute value), in the last column, distance.

The desired output for getting maximum distance values will be:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -3467567
GeneB   P53 chr19   chr8    -   -   -2884

And the desired output for getting minimum distance values will be:

Genes   intA    Chr_intA    Chr_intB    direction_1 direction_2 distance
GeneA   P53 chr19   chr8    -   -   -423
GeneB   P53 chr19   chr8    -   -   -40

I tried the methods below, but they convert the negative values to positive and also change the shape of the final output: only the grouping columns and distance are kept. How can I fix these two things? Thanks.

library(dplyr)
df <- df %>% group_by(Genes, intA) %>% summarise(distance = max(abs(distance)))
df <- df %>% group_by(Genes, intA) %>% summarise(distance = min(abs(distance)))

CodePudding user response:

You can use slice_min and slice_max to keep the n rows (default 1) with the lowest or highest values in each group. Because you want to compare distances regardless of sign, order by abs(distance). Whole rows are returned, so the original signed values and all other columns are preserved.

library(dplyr)

# dat is the data frame from the question (called df there)
dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -           -3467567
#2 GeneB P53   chr19    chr8     -           -              -2884
  
dat %>% 
  group_by(Genes, intA) %>%
  slice_min(abs(distance))

#  Genes intA  Chr_intA Chr_intB direction_1 direction_2 distance
#  <chr> <chr> <chr>    <chr>    <chr>       <chr>          <int>
#1 GeneA P53   chr19    chr8     -           -               -423
#2 GeneB P53   chr19    chr8     -           -                -40
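
For anyone who wants to run the answer end to end, here is a minimal reproducible sketch. It rebuilds the example table by hand (values copied from the question) and stores the two results under the names farthest and nearest, which are hypothetical names chosen for illustration. It also sets with_ties = FALSE, an optional dplyr argument not used in the answer above, on the assumption that exactly one row per group is wanted even when two distances have the same absolute value.

```r
library(dplyr)

# Rebuild the example data from the question (values copied from the post).
dat <- data.frame(
  Genes       = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  intA        = "P53",
  Chr_intA    = "chr19",
  Chr_intB    = "chr8",
  direction_1 = "-",
  direction_2 = "-",
  distance    = c(-423L, -3467567L, 10452L, -2884L, -40L)
)

# Row with the largest absolute distance in each (Genes, intA) group.
# with_ties = FALSE keeps a single row per group if magnitudes tie.
farthest <- dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Row with the smallest absolute distance in each group.
nearest <- dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

print(farthest)
print(nearest)
```

Both results keep all seven columns and the original sign of distance, because slice_max and slice_min select existing rows instead of computing a summary. Drop with_ties = FALSE if tied rows should all be returned.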