I have a data frame that looks like this:
Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
GeneA P53 chr19 chr8 - - -423
GeneA P53 chr19 chr8 - - -3467567
GeneA P53 chr19 chr8 - - 10452
GeneB P53 chr19 chr8 - - -2884
GeneB P53 chr19 chr8 - - -40
I want to group by the columns `Genes` and `intA` and then keep only the rows with the maximum or minimum value (by magnitude, regardless of being positive or negative) in the last column, called `distance`.
The desired output for getting maximum distance values will be:
Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
GeneA P53 chr19 chr8 - - -3467567
GeneB P53 chr19 chr8 - - -2884
And the desired output for getting minimum distance values will be:
Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
GeneA P53 chr19 chr8 - - -423
GeneB P53 chr19 chr8 - - -40
I tried the code below, but it converts negative values to positive ones and also changes the shape of the final output (only the grouping columns and distance are kept). How can I solve these two issues? Thanks.
library(dplyr)
df <- df %>% group_by(Genes, intA) %>% summarise(distance = max(abs(distance)))
df <- df %>% group_by(Genes, intA) %>% summarise(distance = min(abs(distance)))
CodePudding user response:
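You can use `slice_min` and `slice_max` to get the lowest or highest `n` (default is 1) rows by group, respectively. Since you want the smallest or largest distance regardless of sign, order by `abs(distance)`. The absolute value is only used for ranking, so the original signed value and all other columns are kept in the output.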
dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance))
# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -3467567
#2 GeneB P53 chr19 chr8 - - -2884
dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance))
# Genes intA Chr_intA Chr_intB direction_1 direction_2 distance
# <chr> <chr> <chr> <chr> <chr> <chr> <int>
#1 GeneA P53 chr19 chr8 - - -423
#2 GeneB P53 chr19 chr8 - - -40
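For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the example data from the question (the column types are an assumption on my part), and the names `max_rows` and `min_rows` are just illustrative. It also sets `with_ties = FALSE`, which is optional: by default `slice_max` and `slice_min` return every tied row, so two rows with the same absolute distance in a group (for example 40 and -40) would both come back.

```r
library(dplyr)

# Example data rebuilt from the question; column types are assumed.
dat <- data.frame(
  Genes       = c("GeneA", "GeneA", "GeneA", "GeneB", "GeneB"),
  intA        = "P53",
  Chr_intA    = "chr19",
  Chr_intB    = "chr8",
  direction_1 = "-",
  direction_2 = "-",
  distance    = c(-423L, -3467567L, 10452L, -2884L, -40L)
)

# Row with the largest absolute distance in each (Genes, intA) group.
# abs() is only used to rank the rows, so the sign is preserved.
max_rows <- dat %>%
  group_by(Genes, intA) %>%
  slice_max(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()

# Row with the smallest absolute distance in each group.
min_rows <- dat %>%
  group_by(Genes, intA) %>%
  slice_min(abs(distance), n = 1, with_ties = FALSE) %>%
  ungroup()
```

The `ungroup()` at the end just drops the grouping so that later operations on the result are not applied per group. Note that `slice_max` and `slice_min` need dplyr 1.0.0 or later.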