Home > OS >  Collecting the lowest value of a column based on other columns in R
Collecting the lowest value of a column based on other columns in R

Time:09-18

How do I collect the minimum value of a column for each level of another column, while the new data frame is grouped by the other columns?

Here is a sample data set:

I want to collect the minimum time for each number, grouped by A2 and A3.

(In my original data frame, each column has more levels)

df <- structure(list(ID=c('a','a','a','a','b','b','b','b','c','c','c','c','d','d','d','d','e','e','e','e','f','f','f','f','g','g','g','g','h','h','h','h'),
                     A2=c('d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2','d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2'),
                     A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2'),
                     number=c('1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2'),
                     time=c(23,345,123,4,434,76,245,34,135,98,45,678,32,134,76,578,32,145,256,79,311,356,67,12,689,467,98,456,23,45,23,34)), 
                class = "data.frame", row.names = c(NA,-32L))

the result would look like the following:

df.result<-structure(list(ID=c('a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h'),
                          A2=c('d1','d1','d1','d1','d2','d2','d2','d2','d1','d1','d1','d1','d2','d2','d2','d2'),
                          A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2'),
                          number=c('1','2','1','2','1','2','1','2','1','2','1','2','1','2','1','2'),
                          time=c(23,4,76,34,98,45,32,76,32,79,311,12,467,98,23,23)), 
                     class = "data.frame", row.names = c(NA,-16L))

Thanks

CodePudding user response:

This seems like a job for aggregate, update number first:

transform(dat, number=with(rle(number), rep.int(seq_along(values), lengths))) |>
  aggregate(time ~ number   A2   A3, FUN=min)
#    number A2 A3 time
# 1       1 d1 g1  234
# 2       2 d1 g1   12
# 3       3 d1 g1  232
# 4       4 d1 g1   44
# 5       5 d1 g1   21
# 6       6 d1 g1   34
# 7      13 d2 g1  345
# 8      14 d2 g1   34
# 9      15 d2 g1   56
# 10     16 d2 g1   98
# 11      7 d1 g2   23
# 12      8 d1 g2   12
# 13      9 d1 g2  689
# 14     10 d1 g2    4
# 15     11 d1 g2   43
# 16     12 d1 g2   21
# 17     17 d2 g2  245
# 18     18 d2 g2  134
# 19     19 d2 g2  567
# 20     20 d2 g2    1

CodePudding user response:

Try this:

library(data.table)

setDT(data)
data[, numberR := rleid(number)]
data[, min(time), by = .(A2, A3, numberR)]

This will match your exact expected output.

CodePudding user response:

The tidyverse solution would be:

df %>%
 group_by(A2, A3, number) %>%
 slice_min(time, n=1, with_ties = FALSE) %>%
 ungroup()

CodePudding user response:

Thanks everyone for your answers:

this code workes (based on the code @Karsten W. provided)

df.result <- aggregate(df, time ~ ID   number   A2   A3, FUN=min)
  • Related