How do I collect the minimum value of a column for each level of another column, while the new data frame is grouped by the other columns?
Here is a sample data set:
I want to collect the minimum time for each number, grouped by A2 and A3.
(In my original data frame, each column has more levels)
df <- structure(list(ID=c('a','a','a','a','b','b','b','b','c','c','c','c','d','d','d','d','e','e','e','e','f','f','f','f','g','g','g','g','h','h','h','h'),
A2=c('d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2','d1','d1','d1','d1','d1','d1','d1','d1','d2','d2','d2','d2','d2','d2','d2','d2'),
A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2','g2'),
number=c('1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2','1','1','2','2'),
time=c(23,345,123,4,434,76,245,34,135,98,45,678,32,134,76,578,32,145,256,79,311,356,67,12,689,467,98,456,23,45,23,34)),
class = "data.frame", row.names = c(NA,-32L))
the result would look like the following:
df.result<-structure(list(ID=c('a','a','b','b','c','c','d','d','e','e','f','f','g','g','h','h'),
A2=c('d1','d1','d1','d1','d2','d2','d2','d2','d1','d1','d1','d1','d2','d2','d2','d2'),
A3=c('g1','g1','g1','g1','g1','g1','g1','g1','g2','g2','g2','g2','g2','g2','g2','g2'),
number=c('1','2','1','2','1','2','1','2','1','2','1','2','1','2','1','2'),
time=c(23,4,76,34,98,45,32,76,32,79,311,12,467,98,23,23)),
class = "data.frame", row.names = c(NA,-16L))
Thanks
CodePudding user response:
This seems like a job for aggregate
, update number
first:
transform(dat, number=with(rle(number), rep.int(seq_along(values), lengths))) |>
aggregate(time ~ number A2 A3, FUN=min)
# number A2 A3 time
# 1 1 d1 g1 234
# 2 2 d1 g1 12
# 3 3 d1 g1 232
# 4 4 d1 g1 44
# 5 5 d1 g1 21
# 6 6 d1 g1 34
# 7 13 d2 g1 345
# 8 14 d2 g1 34
# 9 15 d2 g1 56
# 10 16 d2 g1 98
# 11 7 d1 g2 23
# 12 8 d1 g2 12
# 13 9 d1 g2 689
# 14 10 d1 g2 4
# 15 11 d1 g2 43
# 16 12 d1 g2 21
# 17 17 d2 g2 245
# 18 18 d2 g2 134
# 19 19 d2 g2 567
# 20 20 d2 g2 1
CodePudding user response:
Try this:
library(data.table)
setDT(data)
data[, numberR := rleid(number)]
data[, min(time), by = .(A2, A3, numberR)]
This will match your exact expected output.
CodePudding user response:
The tidyverse solution would be:
df %>%
group_by(A2, A3, number) %>%
slice_min(time, n=1, with_ties = FALSE) %>%
ungroup()
CodePudding user response:
Thanks everyone for your answers:
this code workes (based on the code @Karsten W. provided)
df.result <- aggregate(df, time ~ ID number A2 A3, FUN=min)