I have a dataframe like the one below. I want to combine the duplicates in the 'activity' column, except for the ones called 'selection', and sum their values in the 'duration' column. I am working in R. I've tried using aggregate(), but I couldn't find a way to exclude the 'selection' rows from the aggregation.
# df - I used dput so you can have my df
test <- structure(list(activity = c("selection", "selection", "selection",
"other", "inspection", "assignment", "inspection", "inspection",
"inspection", "inspection"), workers = c("worker 1", "worker 1",
"worker 1", "worker 34", "worker 6", "worker 5", "worker 2",
"worker 2", "worker 2", "worker 2"), start_time = structure(c(1645396200,
1645396200, 1645396200, 1645394352, 1645394155, 1645394100, 1645390080,
1645476480, 1645562880, 1645649280), class = c("POSIXct", "POSIXt"
), tzone = "UTC"), status = c("passed", "passed", "passed", "passed",
"passed", "passed", "passed", "passed", "passed", "passed"),
duration = c(8.98333333333333, 9.69027777777778, 9.20555555555556,
0.557222222222222, 2.24527777777778, 1.61666666666667, 2.12166666666667,
1.32638888888889, 2.59861111111111, 0.765555555555556)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
test
# A tibble: 10 x 5
   activity   workers   start_time          status duration
   <chr>      <chr>     <dttm>              <chr>     <dbl>
 1 selection  worker 1  2022-02-20 22:30:00 passed    8.98
 2 selection  worker 1  2022-02-20 22:30:00 passed    9.69
 3 selection  worker 1  2022-02-20 22:30:00 passed    9.21
 4 other      worker 34 2022-02-20 21:59:12 passed    0.557
 5 inspection worker 6  2022-02-20 21:55:55 passed    2.25
 6 assignment worker 5  2022-02-20 21:55:00 passed    1.62
 7 inspection worker 2  2022-02-20 20:48:00 passed    2.12
 8 inspection worker 2  2022-02-21 20:48:00 passed    1.33
 9 inspection worker 2  2022-02-22 20:48:00 passed    2.60
10 inspection worker 2  2022-02-23 20:48:00 passed    0.766
CodePudding user response:
I'm not sure I fully understand what you are looking for, but I'll give it a try.
Using the dplyr library, you could aggregate every activity except 'selection' and then bind the untouched 'selection' rows back on:
Reprex
- Code
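library(dplyr)

test %>%
  # set the "selection" rows aside so they are not aggregated
  filter(activity != "selection") %>%
  group_by(activity) %>%
  # keep the first workers/start_time/status per activity and sum the durations
  summarise(workers = workers[1],
            start_time = start_time[1],
            status = status[1],
            duration = sum(duration)) %>%
  # append the untouched "selection" rows
  bind_rows(test %>% filter(activity == "selection"))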
- Output
#> # A tibble: 6 x 5
#>   activity   workers   start_time          status duration
#>   <chr>      <chr>     <dttm>              <chr>     <dbl>
#> 1 assignment worker 5  2022-02-20 21:55:00 passed    1.62
#> 2 inspection worker 6  2022-02-20 21:55:55 passed    9.06
#> 3 other      worker 34 2022-02-20 21:59:12 passed    0.557
#> 4 selection  worker 1  2022-02-20 22:30:00 passed    8.98
#> 5 selection  worker 1  2022-02-20 22:30:00 passed    9.69
#> 6 selection  worker 1  2022-02-20 22:30:00 passed    9.21
Created on 2022-02-25 by the reprex package (v2.0.1)
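Since the question mentions aggregate(), here is a rough base R sketch of the same split, aggregate, and re-bind idea. It is only an illustration under some assumptions: `df` is a cut-down stand-in for `test` with just the 'activity' and 'duration' columns (durations rounded), and `keep`, `rest`, `summed`, and `result` are made-up names. With the full data you would still have to decide how to collapse 'workers', 'start_time', and 'status'.

```r
# Cut-down stand-in for `test` (assumption: only two columns, rounded durations)
df <- data.frame(
  activity = c("selection", "selection", "selection", "other", "inspection",
               "assignment", "inspection", "inspection", "inspection",
               "inspection"),
  duration = c(8.98, 9.69, 9.21, 0.557, 2.25, 1.62, 2.12, 1.33, 2.60, 0.766)
)

# Split off the rows that must stay untouched
keep <- df[df$activity == "selection", ]
rest <- df[df$activity != "selection", ]

# Sum duration per activity for everything else
summed <- aggregate(duration ~ activity, data = rest, FUN = sum)

# Stack the aggregated rows on top of the untouched "selection" rows
result <- rbind(summed, keep)
rownames(result) <- NULL  # reset row names after binding
result
```

The point is that aggregate() never sees the 'selection' rows, so there is nothing to exclude inside the call itself.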