Find the sum of a column values based on unique cases,max value and specific value of other columns-CodePudding

I have the dataframe below and I want to find the sum qty of unique case_id for the max lifecycle that it has when the in_out value is out. That means that I will add for this case 17694 and 28290

lifes<-structure(list(case_id = c(3396922, 3396922, 3396922, 3396922, 
3396922, 3396922, 3396922, 3396922, 3397364, 3397364, 3397364, 
3397364, 3397364, 3397364, 3397364, 3397364, 3397364, 3397364, 
3397364, 3397364), lifecycle = c(1, 1, 2, 2, 3, 3, 4, 4, 1, 1, 
2, 2, 3, 3, 4, 4, 5, 5, 6, 6), qty = c(0, 19977, 19977, 2386, 
17591, 17481, 17480, 17694, 0, 31276, 31276, 9, 31267, 31538, 
6644, 6120, 3740, 3480, 28290, 28290), in_out = c("in", "out", 
"in", "out", "in", "out", "in", "out", "in", "out", "in", "out", 
"in", "out", "in", "out", "in", "out", "in", "out")), row.names = c(NA, 
-20L), case_id = "case_id", activity_id = "activity", activity_instance_id = "action", lifecycle_id = "registration_type", resource_id = "resource", timestamp = "timestamp", class = c("eventlog", 
"log", "tbl_df", "tbl", "data.frame"))

CodePudding user response：

Here's one way -

library(dplyr)

lifes %>%
  filter(in_out == 'out') %>%
  group_by(case_id) %>%
  summarise(max_value = qty[which.max(lifecycle)]) %>%
  pull(max_value) %>% sum()

#[1] 45984

If you run only till summarise step you can see the max_value for each case_id -

lifes %>%
  filter(in_out == 'out') %>%
  group_by(case_id) %>%
  summarise(max_value = qty[which.max(lifecycle)])

#  case_id max_value
#    <dbl>     <dbl>
#1 3396922     17694
#2 3397364     28290

CodePudding user response：

does this work for you?

library(dplyr)

    lifes |>
      group_by(case_id, in_out) |> 
      mutate(lifecycle_is_max = lifecycle == max(lifecycle)) |> 
      group_by(in_out, lifecycle_is_max) |> 
      summarise(x = sum(qty))

CodePudding user response：

This works using dplyr package :

library(dplyr)

lifes %>%
    filter(in_out == "out") %>%
    arrange(-lifecycle) %>%
    filter(duplicated(case_id) == F) %>%
    sum()