Home > OS >  Get maximum value from dataset based aggregate number from another column
Get maximum value from dataset based aggregate number from another column

Time:07-21

I have the dataset as follows, which group by ID:

ID, Activity, Duration
1, Reading,     20
1, Work,        40
1, Reading,     30
2, Home,        50
2, Writing,     30
2, Reading,     20
2, Writing,     30

And I want to get another column that tells us the activity with the highest duration, so person 1 should be reading because it takes 50 minutes, and person 2 should be writing because it takes 60 minutes. Below is an example of the desired output.

ID, Activity, Duration, Max_Actitvity
    1, Reading, 20, Reading
    1, Work,    40, Reading
    1, Reading, 30, Reading
    2, Home,    50, Writing
    2, Writing, 30, Writing
    2, Reading, 20, Writing
    2, Writing, 30, Writing

CodePudding user response:

You can use the following code:

df <- read.table(text = "ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30", header = TRUE, sep = ",")

library(dplyr)
df %>%
  group_by(ID, Activity) %>%
  mutate(sum_Activity = sum(Duration)) %>%
  group_by(ID) %>%
  mutate(Max_Activity = Activity[which.max(sum_Activity)]) %>%
  select(-sum_Activity) %>%
  ungroup() 
#> # A tibble: 7 × 4
#>      ID Activity   Duration Max_Activity
#>   <int> <chr>         <int> <chr>       
#> 1     1 " Reading"       20 " Reading"  
#> 2     1 " Work"          40 " Reading"  
#> 3     1 " Reading"       30 " Reading"  
#> 4     2 " Home"          50 " Writing"  
#> 5     2 " Writing"       30 " Writing"  
#> 6     2 " Reading"       20 " Writing"  
#> 7     2 " Writing"       30 " Writing"

Created on 2022-07-21 by the reprex package (v2.0.1)

  • Related