I have the dataset as follows, which group by ID:
ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30
And I want to get another column that tells us the activity with the highest duration, so person 1 should be reading because it takes 50 minutes, and person 2 should be writing because it takes 60 minutes. Below is an example of the desired output.
ID, Activity, Duration, Max_Actitvity
1, Reading, 20, Reading
1, Work, 40, Reading
1, Reading, 30, Reading
2, Home, 50, Writing
2, Writing, 30, Writing
2, Reading, 20, Writing
2, Writing, 30, Writing
CodePudding user response:
You can use the following code:
df <- read.table(text = "ID, Activity, Duration
1, Reading, 20
1, Work, 40
1, Reading, 30
2, Home, 50
2, Writing, 30
2, Reading, 20
2, Writing, 30", header = TRUE, sep = ",")
library(dplyr)
df %>%
group_by(ID, Activity) %>%
mutate(sum_Activity = sum(Duration)) %>%
group_by(ID) %>%
mutate(Max_Activity = Activity[which.max(sum_Activity)]) %>%
select(-sum_Activity) %>%
ungroup()
#> # A tibble: 7 × 4
#> ID Activity Duration Max_Activity
#> <int> <chr> <int> <chr>
#> 1 1 " Reading" 20 " Reading"
#> 2 1 " Work" 40 " Reading"
#> 3 1 " Reading" 30 " Reading"
#> 4 2 " Home" 50 " Writing"
#> 5 2 " Writing" 30 " Writing"
#> 6 2 " Reading" 20 " Writing"
#> 7 2 " Writing" 30 " Writing"
Created on 2022-07-21 by the reprex package (v2.0.1)