I am trying to select a single row after using several steps in dplyr such as mutation and group_by() then summarizing over the selected rows. Here is an example of what I am dealing with and the solutions that I tried so far
score <- data.frame(
id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2),
interval = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,
1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
category = c(1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5,
1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5),
subcategory = c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2),
result = c(92,92,88,88,78,78,78,78,98,98,82,82,84,84,75,
75,86,86, 64,64,95,95,96,96,63,63,96,96,69,69,78,78,
88,88, 96,96,69,69,96,96))
This data frame has more than one level of grouping. Now there things that need to be done before doing the analysis: 1- The variable category represents a weighting variable such that: Category 1 = Category 2 = Category 3 = Category 4 = 12.5%. Category 5 = 50% 2- The variable results (maximum 100) will weighted accordingly. Therefore a new variable will be created called WRes. 3- This is the part where I have no idea how to handle. After doing the steps in 1 and 2. The WRes will be summed by id and interval. However, there are two results within the same grouping. I need to chose one only and sum. Basically, after grouping by id and interval, I need to chose one row from each of the categories 1,2,3,4,5 and sum them so the results will be <=100.
My attempts
S1 <- score %>%
mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
WRes = Weight * result) %>%
group_by(id, interval) %>%
score[!duplicated(score$category), ] %>%
summarize(SWRes = sum(Wres)
My second attempt
S1 <- score %>%
mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
WRes = Weight * result) %>%
group_by(id, interval) %>%
arrange(category) %>%
filter(row_number() == 1)
summarize(SWRes = sum(WRes)
CodePudding user response:
score
would reference to the whole dataframe, you need .
:
score %>%
mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5), WRes = Weight * result) %>%
group_by(id, interval) %>%
.[!duplicated(.$category), ] %>%
summarize(SWRes = sum(WRes))
CodePudding user response:
Here are couple of options -
- Using
filter
-
library(dplyr)
score %>%
mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
WRes = Weight * result) %>%
group_by(id, interval) %>%
filter(!duplicated(category)) %>%
summarize(SWRes = sum(WRes), .groups = 'drop')
# id interval SWRes
# <dbl> <dbl> <dbl>
#1 1 1 91
#2 1 2 72.9
#3 2 1 78.2
#4 2 2 89.4
- Using
distinct
score %>%
mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
WRes = Weight * result) %>%
distinct(id, interval, category, .keep_all = TRUE) %>%
group_by(id, interval) %>%
summarize(SWRes = sum(WRes), .groups = 'drop')