Home > Software engineering >  How to select a single row after grouping using group_by() in R?
How to select a single row after grouping using group_by() in R?

Time:10-17

I am trying to select a single row after using several steps in dplyr such as mutation and group_by() then summarizing over the selected rows. Here is an example of what I am dealing with and the solutions that I tried so far

score <- data.frame(
  id = c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
    2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2), 
  interval = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2,
    1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2), 
  category = c(1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5,
    1,1,2,2,3,3,4,4,5,5,1,1,2,2,3,3,4,4,5,5), 
  subcategory = c(1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,
    1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2), 
  result = c(92,92,88,88,78,78,78,78,98,98,82,82,84,84,75,
    75,86,86, 64,64,95,95,96,96,63,63,96,96,69,69,78,78,
    88,88, 96,96,69,69,96,96))

This data frame has more than one level of grouping. Now there things that need to be done before doing the analysis: 1- The variable category represents a weighting variable such that: Category 1 = Category 2 = Category 3 = Category 4 = 12.5%. Category 5 = 50% 2- The variable results (maximum 100) will weighted accordingly. Therefore a new variable will be created called WRes. 3- This is the part where I have no idea how to handle. After doing the steps in 1 and 2. The WRes will be summed by id and interval. However, there are two results within the same grouping. I need to chose one only and sum. Basically, after grouping by id and interval, I need to chose one row from each of the categories 1,2,3,4,5 and sum them so the results will be <=100.

My attempts

S1 <- score %>%
  mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
         WRes = Weight * result) %>%
  group_by(id, interval) %>%
  score[!duplicated(score$category), ] %>%
  summarize(SWRes = sum(Wres)

My second attempt

S1 <- score %>%
  mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
         WRes = Weight * result) %>%
  group_by(id, interval) %>%
  arrange(category) %>%
  filter(row_number() == 1)
  summarize(SWRes = sum(WRes)

CodePudding user response:

score would reference to the whole dataframe, you need .:

score %>%
   mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5), WRes = Weight * result) %>%
   group_by(id, interval) %>%
   .[!duplicated(.$category), ] %>%
   summarize(SWRes = sum(WRes))

CodePudding user response:

Here are couple of options -

  1. Using filter -
library(dplyr)

score %>%
  mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
         WRes = Weight * result) %>%
  group_by(id, interval) %>%
  filter(!duplicated(category)) %>%
  summarize(SWRes = sum(WRes), .groups = 'drop')

#    id interval SWRes
#  <dbl>    <dbl> <dbl>
#1     1        1  91  
#2     1        2  72.9
#3     2        1  78.2
#4     2        2  89.4
  1. Using distinct
score %>%
  mutate(Weight = ifelse(category %in% c(1,2,3,4), 0.125,0.5),
         WRes = Weight * result) %>%
  distinct(id, interval, category, .keep_all = TRUE) %>%
  group_by(id, interval) %>%
  summarize(SWRes = sum(WRes), .groups = 'drop')
  • Related