Home > Back-end >  How to sum rows based on multiple conditions and replace it in the dataframe?
How to sum rows based on multiple conditions and replace it in the dataframe?

Time:01-12

R beginner here in need of some help. I have this dataframe:

dat<-data.frame(Name=c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C"),
                Score=c(1,2,3,4,5,1,2,3,4,5,1,2,3,4,5),
                Frequency=c(9,11,10,5,5,3,7,10,5,5,20,3,3,2,2))

enter image description here

And I want to sum the frequencies of rows with scores 2-3 and 4-5 by name, and rename the scores High (score 1), Medium (scores 2-3) or Low (scores 4-5). Basically my dataframe should look like this:

enter image description here

Is there a more straightforward way to do this? Thanks a lot!

CodePudding user response:

You could first use case_when to convert the score to right class en then group_by and sumamrise your data like this:

library(dplyr)

dat %>%
  mutate(Score = case_when(Score == 1 ~ "High",
                           Score %in% c(2,3) ~ "Medium",
                           TRUE ~ "Low")) %>%
  group_by(Name, Score) %>%
  summarise(Frequency = sum(Frequency))
#> `summarise()` has grouped output by 'Name'. You can override using the
#> `.groups` argument.
#> # A tibble: 9 × 3
#> # Groups:   Name [3]
#>   Name  Score  Frequency
#>   <chr> <chr>      <dbl>
#> 1 A     High           9
#> 2 A     Low           10
#> 3 A     Medium        21
#> 4 B     High           3
#> 5 B     Low           10
#> 6 B     Medium        17
#> 7 C     High          20
#> 8 C     Low            4
#> 9 C     Medium         6

Created on 2023-01-11 with reprex v2.0.2

CodePudding user response:

Here is a base R approach.

First, create Category based on the Score using cut:

dat$Category <- cut(dat$Score, 
                    breaks = c(1, 2, 4, 5), 
                    labels = c("High", "Medium", "Low"), 
                    include.lowest = T, 
                    right = F)

Then you can aggregate based on both Name and Category to get the final result:

aggregate(Frequency ~ Name   Category, data = dat, sum)

Output

  Name Category Frequency
1    A     High         9
2    B     High         3
3    C     High        20
4    A   Medium        21
5    B   Medium        17
6    C   Medium         6
7    A      Low        10
8    B      Low        10
9    C      Low         4
  • Related