Group by a variable in dataframe R

Time:07-08

I have a data frame like the one below:

Date cat cam reg per
22-01-05 A 60 120 50
22-01-05 B 20 100 20
22-01-08 A 30 150 20
22-01-08 B 30 100 30

But I want something like this:

Date cam reg per
22-01-05 80 220 14.5
22-01-08 60 250 24

How can I get this using R?

CodePudding user response:

I am not sure why your expected per values are like that, but maybe you want the following:

df <- data.frame(Date = c("22-01-05", "22-01-05", "22-01-08", "22-01-08"),
                 cat = c("A", "B", "A", "B"),
                 cam = c(60,20,30,30),
                 reg = c(120,100,150,100),
                 per = c(50,20,20,30))

library(dplyr)
df %>% 
  group_by(Date) %>% 
  summarise(cam = sum(cam),
            reg = sum(reg),
            per = cam/reg)
#> # A tibble: 2 × 4
#>   Date       cam   reg   per
#>   <chr>    <dbl> <dbl> <dbl>
#> 1 22-01-05    80   220 0.364
#> 2 22-01-08    60   250 0.24

Created on 2022-07-07 by the reprex package (v2.0.1)

CodePudding user response:

You can try this, but I don't know how to get the per values 14.5 and 24:

library(dplyr)
# Sum cam and reg per Date with base R's aggregate(), then compute per as a percentage
aggregate(cbind(cam, reg) ~ Date, df, sum) %>%
  mutate(per = 100 * (cam / reg))
#> A data.frame: 2 × 4
#>     Date   cam   reg      per
#>    <chr> <dbl> <dbl>    <dbl>
#> 22-01-05    80   220 36.36364
#> 22-01-08    60   250 24.00000

CodePudding user response:

Using only the dplyr package (which is part of the tidyverse), you can do:

library(dplyr)
df %>%
  group_by(Date) %>%
  summarise(cam = sum(cam),
            reg = sum(reg),
            per = 100 * (cam / reg))
#>   Date       cam   reg   per
#>   <chr>    <int> <int> <dbl>
#> 1 22-01-05    80   220  36.4
#> 2 22-01-08    60   250  24

The nice thing about this syntax is that you can modify it and add further summary variables, such as sum, mean, or median, in a clean and structured way.
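As a rough sketch of that idea, the same pipeline can carry several summary functions at once. The output column names (cam_sum, cam_mean, reg_median, n_rows) and the res variable are made up for illustration and are not part of the question; the data is the sample from the question.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(Date = c("22-01-05", "22-01-05", "22-01-08", "22-01-08"),
                 cat = c("A", "B", "A", "B"),
                 cam = c(60, 20, 30, 30),
                 reg = c(120, 100, 150, 100),
                 per = c(50, 20, 20, 30))

# One row per Date, with several different summaries side by side.
# The new column names are illustrative assumptions.
res <- df %>%
  group_by(Date) %>%
  summarise(cam_sum    = sum(cam),     # total cam per Date
            cam_mean   = mean(cam),    # average cam per Date
            reg_median = median(reg),  # median reg per Date
            n_rows     = n())          # number of rows in each group

print(res)
```

Giving each summary its own name matters here: inside summarise(), later expressions see columns created earlier in the same call, so after cam = sum(cam) a following mean(cam) would operate on the already summed value rather than on the original rows.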
