How to calculate a percentage across a subset of a column

Time:08-15

I originally asked this question here:

https://datascience.stackexchange.com/questions/113526/need-help-with-calculating-percentage-across-a-subset-of-column

I need help calculating percentages on a subset of my data. Sample code to reproduce the data is below:

dff4 <- data.frame(
  stringsAsFactors = FALSE, check.names = FALSE,
  Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
             "C", "C", "C", "C", "C"),
  Brand = c("B1", "B2", "B3", "B4", "B5", "B1", "B2", "B3", "B4", "B5",
            "B1", "B2", "B3", "B4", "B5"),
  `2018` = c(2923, 2458, 2812, 2286, 1683, 1085, 2805, 3214, 1059, 1866,
             3280, 2481, 2016, 1230, 1763),
  `2019` = c(2497, 2306, 2264, 3602, 3381, 1778, 2470, 2249, 2297, 3264,
             1071, 2345, 3815, 3685, 1381),
  `2020` = c(3458, 1448, 2033, 1021, 2275, 1527, 1316, 2229, 3029, 1054,
             3590, 2978, 2633, 3531, 2608),
  `2021` = c(1496, 2196, 1448, 2344, 3853, 3499, 1681, 3282, 1693, 2102,
             2235, 2007, 3796, 3394, 2421),
  `2022` = c(3759, 3371, 2908, 3222, 1720, 2862, 3767, 2544, 3299, 3961,
             1030, 1268, 2652, 3656, 3053)
)

Created on 2022-08-15 by the reprex package (v2.0.1)

I want to calculate the % share of each brand (for example, Brand B1) within its region, for every year column. In other words, I want a "% of parent total", like the option in an Excel pivot table, where the parent is the Region.
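To make the target concrete, here is a small sketch of the calculation for a single cell (Region A, Brand B1, year 2018). The variable names are made up for illustration; the numbers are copied from the sample data above.

```r
# 2018 values for Region A, brands B1..B5, copied from the sample data.
region_a_2018 <- c(B1 = 2923, B2 = 2458, B3 = 2812, B4 = 2286, B5 = 1683)

# Parent total for the region, then B1's share of that total.
region_total <- sum(region_a_2018)                # 12162
b1_share <- region_a_2018[["B1"]] / region_total

round(100 * b1_share, 2)                          # about 24.03 (%)
```

The desired output repeats this for every brand, region and year column.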

I tried the code below; however, it divides each value by the sum of the whole column rather than by the subtotal for its Region.

library(dplyr)
dff4 %>% mutate(across(where(is.double), ~ . / sum(.), .names = "perc_{.col}"))

I also tried the code below, which gives exactly the answer I want. However, I cannot apply it across all the columns without writing the code separately for each year from 2018 to 2022.

transform(dff4, percent = ave(dff4$`2018`, dff4$Region, 
    FUN = prop.table))

Any help would be appreciated.

CodePudding user response:

I think what you're looking for is to group_by Region first, so that sum(.) is computed within each region rather than over the whole column:

dff4 <- dff4 %>% 
  group_by(Region) %>% 
  mutate(across(`2018`:`2022`, ~ . / sum(.), .names = "perc_{.col}")) %>% 
  ungroup()  # drop the grouping so later steps are not silently per-Region
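If you would rather stay in base R, the ave() attempt from the question can probably be extended to every year column in the same spirit. This is only a sketch: it assumes the grouping column is called Region, it uses a trimmed copy of the sample data (regions A and B, years 2018 and 2019), and the names `small` and `year_cols` are made up here.

```r
# Trimmed copy of the sample data: regions A and B, years 2018 and 2019 only.
small <- data.frame(
  stringsAsFactors = FALSE, check.names = FALSE,
  Region = rep(c("A", "B"), each = 5),
  Brand  = rep(c("B1", "B2", "B3", "B4", "B5"), times = 2),
  `2018` = c(2923, 2458, 2812, 2286, 1683, 1085, 2805, 3214, 1059, 1866),
  `2019` = c(2497, 2306, 2264, 3602, 3381, 1778, 2470, 2249, 2297, 3264)
)

# Year columns to convert (hypothetical helper name).
year_cols <- c("2018", "2019")

# For each year column, divide every value by its Region subtotal
# and store the result in a new perc_<year> column.
small[paste0("perc_", year_cols)] <- lapply(
  small[year_cols],
  function(x) ave(x, small$Region, FUN = prop.table)
)

# Sanity check: the shares within each region should add up to 1.
tapply(small$perc_2018, small$Region, sum)
```

On the full data, setting year_cols to as.character(2018:2022) should produce the same perc_ columns as the grouped mutate() above.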