I originally asked this question here:-
I need help calculating percentages on a subset of my data. Here is sample code that creates the data:
dff4 <- data.frame(stringsAsFactors = FALSE, check.names = FALSE,
Region = c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C"),
Brand = c("B1", "B2", "B3",
"B4", "B5", "B1", "B2", "B3", "B4", "B5", "B1", "B2",
"B3", "B4", "B5"),
`2018` = c(2923, 2458, 2812, 2286,
1683, 1085, 2805, 3214, 1059, 1866, 3280, 2481, 2016,
1230, 1763),
`2019` = c(2497, 2306, 2264, 3602, 3381,
1778, 2470, 2249, 2297, 3264, 1071, 2345, 3815, 3685,
1381),
`2020` = c(3458, 1448, 2033, 1021, 2275, 1527,
1316, 2229, 3029, 1054, 3590, 2978, 2633, 3531, 2608),
`2021` = c(1496, 2196, 1448, 2344, 3853, 3499, 1681, 3282,
1693, 2102, 2235, 2007, 3796, 3394, 2421),
`2022` = c(3759,
3371, 2908, 3222, 1720, 2862, 3767, 2544, 3299, 3961,
1030, 1268, 2652, 3656, 3053))
Created on 2022-08-15 by the reprex package (v2.0.1)
I want to calculate the % share of each brand (for example B1) within its region, across all year columns. In other words, I want a % of parent total, as in an Excel pivot table.
I tried the code below; however, it calculates the % against the total of the entire column rather than the total for each Region.
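To make the target concrete, here is a minimal sketch of the arithmetic for Region A in 2018 only. The values are copied from the sample data; the names region_a_2018, region_total and share are made up for illustration.

```r
# 2018 values for Region A, brands B1 to B5 (copied from the sample data)
region_a_2018 <- c(B1 = 2923, B2 = 2458, B3 = 2812, B4 = 2286, B5 = 1683)

# Parent total: the sum over Region A only, not over the whole 2018 column
region_total <- sum(region_a_2018)

# Share of each brand within its region; B1 is 2923 / 12162, about 24.0%
share <- region_a_2018 / region_total
round(100 * share, 1)
```

The same division, repeated for every region and every year column, is the result I am looking for.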
dff4 %>% mutate(across(where(is.double), ~./sum(.), .names = "perc_{.col}"))
I also tried the code below, which gives exactly the answer I want; however, I cannot replicate it across columns without writing it out separately for each column from 2018 to 2022.
transform(dff4, percent = ave(dff4$`2018`, dff4$Region,
                              FUN = prop.table))
Any help would be appreciated.
Answer:
I think what you're looking for is to group_by Region first, so that sum(.) is computed within each region rather than over the whole column:
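library(dplyr)
dff4 <- dff4 %>%
  group_by(Region) %>%
  # sum(.) is now the regional total, so each value becomes a within-region share
  mutate(across(`2018`:`2022`, ~ . / sum(.), .names = "perc_{.col}")) %>%
  ungroup()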
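If you would rather stay in base R, the ave() approach from the question can probably be extended to all columns with lapply(). The following is only a sketch under assumptions: toy is a hypothetical cut-down data frame shaped like dff4 (two regions, two year columns), and year_cols is a helper name introduced here.

```r
# Toy data: a hypothetical subset shaped like dff4
toy <- data.frame(
  Region = c("A", "A", "B", "B"),
  Brand  = c("B1", "B2", "B1", "B2"),
  `2018` = c(2923, 2458, 1085, 2805),
  `2019` = c(2497, 2306, 1778, 2470),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Columns to convert to shares (assumed helper name)
year_cols <- c("2018", "2019")

# ave() applies prop.table() within each Region; lapply() repeats it per column
toy[paste0("perc_", year_cols)] <- lapply(
  toy[year_cols],
  function(x) ave(x, toy$Region, FUN = prop.table)
)

# Sanity check: shares within each region should sum to 1 for every year
aggregate(toy[paste0("perc_", year_cols)],
          by = list(Region = toy$Region), FUN = sum)
```

The aggregate() call at the end is just a check that each region's shares add up to 1; the same check should apply to the perc_ columns produced by the dplyr version.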