Home > OS >  Calculating columns in percent based on other columns for particular category in R
Calculating columns in percent based on other columns for particular category in R

Time:10-28

I'm a beginner with R and need help with the task. The output of dummy dataset with dput (real set is really big):

structure(list(CODE = c(453, 463, 476, 
798, 885, 582, 626, 663, 457, 408
), CATEGORY = c("CIG", "BET", "CIG", "CIG", "ARI", "CRR", "ARI", "CIG", 
"CIG", "BET"), AMOUNT = c(22, 5, 6, 52, 16, 11, 6, 70, 208, 5), 
    PRICE = c(5.56, 8.29, 3.89, 3.8, 4.05, 3.99, 3.55, 7.69, 6.75, 
    5.2), BRAND = c("ROTHMANS", "ALLINO", "MARLBORO", "ROTHMANS", "AURIELO", 
    "SOLINOS", "CHLEBLO", "MARLBORO", "LD", "SOLINOS"
    )), row.names = c(NA, -10L), class = c("tbl_df", 
"tbl", "data.frame"))

What should be done: a column which will show the share of every BRAND in percent should be added. First what I have done is create a new column "VALUE" this way:

df$VALUE <- with(df, AMOUNT*PRICE)

Now the column SHARE must be created in such a way: sum of values (from column VALUE) of a particular brand in a particular category divided by the sum of values of the whole category. For example, "ROTHMANS" is in category CIG, the sum of values for it is 319.92, the sum for the whole CIG category is 2285,56, so, the SHARE=14%. And it should be calculated in each case. I think dplyr library can be suitable, but can't find a solution.

CodePudding user response:

You may first sum the BRAND value and get the proportion for each CATEGORY.

library(dplyr)

df %>%
  group_by(CATEGORY, BRAND) %>%
  summarise(VALUE = sum(VALUE)) %>%
  mutate(SHARE = prop.table(VALUE) * 100) %>%
  ungroup

#  CATEGORY BRAND     VALUE SHARE
#  <chr>    <chr>     <dbl> <dbl>
#1 ARI      AURIELO    64.8  75.3
#2 ARI      CHLEBLO    21.3  24.7
#3 BET      ALLINO     41.4  61.5
#4 BET      SOLINOS    26    38.5
#5 CIG      LD       1404    61.4
#6 CIG      MARLBORO  562.   24.6
#7 CIG      ROTHMANS  320.   14.0
#8 CRR      SOLINOS    43.9 100  

CodePudding user response:

A data.table solution could be:

library(data.table)

res <- setDT(df)[,'.'(VALUE = sum(VALUE)), by = list(CATEGORY,BRAND)
               ][,':='(SHARE = round(VALUE/sum(VALUE)*100,2)), by = list(CATEGORY)]

res
  CATEGORY    BRAND   VALUE  SHARE
1:      CIG ROTHMANS  319.92  14.00
2:      BET   ALLINO   41.45  61.45
3:      CIG MARLBORO  561.64  24.57
4:      ARI  AURIELO   64.80  75.26
5:      CRR  SOLINOS   43.89 100.00
6:      ARI  CHLEBLO   21.30  24.74
7:      CIG       LD 1404.00  61.43
8:      BET  SOLINOS   26.00  38.55

EDIT

To keep original values could be this:

res <- setDT(df)[,'.'(VALUE = sum(VALUE)), by = list(CATEGORY,BRAND)
               ][,':='(SHARE = round(VALUE/sum(VALUE)*100,2)), by = list(CATEGORY)
               ][setDT(df), on = c('BRAND','CATEGORY')
               ][,-('i.VALUE')]
res

    CATEGORY    BRAND   VALUE  SHARE CODE AMOUNT PRICE
 1:      CIG ROTHMANS  319.92  14.00  453     22  5.56
 2:      BET   ALLINO   41.45  61.45  463      5  8.29
 3:      CIG MARLBORO  561.64  24.57  476      6  3.89
 4:      CIG ROTHMANS  319.92  14.00  798     52  3.80
 5:      ARI  AURIELO   64.80  75.26  885     16  4.05
 6:      CRR  SOLINOS   43.89 100.00  582     11  3.99
 7:      ARI  CHLEBLO   21.30  24.74  626      6  3.55
 8:      CIG MARLBORO  561.64  24.57  663     70  7.69
 9:      CIG       LD 1404.00  61.43  457    208  6.75
10:      BET  SOLINOS   26.00  38.55  408      5  5.20

CodePudding user response:

We can use base R

transform(aggregate(VALUE ~ CATEGORY   BRAND, df, sum), 
    SHARE = ave(VALUE, CATEGORY, FUN = proportions) * 100)
  CATEGORY    BRAND   VALUE     SHARE
1      BET   ALLINO   41.45  61.45293
2      ARI  AURIELO   64.80  75.26132
3      ARI  CHLEBLO   21.30  24.73868
4      CIG       LD 1404.00  61.42915
5      CIG MARLBORO  561.64  24.57341
6      CIG ROTHMANS  319.92  13.99744
7      BET  SOLINOS   26.00  38.54707
8      CRR  SOLINOS   43.89 100.00000
  • Related