I'm a beginner with R and need help with the task. The output of dummy dataset with dput (real set is really big):
structure(list(CODE = c(453, 463, 476,
798, 885, 582, 626, 663, 457, 408
), CATEGORY = c("CIG", "BET", "CIG", "CIG", "ARI", "CRR", "ARI", "CIG",
"CIG", "BET"), AMOUNT = c(22, 5, 6, 52, 16, 11, 6, 70, 208, 5),
PRICE = c(5.56, 8.29, 3.89, 3.8, 4.05, 3.99, 3.55, 7.69, 6.75,
5.2), BRAND = c("ROTHMANS", "ALLINO", "MARLBORO", "ROTHMANS", "AURIELO",
"SOLINOS", "CHLEBLO", "MARLBORO", "LD", "SOLINOS"
)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
What should be done: a column which will show the share of every BRAND in percent should be added. First what I have done is create a new column "VALUE" this way:
df$VALUE <- with(df, AMOUNT*PRICE)
Now the column SHARE must be created in such a way: sum of values (from column VALUE) of a particular brand in a particular category divided by the sum of values of the whole category. For example, "ROTHMANS" is in category CIG, the sum of values for it is 319.92, the sum for the whole CIG category is 2285,56, so, the SHARE=14%. And it should be calculated in each case. I think dplyr library can be suitable, but can't find a solution.
CodePudding user response:
You may first sum
the BRAND
value and get the proportion for each CATEGORY
.
library(dplyr)
df %>%
group_by(CATEGORY, BRAND) %>%
summarise(VALUE = sum(VALUE)) %>%
mutate(SHARE = prop.table(VALUE) * 100) %>%
ungroup
# CATEGORY BRAND VALUE SHARE
# <chr> <chr> <dbl> <dbl>
#1 ARI AURIELO 64.8 75.3
#2 ARI CHLEBLO 21.3 24.7
#3 BET ALLINO 41.4 61.5
#4 BET SOLINOS 26 38.5
#5 CIG LD 1404 61.4
#6 CIG MARLBORO 562. 24.6
#7 CIG ROTHMANS 320. 14.0
#8 CRR SOLINOS 43.9 100
CodePudding user response:
A data.table
solution could be:
library(data.table)
res <- setDT(df)[,'.'(VALUE = sum(VALUE)), by = list(CATEGORY,BRAND)
][,':='(SHARE = round(VALUE/sum(VALUE)*100,2)), by = list(CATEGORY)]
res
CATEGORY BRAND VALUE SHARE
1: CIG ROTHMANS 319.92 14.00
2: BET ALLINO 41.45 61.45
3: CIG MARLBORO 561.64 24.57
4: ARI AURIELO 64.80 75.26
5: CRR SOLINOS 43.89 100.00
6: ARI CHLEBLO 21.30 24.74
7: CIG LD 1404.00 61.43
8: BET SOLINOS 26.00 38.55
EDIT
To keep original values could be this:
res <- setDT(df)[,'.'(VALUE = sum(VALUE)), by = list(CATEGORY,BRAND)
][,':='(SHARE = round(VALUE/sum(VALUE)*100,2)), by = list(CATEGORY)
][setDT(df), on = c('BRAND','CATEGORY')
][,-('i.VALUE')]
res
CATEGORY BRAND VALUE SHARE CODE AMOUNT PRICE
1: CIG ROTHMANS 319.92 14.00 453 22 5.56
2: BET ALLINO 41.45 61.45 463 5 8.29
3: CIG MARLBORO 561.64 24.57 476 6 3.89
4: CIG ROTHMANS 319.92 14.00 798 52 3.80
5: ARI AURIELO 64.80 75.26 885 16 4.05
6: CRR SOLINOS 43.89 100.00 582 11 3.99
7: ARI CHLEBLO 21.30 24.74 626 6 3.55
8: CIG MARLBORO 561.64 24.57 663 70 7.69
9: CIG LD 1404.00 61.43 457 208 6.75
10: BET SOLINOS 26.00 38.55 408 5 5.20
CodePudding user response:
We can use base R
transform(aggregate(VALUE ~ CATEGORY BRAND, df, sum),
SHARE = ave(VALUE, CATEGORY, FUN = proportions) * 100)
CATEGORY BRAND VALUE SHARE
1 BET ALLINO 41.45 61.45293
2 ARI AURIELO 64.80 75.26132
3 ARI CHLEBLO 21.30 24.73868
4 CIG LD 1404.00 61.42915
5 CIG MARLBORO 561.64 24.57341
6 CIG ROTHMANS 319.92 13.99744
7 BET SOLINOS 26.00 38.54707
8 CRR SOLINOS 43.89 100.00000