I have sales data by year, condition and products
Year <- c(2010,2010,2010,2010,2010,2010,2011,2011,2011,2011,2011,2011,2012,2012,2012,2012,2012,2012)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- c("New","New","New","Used","Used","Used","New","New","New","Used","Used","Used","New","New","New","Used","Used","Used")
Product <- c("a","b","c","a","b","c","a","b","c","a","b","c","a","b","c","a","b","c")
df <- data.frame(Year,Condition, Product, Sale)
Now I want to calculate the share of each product by condition variable within each year. I tried the following code, but it calculates based on total no by year and "condition"
df$percentage <- df$Sale/sum(df$Sale)*100
CodePudding user response:
First convert Sale
from character to numeric with type.convert(as.is = TRUE)
,
then group by the desired columns and apply summarise
:
Note that in your provided dataframe you will get 100 for percentage because of your provided data:
With this fake data
set.seed(123)
Year <- sample(c(2010, 2011, 2012), 18, replace = TRUE)
Sale <- c("30","45","23","33","24","11","56","19","45","56","33","32","89","33","12",18,10,17)
Condition <- sample(c("Used","New"), 18, replace = TRUE)
Product <- sample(c("a","b","c"), 18, replace = TRUE)
df <- data.frame(Year,Condition, Product, Sale)
using this code
library(dplyr)
df %>%
type.convert(as.is=TRUE) %>%
group_by(Year, Product, Condition) %>%
summarise(percentage = Sale/sum(Sale)*100)
you will get:
Year Product Condition percentage
<int> <chr> <chr> <dbl>
1 2010 a Used 83.2
2 2010 a Used 16.8
3 2010 c New 100
4 2011 a New 100
5 2011 a Used 42.9
6 2011 a Used 14.3
7 2011 a Used 42.9
8 2011 b New 100
9 2011 c New 49.2
10 2011 c New 50.8
11 2012 a Used 63.8
12 2012 a Used 36.2
13 2012 b New 100
14 2012 b Used 69.7
15 2012 b Used 30.3
16 2012 c New 100
17 2012 c Used 34.8
18 2012 c Used 65.2
Update: to keep Sale
column: replace summarise
with mutate
df %>%
type.convert(as.is=TRUE) %>%
group_by(Year, Product, Condition) %>%
mutate(percentage = paste(round(Sale/sum(Sale)*100, 1), "%"))
Year Condition Product Sale percentage
<int> <chr> <chr> <int> <chr>
1 2012 Used a 30 63.8 %
2 2012 New c 45 100 %
3 2012 Used b 23 69.7 %
4 2011 Used a 33 42.9 %
5 2012 Used c 24 34.8 %
6 2011 Used a 11 14.3 %
7 2011 New a 56 100 %
8 2011 New b 19 100 %
9 2012 Used c 45 65.2 %
10 2010 New c 56 100 %
11 2011 Used a 33 42.9 %
12 2011 New c 32 49.2 %
13 2010 Used a 89 83.2 %
14 2011 New c 33 50.8 %
15 2012 New b 12 100 %
16 2010 Used a 18 16.8 %
17 2012 Used b 10 30.3 %
18 2012 Used a 17 36.2 %
CodePudding user response:
Here is a base
solution using ave()
. You can replace grouping variables in ave
with any others you want.
within(df, {
perc1 = ave(as.numeric(Sale), Year, Product, FUN = proportions) * 100
perc2 = sprintf("%.1f %%", perc1)
})
Year Condition Product Sale perc2 perc1
1 2010 New a 30 47.6 % 47.61905
2 2010 New b 45 65.2 % 65.21739
3 2010 New c 23 67.6 % 67.64706
4 2010 Used a 33 52.4 % 52.38095
5 2010 Used b 24 34.8 % 34.78261
6 2010 Used c 11 32.4 % 32.35294
7 2011 New a 56 50.0 % 50.00000
8 2011 New b 19 36.5 % 36.53846
9 2011 New c 45 58.4 % 58.44156
10 2011 Used a 56 50.0 % 50.00000
11 2011 Used b 33 63.5 % 63.46154
12 2011 Used c 32 41.6 % 41.55844