I need to calculate weights from a column in a dataframe based on some conditions.
I have Total Assets from several banks, with different countries, years and specializations.
For each bank I want to calculate a weight (w) where w(i) = Tot_Asset (bank) / sum (Tot_Ass of all banks within same year, country and Specialization)
Example dataframe:
banks <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15)
Country <- c("NL", "ES", "IT", "IT", "ES", "NL", "FR", "NL", "ES", "NL", "IT", "IT", "NL", "ES", "NL")
year <- c(2020, 2019, 2018, 2019, 2020, 2020, 2018, 2019, 2019, 2019, 2018, 2019, 2020, 2018, 2020)
Specialization <- c("cooperative", "saving", "cooperative", "cooperative", "saving", "cooperative", "saving", "cooperative", "cooperative", "saving", "cooperative", "saving", "cooperative", "cooperative", "cooperative")
Tot_Assets <- c(100, 200, 145, 300, 200, 345, 543, 190, 150, 120, 310, 210, 110, 210, 220)
data <- data.frame(banks, Country, year, Specialization, Tot_Assets)
As an example of what I would like to obtain:
The bank 1 is in NL, is a cooperative and the Total Asset is from 2020; bank 6, 13 and 15 has the same caracteristics. So the program has to do :
- w(1) = 100 / (100 345 110 220) = 0.13
- w(6) = 345 / (100 345 110 220) = 0.45
- and so on
I hope that I explained myself and that you can help me, thanks in advance!
CodePudding user response:
We may group by 'Country', 'year', 'Specialization', and create the 'weights' column by dividing the 'Tot_Assets' with the sum
of 'Tot_Assets'
library(dplyr)
data <- data %>%
group_by(Country, year, Specialization) %>%
mutate(weights = Tot_Assets/sum(Tot_Assets)) %>%
ungroup
-output
data
# A tibble: 15 × 6
banks Country year Specialization Tot_Assets weights
<dbl> <chr> <dbl> <chr> <dbl> <dbl>
1 1 NL 2020 cooperative 100 0.129
2 2 ES 2019 saving 200 1
3 3 IT 2018 cooperative 145 0.319
4 4 IT 2019 cooperative 300 1
5 5 ES 2020 saving 200 1
6 6 NL 2020 cooperative 345 0.445
7 7 FR 2018 saving 543 1
8 8 NL 2019 cooperative 190 1
9 9 ES 2019 cooperative 150 1
10 10 NL 2019 saving 120 1
11 11 IT 2018 cooperative 310 0.681
12 12 IT 2019 saving 210 1
13 13 NL 2020 cooperative 110 0.142
14 14 ES 2018 cooperative 210 1
15 15 NL 2020 cooperative 220 0.284
CodePudding user response:
A base R option using ave
(for grouping purpose) and proportions
(to calculate the weight distribution)
transform(
data,
weights = ave(Tot_Assets,Country,year,Specialization, FUN = proportions)
)
gives
banks Country year Specialization Tot_Assets weights
1 1 NL 2020 cooperative 100 0.1290323
2 2 ES 2019 saving 200 1.0000000
3 3 IT 2018 cooperative 145 0.3186813
4 4 IT 2019 cooperative 300 1.0000000
5 5 ES 2020 saving 200 1.0000000
6 6 NL 2020 cooperative 345 0.4451613
7 7 FR 2018 saving 543 1.0000000
8 8 NL 2019 cooperative 190 1.0000000
9 9 ES 2019 cooperative 150 1.0000000
10 10 NL 2019 saving 120 1.0000000
11 11 IT 2018 cooperative 310 0.6813187
12 12 IT 2019 saving 210 1.0000000
13 13 NL 2020 cooperative 110 0.1419355
14 14 ES 2018 cooperative 210 1.0000000
15 15 NL 2020 cooperative 220 0.2838710