I'm trying to create a new table from a previous table, which calculates the differential values from variables inserted in the first table. The first table is in this formula:
gender AgeGroup real_wages.mean real_wages.stand_dev
1 1 A 5.664902 8.140341
2 2 A 5.177728 3.781762
3 1 B 7.191969 17.171500
4 2 B 6.307641 7.659018
...
So the idea for the new table is for it to have a column called "Differential",which, for each "AgeGroup", calculates the following:
("Real_wages.mean" of gender 1" - "Real Wages.mean" of gender 2)) / ("Real_mages.mean of gender 1"))
This means that the table would have one row per GroupAge and the column Differential which would have the result from the equation stated above.
Any idea on how to code this in R? Thank you!
CodePudding user response:
It's always better to provide a minimal reproducible example:
thedata <- structure(list(
id = 1:4,
gender = c(1L, 2L, 1L, 2L),
AgeGroup = c("A", "A", "B", "B"),
real_wages.mean = c(5.664902, 5.177728, 7.191969, 6.307641),
real_wages.stand_dev = c(8.140341, 3.781762, 17.1715, 7.659018)),
row.names = c(NA, -4L), class = c("data.table", "data.frame"))
A possible solution is to use tidyr::pivot_wider: In order to do so we first use select
to get rid of all the columns we dont want to use, so they don't lead to aggregation problems when calling pivot_wider
. Then we use pivot_wider
to put the values we want to use for calculation later into separate columns. Finally we use mutate
to do the actual calculation for the column "Differential"
library(dplyr)
library(tidyr)
thedata %>%
select(-c(id, real_wages.stand_dev)) %>%
pivot_wider(names_from = gender, values_from = real_wages.mean, names_prefix = "mean_wages_gender_") %>%
mutate(Differential = (mean_wages_gender_1 - mean_wages_gender_2) / mean_wages_gender_1)
Returns:
AgeGroup mean_wages_gender_1 mean_wages_gender_2 Differential <chr> <dbl> <dbl> <dbl> 1 A 5.66 5.18 0.0860 2 B 7.19 6.31 0.123
CodePudding user response:
Another possible solution:
library(dplyr)
df %>%
group_by(AgeGroup) %>%
summarise(Differential = (first(real_wages.mean) - last(real_wages.mean))
/ first(real_wages.mean))
#> # A tibble: 2 × 2
#> AgeGroup Differential
#> <chr> <dbl>
#> 1 A 0.0860
#> 2 B 0.123
CodePudding user response:
Applying the specified calculation logic according to AgeGroup
using ave
.
transform(dat, Diff=ave(real_wages.mean, AgeGroup, FUN=\(x) (x[1] - x[2])/x[1]))
# gender AgeGroup real_wages.mean real_wages.stand_dev Diff
# 1 1 A 5.664902 8.140341 0.08599866
# 2 2 A 5.177728 3.781762 0.08599866
# 3 1 B 7.191969 17.171500 0.12296049
# 4 2 B 6.307641 7.659018 0.12296049
Data:
dat <- structure(list(gender = c(1L, 2L, 1L, 2L), AgeGroup = c("A",
"A", "B", "B"), real_wages.mean = c(5.664902, 5.177728, 7.191969,
6.307641), real_wages.stand_dev = c(8.140341, 3.781762, 17.1715,
7.659018)), row.names = c(NA, -4L), class = "data.frame")