Home > OS >  Creating a new table with a column that calculates a value from 2 different variables from the first
Creating a new table with a column that calculates a value from 2 different variables from the first

Time:03-27

I'm trying to create a new table from a previous table, which calculates the differential values from variables inserted in the first table. The first table is in this formula:

       gender AgeGroup real_wages.mean real_wages.stand_dev
1       1        A        5.664902             8.140341
2       2        A        5.177728             3.781762
3       1        B        7.191969            17.171500
4       2        B        6.307641             7.659018
...

So the idea for the new table is for it to have a column called "Differential",which, for each "AgeGroup", calculates the following:

("Real_wages.mean" of gender 1" - "Real Wages.mean" of gender 2)) / ("Real_mages.mean of gender 1"))

This means that the table would have one row per GroupAge and the column Differential which would have the result from the equation stated above.

Any idea on how to code this in R? Thank you!

CodePudding user response:

It's always better to provide a minimal reproducible example:

thedata <- structure(list(
  id = 1:4, 
  gender = c(1L, 2L, 1L, 2L), 
  AgeGroup = c("A", "A", "B", "B"), 
  real_wages.mean = c(5.664902, 5.177728, 7.191969,  6.307641), 
  real_wages.stand_dev = c(8.140341, 3.781762, 17.1715,  7.659018)), 
  row.names = c(NA, -4L), class = c("data.table", "data.frame"))

A possible solution is to use tidyr::pivot_wider: In order to do so we first use select to get rid of all the columns we dont want to use, so they don't lead to aggregation problems when calling pivot_wider. Then we use pivot_wider to put the values we want to use for calculation later into separate columns. Finally we use mutate to do the actual calculation for the column "Differential"

library(dplyr)
library(tidyr)

thedata %>% 
  select(-c(id, real_wages.stand_dev)) %>% 
  pivot_wider(names_from = gender, values_from = real_wages.mean, names_prefix = "mean_wages_gender_") %>% 
  mutate(Differential = (mean_wages_gender_1 - mean_wages_gender_2) / mean_wages_gender_1)

Returns:

 AgeGroup mean_wages_gender_1 mean_wages_gender_2 Differential
 <chr>                  <dbl>               <dbl>        <dbl>
1 A                       5.66                5.18       0.0860
2 B                       7.19                6.31       0.123

CodePudding user response:

Another possible solution:

library(dplyr)

df %>% 
  group_by(AgeGroup) %>% 
  summarise(Differential = (first(real_wages.mean) - last(real_wages.mean))
            / first(real_wages.mean))

#> # A tibble: 2 × 2
#>   AgeGroup Differential
#>   <chr>           <dbl>
#> 1 A              0.0860
#> 2 B              0.123

CodePudding user response:

Applying the specified calculation logic according to AgeGroup using ave.

transform(dat, Diff=ave(real_wages.mean, AgeGroup, FUN=\(x) (x[1] - x[2])/x[1]))
#   gender AgeGroup real_wages.mean real_wages.stand_dev       Diff
# 1      1        A        5.664902             8.140341 0.08599866
# 2      2        A        5.177728             3.781762 0.08599866
# 3      1        B        7.191969            17.171500 0.12296049
# 4      2        B        6.307641             7.659018 0.12296049

Data:

dat <- structure(list(gender = c(1L, 2L, 1L, 2L), AgeGroup = c("A", 
"A", "B", "B"), real_wages.mean = c(5.664902, 5.177728, 7.191969, 
6.307641), real_wages.stand_dev = c(8.140341, 3.781762, 17.1715, 
7.659018)), row.names = c(NA, -4L), class = "data.frame")
  • Related