I am trying to create a column called share which is calculated by dividing each value where noc_code is xxxx against noc_code=9999 WITHIN each industry group.
Here is a sample of my table
Industry | noc_doe | VALUE |
---|---|---|
mining | 9999 | 100 |
mining | 0001 | 15 |
mining | 0005 | 33 |
mining | 0941 | 50 |
agriculture | 9999 | 100 |
agriculture | 0701 | 25 |
agriculture | 6785 | 4 |
agriculture | 8456 | 90 |
my code is this, but it is not working
df %>%
group_by(Industry)%>%
mutate(Share = VALUE / VALUE[match("9999", noc_code)])-> df
I am hoping to get this result
Industry | noc_doe | VALUE | Share |
---|---|---|---|
mining | 9999 | 100 | 1 |
mining | 0001 | 15 | 0.15 |
mining | 0005 | 33 | 0.33 |
mining | 0941 | 50 | 0.5 |
agriculture | 9999 | 100 | 1 |
agriculture | 0701 | 25 | 0.25 |
agriculture | 6785 | 4 | 0.04 |
agriculture | 8456 | 90 | 0.9 |
CodePudding user response:
Here is a way.
Create a temp column div
holding the divisor corresponding to the wanted noc_doe
, by group of industry.
df<-"Industry noc_doe VALUE
mining 9999 100
mining 0001 15
mining 0005 33
mining 0941 50
agriculture 9999 100
agriculture 0701 25
agriculture 6785 4
agriculture 8456 90"
df <- read.table(textConnection(df), header = TRUE)
suppressPackageStartupMessages(library(dplyr))
df %>%
group_by(Industry) %>%
mutate(
div = VALUE[noc_doe == 9999],
share = VALUE/div
) %>%
select(-div)
#> # A tibble: 8 × 4
#> # Groups: Industry [2]
#> Industry noc_doe VALUE share
#> <chr> <int> <int> <dbl>
#> 1 mining 9999 100 1
#> 2 mining 1 15 0.15
#> 3 mining 5 33 0.33
#> 4 mining 941 50 0.5
#> 5 agriculture 9999 100 1
#> 6 agriculture 701 25 0.25
#> 7 agriculture 6785 4 0.04
#> 8 agriculture 8456 90 0.9
Created on 2022-09-08 by the reprex package (v2.0.1)
CodePudding user response:
I would do it like this
library(tidyverse)
df <-
structure(
list(
Industry = c(
"mining",
"mining",
"mining",
"mining",
"agriculture",
"agriculture",
"agriculture",
"agriculture"
),
noc_doe = c(9999L, 1L, 5L, 941L, 9999L, 701L, 6785L, 8456L),
VALUE = c(100L, 15L, 33L, 50L, 100L, 25L, 4L, 90L)
),
class = "data.frame",
row.names = c(NA,-8L)
)
df %>%
group_by(Industry) %>%
mutate(
Share = VALUE/first(VALUE, order_by = desc(noc_doe == 9999L))
)
#> # A tibble: 8 × 4
#> # Groups: Industry [2]
#> Industry noc_doe VALUE Share
#> <chr> <int> <int> <dbl>
#> 1 mining 9999 100 1
#> 2 mining 1 15 0.15
#> 3 mining 5 33 0.33
#> 4 mining 941 50 0.5
#> 5 agriculture 9999 100 1
#> 6 agriculture 701 25 0.25
#> 7 agriculture 6785 4 0.04
#> 8 agriculture 8456 90 0.9
Created on 2022-09-08 by the reprex package (v2.0.1)