Below is the sample data and code. I have two issues. First, I need the indtotal
column to be the sum by the twodigit
code and have it stay constant as shown below. The reasons is so that I can do a simple calculation of one column divided by the other to arrive at the smbshare
number. When I try the following,
second <- first %>%
group_by(twodigit,smb) %>%
summarize(indtotal = sum(employment))
it breaks it down by twodigit
and smb
.
Second issue is having it produce an 0 if the value does not exist. Best example is twodigit
code of 51 and smb = 4
. When there are not 4 distinct smb
values for a given two digit, I am looking for it to produce a 0.
Note: smb
is short for small business
naicstest <- c (512131,512141,521921,522654,512131,536978,541214,531214,621112,541213,551212,574121,569887,541211,523141,551122,512312,521114,522112)
employment <- c(11,130,315,17,190,21,22,231,15,121,19,21,350,110,515,165,12,110,111)
smb <- c(1,2,3,1,3,1,1,3,1,2,1,1,4,2,4,3,1,2,2)
first <- data.frame(naicstest,employment,smb)
first<-first %>% mutate(twodigit = substr(naicstest,1,2))
second <- first %>% group_by(twodigit) %>% summarize(indtotal = sum(employment))
Desired result is below
twodigit indtotal smb smbtotal smbshare
51 343 1 23 (11 12) 23/343
51 343 2 130 130/343
51 343 3 190 190/343
51 343 4 0 0/343
52 1068 1 17 23/1068
52 1068 2 221 (110 111) 221/1068
52 1068 3 315 315/1068
52 1068 4 515 515/1068
CodePudding user response:
This gives you all the columns you need, but in a slightly different order. You could use select
or relocate
to get them in the order you want I suppose:
first %>%
group_by(twodigit, smb) %>%
summarize(smbtotal = sum(employment)) %>%
ungroup() %>%
complete(twodigit, smb, fill = list('smbtotal' = 0)) %>%
group_by(twodigit) %>%
mutate(
indtotal = sum(smbtotal),
smbshare = smbtotal / indtotal
)
`summarise()` has grouped output by 'twodigit'. You can override using the `.groups` argument.
# A tibble: 32 × 5
# Groups: twodigit [8]
twodigit smb smbtotal indtotal smbshare
<chr> <dbl> <dbl> <dbl> <dbl>
1 51 1 23 343 0.0671
2 51 2 130 343 0.379
3 51 3 190 343 0.554
4 51 4 0 343 0
5 52 1 17 1068 0.0159
6 52 2 221 1068 0.207
7 52 3 315 1068 0.295
8 52 4 515 1068 0.482
9 53 1 21 252 0.0833
10 53 2 0 252 0
# … with 22 more rows