It's my first time posting here and I'm quite new so would appreciate any help!
I have a dataframe made up of 2 different classification groups which are Age and Country. Firstly I want to calculate how many people there are in each each group as I need those numbers to make my second calculation.
age <- c('10-15','15-30','30-45', '10-15','15-30','30-45')
country <- c('Brazil','Japan','Turkey','Japan','Turkey',"Brazil")
total <- sample(1:100,30)
df <- cbind(age,country,total)
df<- as.data.frame(df)
df$total <- as.numeric(df$total)
df <- df %>%
dplyr::arrange(age) %>%
split( .[,"age"] ) %>%
purrr::map_df(., janitor::adorn_totals)
So then I get the totals for each age group
age country total
10-15 Brazil 51
10-15 Japan 28
10-15 Brazil 90
10-15 Japan 38
10-15 Brazil 48
10-15 Japan 52
10-15 Brazil 34
10-15 Japan 69
10-15 Brazil 66
10-15 Japan 57
Total - 533 <-- TOTAL
15-30 Japan 62
15-30 Turkey 26
15-30 Japan 17
15-30 Turkey 3
15-30 Japan 92
15-30 Turkey 87
15-30 Japan 86
15-30 Turkey 33
15-30 Japan 54
15-30 Turkey 45
Total - 505 <--TOTAL
30-45 Turkey 18
30-45 Brazil 75
30-45 Turkey 6
30-45 Brazil 42
30-45 Turkey 56
30-45 Brazil 43
30-45 Turkey 68
30-45 Brazil 25
30-45 Turkey 47
30-45 Brazil 39
Total - 419
But then I need those totals of each age group to make my second calculation (new column) but I've done it manually by typing the totals, I want to do this more intuitively but struggling to work out how to reference those columns
ifelse((age == '10-15'), (total/533* 100),
ifelse((age=='15-30'),(total/505 * 100),
ifelse((age=='30-45'),(total/419 * 100),NA))))
Output looks like this
10-15 Brazil 90 16.8855535
10-15 Japan 38 7.1294559
10-15 Brazil 48 9.0056285
10-15 Japan 52 9.7560976
10-15 Brazil 34 6.3789869
10-15 Japan 69 12.9455910
10-15 Brazil 66 12.3827392
10-15 Japan 57 10.6941839
Total - 533 NA
15-30 Japan 62 12.2772277
15-30 Turkey 26 5.1485149
15-30 Japan 17 3.3663366
15-30 Turkey 3 0.5940594
15-30 Japan 92 18.2178218
15-30 Turkey 87 17.2277228
15-30 Japan 86 17.0297030
15-30 Turkey 33 6.5346535
15-30 Japan 54 10.6930693
15-30 Turkey 45 8.9108911
Total - 505 NA
30-45 Turkey 18 4.2959427
30-45 Brazil 75 17.8997613
30-45 Turkey 6 1.4319809
30-45 Brazil 42 10.0238663
30-45 Turkey 56 13.3651551
30-45 Brazil 43 10.2625298
30-45 Turkey 68 16.2291169
30-45 Brazil 25 5.9665871
30-45 Turkey 47 11.2171838
30-45 Brazil 39 9.3078759
Total - 419 NA
Thanks so much!
CodePudding user response:
Just insert this line before your call to arrange()
group_by(age) %>% mutate(prop=total*100/sum(total)) %>% ungroup() %>%
Output:
age country total prop
10-15 Brazil 64 11.7647059
10-15 Japan 20 3.6764706
10-15 Brazil 58 10.6617647
10-15 Japan 36 6.6176471
10-15 Brazil 33 6.0661765
10-15 Japan 39 7.1691176
10-15 Brazil 87 15.9926471
10-15 Japan 41 7.5367647
10-15 Brazil 94 17.2794118
10-15 Japan 72 13.2352941
Total - 544 100.0000000
15-30 Japan 77 18.3770883
15-30 Turkey 40 9.5465394
15-30 Japan 61 14.5584726
15-30 Turkey 8 1.9093079
15-30 Japan 28 6.6825776
15-30 Turkey 93 22.1957041
15-30 Japan 45 10.7398568
15-30 Turkey 43 10.2625298
15-30 Japan 21 5.0119332
15-30 Turkey 3 0.7159905
Total - 419 100.0000000
30-45 Turkey 74 19.1214470
30-45 Brazil 37 9.5607235
30-45 Turkey 59 15.2454780
30-45 Brazil 30 7.7519380
30-45 Turkey 75 19.3798450
30-45 Brazil 4 1.0335917
30-45 Turkey 56 14.4702842
30-45 Brazil 17 4.3927649
30-45 Turkey 22 5.6847545
30-45 Brazil 13 3.3591731
Total - 387 100.0000000
A better option of course, is to not use adorn_totals()
at all:
df %>% group_by(age) %>% mutate(overall=sum(total), prop=total*100/overall)
Output:
# A tibble: 30 × 5
# Groups: age [3]
age country total overall prop
<chr> <chr> <dbl> <dbl> <dbl>
1 10-15 Brazil 64 544 11.8
2 15-30 Japan 77 419 18.4
3 30-45 Turkey 74 387 19.1
4 10-15 Japan 20 544 3.68
5 15-30 Turkey 40 419 9.55
6 30-45 Brazil 37 387 9.56
7 10-15 Brazil 58 544 10.7
8 15-30 Japan 61 419 14.6
9 30-45 Turkey 59 387 15.2
10 10-15 Japan 36 544 6.62
# … with 20 more rows
CodePudding user response:
This may also be done within group_modify
without having to redo the calculation again
library(dplyr)
library(janitor)
df <- df %>%
arrange(age) %>%
group_by(age) %>%
group_modify(~ .x %>%
adorn_totals() %>%
mutate(Perc = case_when(country != 'Total'
~ 100 *(total/last(total))) )) %>%
ungroup
-output
as.data.frame(df)
age country total Perc
1 10-15 Brazil 99 18.9655172
2 10-15 Japan 89 17.0498084
3 10-15 Brazil 49 9.3869732
4 10-15 Japan 83 15.9003831
5 10-15 Brazil 48 9.1954023
6 10-15 Japan 28 5.3639847
7 10-15 Brazil 47 9.0038314
8 10-15 Japan 19 3.6398467
9 10-15 Brazil 38 7.2796935
10 10-15 Japan 22 4.2145594
11 10-15 Total 522 NA
12 15-30 Japan 27 4.4481054
13 15-30 Turkey 7 1.1532125
14 15-30 Japan 91 14.9917628
15 15-30 Turkey 54 8.8962109
16 15-30 Japan 98 16.1449753
17 15-30 Turkey 30 4.9423394
18 15-30 Japan 59 9.7199341
19 15-30 Turkey 84 13.8385502
20 15-30 Japan 93 15.3212521
21 15-30 Turkey 64 10.5436573
22 15-30 Total 607 NA
23 30-45 Turkey 36 7.6433121
24 30-45 Brazil 1 0.2123142
25 30-45 Turkey 95 20.1698514
26 30-45 Brazil 21 4.4585987
27 30-45 Turkey 97 20.5944798
28 30-45 Brazil 92 19.5329087
29 30-45 Turkey 15 3.1847134
30 30-45 Brazil 35 7.4309979
31 30-45 Turkey 37 7.8556263
32 30-45 Brazil 42 8.9171975
33 30-45 Total 471 NA
data
df <- data.frame(age, country, total)