Home > OS >  Totals by group on R
Totals by group on R

Time:06-02

It's my first time posting here and I'm quite new so would appreciate any help!

I have a dataframe made up of 2 different classification groups which are Age and Country. Firstly I want to calculate how many people there are in each each group as I need those numbers to make my second calculation.

age <- c('10-15','15-30','30-45', '10-15','15-30','30-45')
country <- c('Brazil','Japan','Turkey','Japan','Turkey',"Brazil")
total <- sample(1:100,30)

df <- cbind(age,country,total)

df<- as.data.frame(df)
df$total <- as.numeric(df$total)

df <- df %>%
  dplyr::arrange(age) %>%
  split( .[,"age"] ) %>%
  purrr::map_df(., janitor::adorn_totals) 

So then I get the totals for each age group

   age country total
 10-15  Brazil    51
 10-15   Japan    28
 10-15  Brazil    90
 10-15   Japan    38
 10-15  Brazil    48
 10-15   Japan    52
 10-15  Brazil    34
 10-15   Japan    69
 10-15  Brazil    66
 10-15   Japan    57
 Total       -   533 <-- TOTAL
 15-30   Japan    62
 15-30  Turkey    26
 15-30   Japan    17
 15-30  Turkey     3
 15-30   Japan    92
 15-30  Turkey    87
 15-30   Japan    86
 15-30  Turkey    33
 15-30   Japan    54
 15-30  Turkey    45
 Total       -   505 <--TOTAL
 30-45  Turkey    18
 30-45  Brazil    75
 30-45  Turkey     6
 30-45  Brazil    42
 30-45  Turkey    56
 30-45  Brazil    43
 30-45  Turkey    68
 30-45  Brazil    25
 30-45  Turkey    47
 30-45  Brazil    39
 Total       -   419

But then I need those totals of each age group to make my second calculation (new column) but I've done it manually by typing the totals, I want to do this more intuitively but struggling to work out how to reference those columns

                 ifelse((age == '10-15'), (total/533* 100), 
                        ifelse((age=='15-30'),(total/505 * 100),
                        ifelse((age=='30-45'),(total/419 * 100),NA))))
Output looks like this

 10-15  Brazil    90 16.8855535
 10-15   Japan    38  7.1294559
 10-15  Brazil    48  9.0056285
 10-15   Japan    52  9.7560976
 10-15  Brazil    34  6.3789869
 10-15   Japan    69 12.9455910
 10-15  Brazil    66 12.3827392
 10-15   Japan    57 10.6941839
 Total       -   533         NA
 15-30   Japan    62 12.2772277
 15-30  Turkey    26  5.1485149
 15-30   Japan    17  3.3663366
 15-30  Turkey     3  0.5940594
 15-30   Japan    92 18.2178218
 15-30  Turkey    87 17.2277228
 15-30   Japan    86 17.0297030
 15-30  Turkey    33  6.5346535
 15-30   Japan    54 10.6930693
 15-30  Turkey    45  8.9108911
 Total       -   505         NA
 30-45  Turkey    18  4.2959427
 30-45  Brazil    75 17.8997613
 30-45  Turkey     6  1.4319809
 30-45  Brazil    42 10.0238663
 30-45  Turkey    56 13.3651551
 30-45  Brazil    43 10.2625298
 30-45  Turkey    68 16.2291169
 30-45  Brazil    25  5.9665871
 30-45  Turkey    47 11.2171838
 30-45  Brazil    39  9.3078759
 Total       -   419         NA

Thanks so much!

CodePudding user response:

Just insert this line before your call to arrange()

group_by(age) %>% mutate(prop=total*100/sum(total)) %>% ungroup() %>% 

Output:

   age country total        prop
 10-15  Brazil    64  11.7647059
 10-15   Japan    20   3.6764706
 10-15  Brazil    58  10.6617647
 10-15   Japan    36   6.6176471
 10-15  Brazil    33   6.0661765
 10-15   Japan    39   7.1691176
 10-15  Brazil    87  15.9926471
 10-15   Japan    41   7.5367647
 10-15  Brazil    94  17.2794118
 10-15   Japan    72  13.2352941
 Total       -   544 100.0000000
 15-30   Japan    77  18.3770883
 15-30  Turkey    40   9.5465394
 15-30   Japan    61  14.5584726
 15-30  Turkey     8   1.9093079
 15-30   Japan    28   6.6825776
 15-30  Turkey    93  22.1957041
 15-30   Japan    45  10.7398568
 15-30  Turkey    43  10.2625298
 15-30   Japan    21   5.0119332
 15-30  Turkey     3   0.7159905
 Total       -   419 100.0000000
 30-45  Turkey    74  19.1214470
 30-45  Brazil    37   9.5607235
 30-45  Turkey    59  15.2454780
 30-45  Brazil    30   7.7519380
 30-45  Turkey    75  19.3798450
 30-45  Brazil     4   1.0335917
 30-45  Turkey    56  14.4702842
 30-45  Brazil    17   4.3927649
 30-45  Turkey    22   5.6847545
 30-45  Brazil    13   3.3591731
 Total       -   387 100.0000000

A better option of course, is to not use adorn_totals() at all:

df %>% group_by(age) %>% mutate(overall=sum(total), prop=total*100/overall)

Output:

# A tibble: 30 × 5
# Groups:   age [3]
   age   country total overall  prop
   <chr> <chr>   <dbl>   <dbl> <dbl>
 1 10-15 Brazil     64     544 11.8 
 2 15-30 Japan      77     419 18.4 
 3 30-45 Turkey     74     387 19.1 
 4 10-15 Japan      20     544  3.68
 5 15-30 Turkey     40     419  9.55
 6 30-45 Brazil     37     387  9.56
 7 10-15 Brazil     58     544 10.7 
 8 15-30 Japan      61     419 14.6 
 9 30-45 Turkey     59     387 15.2 
10 10-15 Japan      36     544  6.62
# … with 20 more rows

CodePudding user response:

This may also be done within group_modify without having to redo the calculation again

library(dplyr)
library(janitor)
df <- df %>%
    arrange(age) %>%
    group_by(age) %>%
    group_modify(~ .x %>%
          adorn_totals() %>%
          mutate(Perc = case_when(country != 'Total'
       ~ 100 *(total/last(total))) ))  %>%
    ungroup

-output

as.data.frame(df)
  age country total       Perc
1  10-15  Brazil    99 18.9655172
2  10-15   Japan    89 17.0498084
3  10-15  Brazil    49  9.3869732
4  10-15   Japan    83 15.9003831
5  10-15  Brazil    48  9.1954023
6  10-15   Japan    28  5.3639847
7  10-15  Brazil    47  9.0038314
8  10-15   Japan    19  3.6398467
9  10-15  Brazil    38  7.2796935
10 10-15   Japan    22  4.2145594
11 10-15   Total   522         NA
12 15-30   Japan    27  4.4481054
13 15-30  Turkey     7  1.1532125
14 15-30   Japan    91 14.9917628
15 15-30  Turkey    54  8.8962109
16 15-30   Japan    98 16.1449753
17 15-30  Turkey    30  4.9423394
18 15-30   Japan    59  9.7199341
19 15-30  Turkey    84 13.8385502
20 15-30   Japan    93 15.3212521
21 15-30  Turkey    64 10.5436573
22 15-30   Total   607         NA
23 30-45  Turkey    36  7.6433121
24 30-45  Brazil     1  0.2123142
25 30-45  Turkey    95 20.1698514
26 30-45  Brazil    21  4.4585987
27 30-45  Turkey    97 20.5944798
28 30-45  Brazil    92 19.5329087
29 30-45  Turkey    15  3.1847134
30 30-45  Brazil    35  7.4309979
31 30-45  Turkey    37  7.8556263
32 30-45  Brazil    42  8.9171975
33 30-45   Total   471         NA

data

df <- data.frame(age, country, total)
  • Related