How to calculate the percentage for two variables at once using R

I have the following example data.

dt=structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L), ae = c("increase in lymphocytes", "increase in lymphocytes", 
"increase in abs. lymphocytes", "increase in lymphocytes", "decrease in abs. neutrophils", 
"decrease in neutrophils", "decrease in abs. Monocytes", "decrease in monocytes", 
"increase in lymphocytes", "decrease in hemoglobin", "decrease in neutrophils", 
"decrease in abs. monocytes", "increase in lymphocytes"), link = c("Connected", 
"Connected", "Connected", "Connected", "Connected", "Connected", 
"Not connected", "Not connected", "Connected", "Not connected", 
"Connected", "Not connected", "Connected")), class = "data.frame", row.names = c(NA, 
-13L))

I need to calculate percentages for each combination of the two columns ae and link. This is what I tried:

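library(data.table)
setDT(dt)  # the sample data is a plain data.frame; convert it by reference

# count the rows of each group for every ae/link combination
dt <- dt[, 
         .(n_gr1 = sum(group == 1),
           n_gr2 = sum(group == 2),
           size_gr1 = 19, 
           size_gr2 = 19), 
         by = c("ae","link")
]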

but this is not the result I need:

                             ae          link n_gr1 n_gr2 size_gr1 size_gr2
1:      increase in lymphocytes     Connected     3     2       19       19
2: increase in abs. lymphocytes     Connected     1     0       19       19
3: decrease in abs. neutrophils     Connected     1     0       19       19
4:      decrease in neutrophils     Connected     1     1       19       19
5:   decrease in abs. Monocytes Not connected     1     0       19       19
6:        decrease in monocytes Not connected     1     0       19       19
7:       decrease in hemoglobin Not connected     0     1       19       19
8:   decrease in abs. monocytes Not connected     0     1       19       19

I need each count expressed as a percentage of the number of people in the group (size_gr1 and size_gr2), rounded to 2 decimal places. For example:

                             ae      link     n_gr1     n_gr2
1:      increase in lymphocytes Connected 3(15,79%) 2(10,53%)
3/19*100=15,79%
2/19*100=10,53%

How can I get the desired result? Thank you.

CodePudding user response：

I am not sure I exactly got what you want, but what about this:

dt[, perc_gp1:= round(n_gr1/size_gr1*100, 2)]
dt[, perc_gp2:= round(n_gr2/size_gr2*100, 2)]

Of course, this approach would not scale up well, so let me know if you need it to.
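One way to handle more groups and also get the "count(percent%)" layout from the question is to loop over the group names. This is only a sketch under assumptions: the counts table below reproduces just two rows of the question's intermediate result, the group size of 19 is taken from the question, and `fmt_count_pct` and `group_sizes` are hypothetical names introduced for illustration.

```r
library(data.table)

# Counts per ae/link combination, as in the question's intermediate table
# (only two rows are reproduced here to keep the example short).
counts <- data.table(
  ae    = c("increase in lymphocytes", "decrease in neutrophils"),
  link  = c("Connected", "Connected"),
  n_gr1 = c(3L, 1L),
  n_gr2 = c(2L, 1L)
)

# Assumption: every group has 19 people, as stated in the question.
group_sizes <- c(gr1 = 19, gr2 = 19)

# Hypothetical helper: builds "count(percent%)" with 2 decimals
# and a decimal comma, matching the layout asked for in the question.
fmt_count_pct <- function(n, size) {
  pct <- formatC(n / size * 100, format = "f", digits = 2, decimal.mark = ",")
  paste0(n, "(", pct, "%)")
}

# Loop over the group names, so additional groups only need
# additional entries in group_sizes (and matching n_* columns).
for (g in names(group_sizes)) {
  n_col <- paste0("n_", g)
  set(counts, j = paste0("res_", g),
      value = fmt_count_pct(counts[[n_col]], group_sizes[[g]]))
}

print(counts)
```

The formatted columns are character strings, so it is probably best to keep the numeric counts alongside them if further calculations are needed.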

CodePudding user response：

I'm a bit confused about what exactly is being asked, but if you want each category's count as a percentage of the total, you can do the following with the tidyverse and janitor packages:

library(dplyr)

dt %>% count(group, ae) %>% # count rows for each group/ae combination
  mutate(ae_per = n / sum(n)) %>% # share of each ae category in the overall total (not the group subtotal)
  janitor::adorn_totals() # adds a total row at the bottom

This code produces the following output:

 group                           ae  n     ae_per
     1   decrease in abs. Monocytes  1 0.07692308
     1 decrease in abs. neutrophils  1 0.07692308
     1        decrease in monocytes  1 0.07692308
     1      decrease in neutrophils  1 0.07692308
     1 increase in abs. lymphocytes  1 0.07692308
     1      increase in lymphocytes  3 0.23076923
     2   decrease in abs. monocytes  1 0.07692308
     2       decrease in hemoglobin  1 0.07692308
     2      decrease in neutrophils  1 0.07692308
     2      increase in lymphocytes  2 0.15384615
 Total                            - 13 1.00000000
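
The percentages above are relative to all 13 rows. If the percentage should instead be relative to each group, a possible variation is sketched below. It assumes a cut-down copy of the data (`dt_small`, a hypothetical name) and the fixed group size of 19 people mentioned in the question, and it shows both the share within the group's rows and the share of the group size.

```r
library(dplyr)

# Cut-down version of the question's data, used only for illustration.
dt_small <- data.frame(
  group = c(1L, 1L, 1L, 1L, 2L, 2L),
  ae    = c("increase in lymphocytes", "increase in lymphocytes",
            "increase in lymphocytes", "decrease in neutrophils",
            "increase in lymphocytes", "decrease in neutrophils")
)

# Assumption: each group has 19 people, as stated in the question.
group_size <- 19

res <- dt_small %>%
  count(group, ae) %>%      # rows per group/ae combination
  group_by(group) %>%       # the next step works within each group
  mutate(
    share_of_group_rows = n / sum(n),                      # share of the group's rows
    pct_of_group_size   = round(n / group_size * 100, 2)   # percent of the 19 people
  ) %>%
  ungroup()

print(res)
```

Which denominator is right depends on the question being answered: `sum(n)` within a group counts events, while the fixed group size counts people.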