I have the following example data.
dt=structure(list(group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L), ae = c("increase in lymphocytes", "increase in lymphocytes",
"increase in abs. lymphocytes", "increase in lymphocytes", "decrease in abs. neutrophils",
"decrease in neutrophils", "decrease in abs. Monocytes", "decrease in monocytes",
"increase in lymphocytes", "decrease in hemoglobin", "decrease in neutrophils",
"decrease in abs. monocytes", "increase in lymphocytes"), link = c("Connected",
"Connected", "Connected", "Connected", "Connected", "Connected",
"Not connected", "Not connected", "Connected", "Not connected",
"Connected", "Not connected", "Connected")), class = "data.frame", row.names = c(NA,
-13L))
I need to calculate percentages for each combination of the two columns ae and link.
This is what I tried:
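library(data.table)
setDT(dt)  # the dput() output above is a plain data.frame; convert it so data.table syntax works
dt <- dt[,
  .(n_gr1 = .SD[group == 1, .N],  # rows of group 1 within each ae/link combination
    n_gr2 = .SD[group == 2, .N],  # rows of group 2 within each ae/link combination
    size_gr1 = 19,                # number of people in group 1
    size_gr2 = 19),               # number of people in group 2
  by = c("ae", "link")
]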
but the result is not what I need:
                             ae          link n_gr1 n_gr2 size_gr1 size_gr2
1:      increase in lymphocytes     Connected     3     2       19       19
2: increase in abs. lymphocytes     Connected     1     0       19       19
3: decrease in abs. neutrophils     Connected     1     0       19       19
4:      decrease in neutrophils     Connected     1     1       19       19
5:   decrease in abs. Monocytes Not connected     1     0       19       19
6:        decrease in monocytes Not connected     1     0       19       19
7:       decrease in hemoglobin Not connected     0     1       19       19
8:   decrease in abs. monocytes Not connected     0     1       19       19
I need each count expressed as a percentage of the number of people in the group (size_gr1 and size_gr2), rounded to 2 decimal places. For example:
ae link n_gr1 n_gr2
1: increase in lymphocytes Connected 3(15,79%) 2(10,53%)
3/19*100=15,79%
2/19*100=10,53%
How can I get the desired result? Thank you.
CodePudding user response:
I am not sure I exactly got what you want, but what about this:
dt[, perc_gp1:= round(n_gr1/size_gr1*100, 2)]
dt[, perc_gp2:= round(n_gr2/size_gr2*100, 2)]
Of course, this approach would not scale well to more groups, so let me know if you need that.
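For what it's worth, here is a minimal sketch of how the same idea could be written so that it works for any number of groups and produces the "3(15,79%)" style strings from the question. It starts again from the raw rows (rebuilt here as `raw`). The `sizes` lookup table and the `fmt_count_pct()` helper are names I made up for illustration, and the group size of 19 is taken from the question.

```r
library(data.table)

# Raw data from the question: one row per reported event.
raw <- data.table(
  group = rep(c(1L, 2L), times = c(8L, 5L)),
  ae = c("increase in lymphocytes", "increase in lymphocytes",
         "increase in abs. lymphocytes", "increase in lymphocytes",
         "decrease in abs. neutrophils", "decrease in neutrophils",
         "decrease in abs. Monocytes", "decrease in monocytes",
         "increase in lymphocytes", "decrease in hemoglobin",
         "decrease in neutrophils", "decrease in abs. monocytes",
         "increase in lymphocytes"),
  link = c(rep("Connected", 6), "Not connected", "Not connected",
           "Connected", "Not connected", "Connected", "Not connected",
           "Connected")
)

# Hypothetical lookup table: number of people per group (19 each, per the question).
sizes <- data.table(group = c(1L, 2L), size = c(19L, 19L))

# Hypothetical helper: "n(pct%)" with 2 decimals and a decimal comma.
fmt_count_pct <- function(n, size) {
  pct <- sub(".", ",", sprintf("%.2f", 100 * n / size), fixed = TRUE)
  paste0(n, "(", pct, "%)")
}

# Count events per group/ae/link, attach the group size, build the label.
counts <- raw[, .(n = .N), by = .(group, ae, link)]
counts <- merge(counts, sizes, by = "group")
counts[, label := fmt_count_pct(n, size)]

# One column per group; combinations absent from a group get a zero label.
wide <- dcast(counts, ae + link ~ group, value.var = "label",
              fill = fmt_count_pct(0L, 1L))
setnames(wide, c("1", "2"), c("n_gr1", "n_gr2"))
print(wide)
```

With the data above, the "increase in lymphocytes" / "Connected" row comes out as 3(15,79%) and 2(10,53%), which matches the example in the question. Adding a third group would only require another row in `sizes` and another name in the `setnames()` call.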
CodePudding user response:
I'm a bit confused about what exactly is being asked, but if you are trying to get each category's count as a percentage of the total, you can do the following with the tidyverse and janitor packages:
library(dplyr)  # provides %>%, count() and mutate()
dt %>%
  count(group, ae) %>%             # count rows per group/ae combination
  mutate(ae_per = n / sum(n)) %>%  # share of each combination in the overall total (not the group subtotal)
  janitor::adorn_totals()          # add a total row at the bottom
This code produces the following output:
group ae n ae_per
1 decrease in abs. Monocytes 1 0.07692308
1 decrease in abs. neutrophils 1 0.07692308
1 decrease in monocytes 1 0.07692308
1 decrease in neutrophils 1 0.07692308
1 increase in abs. lymphocytes 1 0.07692308
1 increase in lymphocytes 3 0.23076923
2 decrease in abs. monocytes 1 0.07692308
2 decrease in hemoglobin 1 0.07692308
2 decrease in neutrophils 1 0.07692308
2 increase in lymphocytes 2 0.15384615
Total - 13 1.00000000
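Note that the shares above use the 13 rows in the data as the denominator. If the percentage should instead be relative to the number of people in each group, as the question describes, the denominator can be swapped. A minimal sketch, assuming 19 people in every group (the `group_size` variable is my own name, not something from the question's code):

```r
library(dplyr)

# Raw data from the question, limited to the two columns used here.
raw <- data.frame(
  group = rep(c(1L, 2L), times = c(8L, 5L)),
  ae = c("increase in lymphocytes", "increase in lymphocytes",
         "increase in abs. lymphocytes", "increase in lymphocytes",
         "decrease in abs. neutrophils", "decrease in neutrophils",
         "decrease in abs. Monocytes", "decrease in monocytes",
         "increase in lymphocytes", "decrease in hemoglobin",
         "decrease in neutrophils", "decrease in abs. monocytes",
         "increase in lymphocytes")
)

# Assumption: every group has 19 people, as stated in the question.
group_size <- 19

res <- raw %>%
  count(group, ae) %>%                              # events per group/ae combination
  mutate(ae_per = round(100 * n / group_size, 2))   # percent of the group size, 2 decimals

print(res)
```

If the groups had different sizes, `group_size` could be replaced by a lookup joined on `group`.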