I have a dataset of patient data. I want to get the frequency by COL3 per unique COL1 across COL2. COL2 is a treatment group, and I need to derive the count of unique injection types in COL1 for each patient in COL3. I need help with R code to calculate this.
Quite unclear, but I guess something like this?
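library(tidyverse)
# Mock data: 100 random rows, so the counts below will differ from run to run
# (call set.seed() first if you need reproducible output)
df <- tibble(
  COL1 = paste("Type", sample(LETTERS[1:10], 100, replace = TRUE)),
  COL2 = paste("Injection", sample(1:10, 100, replace = TRUE)),
  COL3 = paste("Patient", sample(1:10, 100, replace = TRUE))
) %>%
  relocate(COL3)  # move the patient column to the front
# Number of rows for every patient / type / injection combination
df %>%
  count(COL3, COL1, COL2)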
# A tibble: 98 × 4
COL3 COL1 COL2 n
<chr> <chr> <chr> <int>
1 Patient 1 Type B Injection 2 1
2 Patient 1 Type B Injection 8 1
3 Patient 1 Type C Injection 6 1
4 Patient 1 Type C Injection 9 1
5 Patient 1 Type D Injection 9 1
6 Patient 1 Type E Injection 6 1
7 Patient 1 Type F Injection 8 1
8 Patient 1 Type G Injection 1 1
9 Patient 1 Type G Injection 6 1
10 Patient 1 Type I Injection 1 1
# … with 88 more rows
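count() above tallies rows for each COL3/COL1/COL2 combination, so it does not directly give the number of unique injections per patient and type. A minimal sketch of that, assuming the same mock data layout (the seed value is arbitrary, and only dplyr is loaded rather than the whole tidyverse), is to drop duplicate rows with distinct() and then count what is left:

```r
library(dplyr)

# Arbitrary seed so the random mock data is reproducible
set.seed(1)

df <- tibble(
  COL1 = paste("Type", sample(LETTERS[1:10], 100, replace = TRUE)),
  COL2 = paste("Injection", sample(1:10, 100, replace = TRUE)),
  COL3 = paste("Patient", sample(1:10, 100, replace = TRUE))
) %>%
  relocate(COL3)

# Keep one row per distinct patient/type/injection combination,
# then count how many distinct injections each patient had of each type
unique_counts <- df %>%
  distinct(COL3, COL1, COL2) %>%
  count(COL3, COL1, name = "n_injections")

unique_counts
```

This gives the same numbers as grouping by COL3 and COL1 and summarising with n_distinct(COL2).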
I'm not sure I completely follow what you're asking, but you should be able to adjust the following code to get what you need.
library(tidyverse)
data.df <- tibble(COL1 = paste("Type", c("A", "B", "A", "A", "B", "C")),
                  COL2 = paste("Injection", c(1, 1, 1, 2, 1, 1)),
                  COL3 = paste("Patient", c(1, 1, 2, 2, 2, 2)))
# Count the distinct COL2 values within each patient (COL3) and type (COL1)
data.df %>%
  group_by(COL3, COL1) %>%
  summarise(Num_col2 = n_distinct(COL2), .groups = "drop")
The function n_distinct() gives the number of unique values in COL2 within each combination of the grouping variables, here COL3 and COL1. So, assuming each distinct value of COL2 identifies a separate injection, this is the number of distinct injections of type A, B, C, etc. for each patient. If you instead want the number of rows (i.e., the total number of injections, repeats included), use n().
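In data.df every row is a different patient/type/injection combination, so n_distinct() and n() happen to agree there. A small sketch that reuses data.df and appends one hypothetical repeated row (data_rep and comparison are illustrative names, not from the thread) shows where the two diverge:

```r
library(dplyr)

data.df <- tibble(COL1 = paste("Type", c("A", "B", "A", "A", "B", "C")),
                  COL2 = paste("Injection", c(1, 1, 1, 2, 1, 1)),
                  COL3 = paste("Patient", c(1, 1, 2, 2, 2, 2)))

# Hypothetical extra row: Patient 2 receives "Injection 1" of type A a second time
data_rep <- bind_rows(
  data.df,
  tibble(COL1 = "Type A", COL2 = "Injection 1", COL3 = "Patient 2")
)

comparison <- data_rep %>%
  group_by(COL3, COL1) %>%
  summarise(
    n_unique = n_distinct(COL2),  # distinct injections of this type
    n_total  = n(),               # all rows, repeats included
    .groups = "drop"
  )

comparison
```

For Patient 2, Type A this gives n_unique = 2 but n_total = 3, because Injection 1 appears twice; every other group has 1 in both columns.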