R: Find the unique count (frequency) of a column by patient per treatment group

Time:07-20

I have a dataset of patient data. I want to get the frequency by COL3 per unique COL1 across COL2. COL2 is a treatment group, and I need to derive the unique counts of the type of injection present in COL1 per patient in COL3. I need help with R code to calculate this.

Answer:

The question is quite unclear, but I guess you want something like this?

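library(tidyverse)

# Random example data (no seed is set, so your counts will differ):
# COL1 = injection type, COL2 = injection, COL3 = patient
df <- tibble(
  COL1 = paste("Type", sample(LETTERS[1:10], 100, replace = TRUE)),
  COL2 = paste("Injection", sample(1:10, 100, replace = TRUE)),
  COL3 = paste("Patient", sample(1:10, 100, replace = TRUE))
) %>% 
  relocate(COL3)  # move COL3 (patient) to the first column

# Number of rows for each patient / type / injection combination
df %>% 
  count(COL3, COL1, COL2)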

# A tibble: 98 × 4
   COL3      COL1   COL2            n
   <chr>     <chr>  <chr>       <int>
 1 Patient 1 Type B Injection 2     1
 2 Patient 1 Type B Injection 8     1
 3 Patient 1 Type C Injection 6     1
 4 Patient 1 Type C Injection 9     1
 5 Patient 1 Type D Injection 9     1
 6 Patient 1 Type E Injection 6     1
 7 Patient 1 Type F Injection 8     1
 8 Patient 1 Type G Injection 1     1
 9 Patient 1 Type G Injection 6     1
10 Patient 1 Type I Injection 1     1
# … with 88 more rows
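The `count()` call above tallies rows for every patient/type/injection combination. If you instead want, as one reading of the question suggests, the number of distinct injection types (COL1) that each patient (COL3) received within each treatment group (COL2), a minimal sketch is to group by COL2 and COL3 and apply `n_distinct()` to COL1. The column roles and the small `trial` data set below are assumptions made for illustration, not the asker's real data.

```r
library(dplyr)

# Hypothetical example data (column roles assumed from the question):
# COL1 = injection type, COL2 = treatment group, COL3 = patient
trial <- tibble(
  COL1 = c("Type A", "Type A", "Type B", "Type A", "Type C", "Type C"),
  COL2 = c("Group 1", "Group 1", "Group 1", "Group 2", "Group 2", "Group 2"),
  COL3 = c("Patient 1", "Patient 1", "Patient 1", "Patient 2", "Patient 2", "Patient 3")
)

# Number of distinct injection types per patient within each treatment group;
# repeated rows of the same type are counted only once
unique_types <- trial %>%
  group_by(COL2, COL3) %>%
  summarise(n_types = n_distinct(COL1), .groups = "drop")

unique_types
```

Here Patient 1 gets 2 in Group 1 (Type A appears twice but counts once), Patient 2 gets 2 in Group 2, and Patient 3 gets 1. Swapping the grouping columns changes the question being answered, so pick the ones that match your actual column meanings.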

Answer:

I'm not sure I completely follow what you're asking, but you should be able to adjust the following code to get what you need.

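library(tidyverse)

# Small example: COL1 = injection type, COL2 = injection, COL3 = patient
data.df <- tibble(COL1 = paste("Type", c("A", "B", "A", "A", "B", "C")),
                  COL2 = paste("Injection", c(1, 1, 1, 2, 1, 1)),
                  COL3 = paste("Patient", c(1, 1, 2, 2, 2, 2)))

# Number of distinct COL2 values for each patient / type combination
data.df %>% 
  group_by(COL3, COL1) %>% 
  summarise(Num_col2 = n_distinct(COL2), .groups = "drop")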

The function n_distinct() gives the number of unique values of COL2 within each combination of the grouping variables, here COL3 and COL1. So, assuming that COL2 identifies the injection, this is the number of distinct injections of each type (A, B, C, etc.) for each patient. If instead you want the number of rows (i.e., the total number of injections, counting repeats), use n().
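In `data.df` above every patient/type/injection row is unique, so `n()` and `n_distinct(COL2)` happen to give the same numbers. A minimal sketch that makes the difference visible, using a hypothetical `data2` that repeats one row (Patient 1 receives Type A / Injection 1 twice):

```r
library(dplyr)

# Hypothetical data: same layout as data.df, plus a duplicated
# Patient 1 / Type A / Injection 1 row
data2 <- tibble(
  COL1 = paste("Type", c("A", "A", "B", "A", "A", "B", "C")),
  COL2 = paste("Injection", c(1, 1, 1, 1, 2, 1, 1)),
  COL3 = paste("Patient", c(1, 1, 1, 2, 2, 2, 2))
)

counts <- data2 %>%
  group_by(COL3, COL1) %>%
  summarise(
    n_rows   = n(),               # every row, duplicates included
    n_unique = n_distinct(COL2),  # unique COL2 values only
    .groups  = "drop"
  )

counts
```

For Patient 1 / Type A, `n_rows` is 2 while `n_unique` is 1; for Patient 2 / Type A both are 2, because the two rows have different injections.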
