Count number of groups in grouped data that fulfill a condition

Time:03-24

I need to count how many patients (my groups) fulfill a condition.

I have a large dataset whose last column always states Yes or No. Each patient has several rows, and all rows of a given patient carry the same value (only Yes or only No). I need to know how many patients are in the Yes condition and how many are in the No condition.

I can only find answers that count the rows meeting a condition within each group, not the number of groups that meet a condition.

The data looks like this:

structure(list(PATIENT.ID = c(210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 
210625L, 210625L, 210625L, 210625L, 210625L, 210625L, 220909L, 
220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 220909L, 
220909L, 220909L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 221179L, 
221179L, 221179L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 301705L, 
301705L, 301705L, 301705L, 303926L, 303926L, 303926L, 303926L
), Anycaffeina = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"
)), row.names = c(NA, -60L), class = c("tbl_df", "tbl", "data.frame"
))

I want a result like this: No = N (here 1) and Yes = N (here 4), where N is the number of patients with that value.

I have now found a solution that works with my real dataset, which is much longer than the sample above and has 81 columns. That may be why @Yuriy Saraykin's answer did not work with my original data:

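# keep the first row of each patient
z <- Anycaffeine[!duplicated(Anycaffeine$PATIENT.ID), ]
# dplyr::count() expects the data frame first, then the column
count(z, Anycaffeina)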

CodePudding user response:

tidyverse

library(tidyverse)

df %>% 
  distinct() %>% 
  count(Anycaffeina)

# A tibble: 2 x 2
  Anycaffeina     n
  <chr>       <int>
1 No              1
2 Yes             4
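The call above deduplicates whole rows, which is enough for the two-column sample. A minimal sketch for wider data, assuming the other columns differ from row to row, is to pass only the patient and answer columns to `distinct()`. The `df_wide` data and its `visit` column are hypothetical and only stand in for the extra columns:

```r
library(dplyr)

# Hypothetical wide data: the two columns from the question plus an extra
# column (`visit`) that changes on every row.
df_wide <- tibble(
  PATIENT.ID  = c(210625L, 210625L, 220909L, 220909L, 303926L),
  Anycaffeina = c("Yes", "Yes", "Yes", "Yes", "No"),
  visit       = 1:5
)

# Deduplicate on the patient/answer pair only, then count patients per answer.
patients_per_answer <- df_wide %>%
  distinct(PATIENT.ID, Anycaffeina) %>%
  count(Anycaffeina)

patients_per_answer
```

With a bare `distinct()` every row of `df_wide` would count as unique, so the result would be row counts rather than patient counts.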

base

aggregate(.~Anycaffeina, data = unique(df), FUN = length)

  Anycaffeina PATIENT.ID
1          No          1
2         Yes          4
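The same caveat applies to the base version: `unique(df)` compares whole rows. A sketch that restricts the comparison to the two relevant columns, again using a hypothetical `df_wide` with an extra `visit` column, could look like this:

```r
# Hypothetical wide data with one extra column that differs on every row.
df_wide <- data.frame(
  PATIENT.ID  = c(210625L, 210625L, 220909L, 220909L, 303926L),
  Anycaffeina = c("Yes", "Yes", "Yes", "Yes", "No"),
  visit       = 1:5
)

# One row per patient/answer pair, ignoring all other columns.
pairs <- unique(df_wide[c("PATIENT.ID", "Anycaffeina")])

# Number of patients per answer.
res <- aggregate(PATIENT.ID ~ Anycaffeina, data = pairs, FUN = length)
res
```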

data.table

library(data.table)
library(magrittr)

setDT(df) %>% 
  unique() %>% 
  .[, .N, by = Anycaffeina] %>% 
  .[]

   Anycaffeina N
1:         Yes 4
2:          No 1
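For data with more columns, one possible data.table variant skips the row deduplication and counts distinct patient IDs per answer with `uniqueN()`. The `dt_wide` table and its `visit` column are hypothetical:

```r
library(data.table)

# Hypothetical wide data with one extra column that differs on every row.
dt_wide <- data.table(
  PATIENT.ID  = c(210625L, 210625L, 220909L, 220909L, 303926L),
  Anycaffeina = c("Yes", "Yes", "Yes", "Yes", "No"),
  visit       = 1:5
)

# Count distinct patients within each answer group.
res <- dt_wide[, .(n_patients = uniqueN(PATIENT.ID)), by = Anycaffeina]
res
```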

CodePudding user response:

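# keep the first row of each patient
z <- Anycaffeine[!duplicated(Anycaffeine$PATIENT.ID), ]
# dplyr::count() expects the data frame first, then the column
count(z, Anycaffeina)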
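Keeping the first row per patient gives the right answer only if every patient really has a single value, as the question states. A quick check of that assumption might look like the sketch below. The `df_check` data is made up and deliberately contains one patient with mixed answers:

```r
# Made-up data: patient 2 has both "No" and "Yes".
df_check <- data.frame(
  PATIENT.ID  = c(1L, 1L, 2L, 2L, 3L),
  Anycaffeina = c("Yes", "Yes", "No", "Yes", "No")
)

# Number of distinct answers recorded for each patient.
values_per_patient <- tapply(df_check$Anycaffeina, df_check$PATIENT.ID,
                             function(x) length(unique(x)))

# IDs of patients with more than one distinct answer.
mixed_patients <- names(values_per_patient)[values_per_patient > 1]
mixed_patients
```

If `mixed_patients` is empty for the real data, the first row of each patient is representative and the count per answer is a count of patients.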

Tags: r