Count occurrences of two different characters with criteria and the total characters

Time:08-27

For each day, I am trying to find names that have both an A and a B classification, and also count the total number of B classifications on that day.

I included an example dataset and writeup below. The question I am trying to answer is "What percentage of B had an associated A?", which will also answer "What percentage of B did NOT have an associated A?"

On 2022-01-01 John Doe has both an A and a B classification. Peter Parker also has a B classification, but no associated A. The output for that day should show 1 instance of A and B happening together and 2 instances of B happening.

Date <- c("2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-02", "2022-01-02")
Name <- c("John Doe","John Doe","Peter Parker","Bruce Wayne","Bruce Wayne","Lebron James", "Jane Doe")
Classification <- c("A","B","B", "B", "A", "B", "B")


df <- data.frame(Date,Name,Classification)
df

date_output <- c("2022-01-01", "2022-01-02")
b_and_a_output <- c(1,1)
daily_total_b_output <- c(2,3)
desired_output <- data.frame(date_output, b_and_a_output, daily_total_b_output)
desired_output

CodePudding user response:

There's probably a million ways to approach this, but here's the old crossprod trick for calculating co-occurrence of values:

library(dplyr)
df %>%
    group_by(Date) %>%
    summarise(
        tmp = list(crossprod(table(Name, Classification))),
        a_and_b = tmp[[1]]["A","B"],
        total_b = tmp[[1]]["B","B"]
    ) %>%
    select(-tmp)

## A tibble: 2 x 3
#  Date       a_and_b total_b
#  <chr>        <dbl>   <dbl>
#1 2022-01-01       1       2
#2 2022-01-02       1       3
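
The question also asks what percentage of B classifications had an associated A, which the output above does not include. Below is a minimal sketch of one way to add it. It counts rows directly instead of using crossprod, on the assumption that "associated" means the same name has at least one A on the same day. The column names has_a, n_b and pct_b_with_a are illustrative and do not come from the original answer.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  Date = c("2022-01-01", "2022-01-01", "2022-01-01", "2022-01-02",
           "2022-01-02", "2022-01-02", "2022-01-02"),
  Name = c("John Doe", "John Doe", "Peter Parker", "Bruce Wayne",
           "Bruce Wayne", "Lebron James", "Jane Doe"),
  Classification = c("A", "B", "B", "B", "A", "B", "B")
)

result <- df %>%
  # One row per name per day: does the name have an A, and how many Bs?
  group_by(Date, Name) %>%
  summarise(
    has_a = any(Classification == "A"),
    n_b   = sum(Classification == "B"),
    .groups = "drop"
  ) %>%
  # Roll up to one row per day
  group_by(Date) %>%
  summarise(
    a_and_b      = sum(n_b[has_a]),            # Bs whose name also has an A that day
    total_b      = sum(n_b),                   # all Bs that day
    pct_b_with_a = 100 * a_and_b / total_b     # share of Bs with an associated A
  )

result
```

The percentage of B without an associated A is then 100 - pct_b_with_a. Unlike the crossprod version, this sketch does not index the table by "A" and "B", so it should still run for a day on which one of the two classes never appears, although a day with no B at all would give NaN for the percentage.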
Tags: r