How to use group_by() and summarize() to count the occurrences of data points?

Time:09-29

p <- data.frame(x = c("A", "B", "C", "A", "B"), 
                y = c("A", "B", "D", "A", "B"), 
                z = c("B", "C", "B", "D", "E"))
p

d <- p %>%  
  group_by(x) %>% 
  summarize(occurance1 = count(x),
            occurance2 = count(y),
            occurance3 = count(z),
            total = occurance1 + occurance2 + occurance3)
d

Output:

# A tibble: 3 x 5
  x     occurance1 occurance2 occurance3 total
  <chr>      <int>      <int>      <int> <int>
1 A              2          2          1     5
2 B              2          2          1     5
3 C              1          1          1     3

I have a dataset similar to the one above, and I'm trying to count the occurrences of each distinct value in every column. The first count works perfectly, probably because the data is grouped by x, but I've run into problems with the other two columns. As you can see, "D" isn't counted at all in y (it seems to be reported in the "C" row instead), and z doesn't contain an "A", yet the output shows a count of 1 for A. Help?

CodePudding user response:

count() needs a data.frame/tibble as input, not a vector. To make this work, we can reshape the data to 'long' format with pivot_longer(), apply count() to the resulting name and value columns, reshape back to 'wide' with pivot_wider(), and then use adorn_totals() to add the Total column.

library(dplyr)
library(tidyr)
library(janitor)
p %>% 
    pivot_longer(cols = everything()) %>% 
    count(name, value) %>% 
    pivot_wider(names_from = value, values_from = n, values_fill = 0) %>% 
    janitor::adorn_totals('col')

-output

  name A B C D E Total
    x 2 2 1 0 0     5
    y 2 2 0 1 0     5
    z 0 2 1 1 1     5
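
As a small illustration of the point about count() wanting a data frame rather than a vector, here is a minimal sketch using the p from the question. The names x_counts and x_counts2 are just placeholders for this example. It only tallies column x; the reshaping above is still what gives you all three columns at once.

```r
library(dplyr)

p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

# count() takes the data frame first, then the column(s) to tally
x_counts <- count(p, x)

# The same tally spelled out with group_by() + summarize();
# n() returns the number of rows in the current group
x_counts2 <- p %>%
  group_by(x) %>%
  summarize(n = n())

x_counts
x_counts2
```

Inside summarize(), n() is the usual way to get a per-group row count, whereas count() is a standalone verb that does the grouping and counting in one step.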

CodePudding user response:

In addition to akrun's solution, here is one without janitor, using select_if:

p %>% 
  pivot_longer(
    cols = everything(),
    names_to = "name",
    values_to = "values"
  ) %>% 
  count(name,values) %>% 
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>% 
  ungroup() %>% 
  mutate(Total = rowSums(select_if(., is.integer), na.rm = TRUE))

-output

  name      A     B     C     D     E Total
  <chr> <int> <int> <int> <int> <int> <dbl>
1 x         2     2     1     0     0     5
2 y         2     2     0     1     0     5
3 z         0     2     1     1     1     5
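
Since the scoped verbs such as select_if() are superseded in dplyr 1.0.0 and later, the same row total can probably also be written with across() and where(). This is a sketch under that version assumption, not a replacement for the answer above; res is a placeholder name.

```r
library(dplyr)
library(tidyr)

p <- data.frame(x = c("A", "B", "C", "A", "B"),
                y = c("A", "B", "D", "A", "B"),
                z = c("B", "C", "B", "D", "E"))

res <- p %>%
  # one row per (column name, value) pair
  pivot_longer(cols = everything(), names_to = "name", values_to = "values") %>%
  # tally each value within each original column
  count(name, values) %>%
  # one column per distinct value, 0 where a value never occurs
  pivot_wider(names_from = values, values_from = n, values_fill = 0) %>%
  # sum the integer count columns row by row
  mutate(Total = rowSums(across(where(is.integer))))

res
```

The ungroup() step is left out here because count() on an ungrouped data frame returns an ungrouped result.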