How do I count the number of appearances within vector of columns?

Time:12-13

How do I go from the first table to the second? (Both tables are reproduced as foo and bar under Data below.)

I have two vectors of codes that I want to count:

high_vector <- c("740", "742", "744")
all_vector <- c("736", "738")
  • Note that 'high_vector' contains a value, 744, that does not appear in my data.

If it helps, here is some code from an earlier project in which I pick out a "Yes" among selected variables. It differs from this question because here I'm trying to count how often the values occur:

PurposeCols <- c("NEW_CAR", "USED_CAR", "FURNITURE", "RADIO/TV", "EDUCATION", "RETRAINING")
CD$PURPOSE <- PurposeCols[apply(CD[PurposeCols],1, function(x) match("Yes", x))] %>% 
  replace_na("OTHER") %>% str_to_title() %>% as.factor()

In summary, for each ID I want one column that counts the values appearing in either of my vectors, and a separate column that counts only the values appearing in high_vector.

I'm performing this on a much, much larger dataset but I plan on using group_by.

Thank You.

Data

foo <- data.frame(
  ID = c("one", "one", "one", "one", "two", "two"),
  first = c("736", "738", "997","200", "408", "675"),
  second = c("800", "842", "740", "301", "742", "682"),
  third = c("980",  NA,       NA, "742", "975", "738")
)

bar <- data.frame(
  all = c(4,2),
  high = c(2,1)
)
rownames(bar) <- c("one", "two")

Answer:

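Reshape the data to long format with pivot_longer, group by 'ID', and summarise. Because %in% returns a logical vector, summing it gives a count: 'all' is the number of entries in the value column found in the combined 'high_vector' and 'all_vector', and 'high' is the number found in 'high_vector' alone. NA entries never match, so they are not counted. Finally, column_to_rownames moves 'ID' into the row names so the result has the same shape as bar.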

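library(dplyr)
library(tidyr)
library(tibble)
foo %>%
   # stack first/second/third into a single 'value' column
   pivot_longer(cols = -ID) %>%
   group_by(ID) %>%
   # sum() over a logical vector counts the TRUEs
   summarise(all = sum(value %in% c(high_vector, all_vector)),
             high = sum(value %in% high_vector)) %>%
   # use ID as row names, as in 'bar'
   column_to_rownames('ID')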

Output:

    all high
one   4    2
two   2    1
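
For comparison, the same counts can be sketched in base R without reshaping. This is only an illustration, not part of the original answer: it assumes the code columns are exactly first, second and third, and the names code_cols, is_all, is_high and res are made up for the example.

```r
high_vector <- c("740", "742", "744")
all_vector <- c("736", "738")

foo <- data.frame(
  ID = c("one", "one", "one", "one", "two", "two"),
  first = c("736", "738", "997", "200", "408", "675"),
  second = c("800", "842", "740", "301", "742", "682"),
  third = c("980", NA, NA, "742", "975", "738")
)

# Assumption: these are the columns holding the codes to be checked
code_cols <- c("first", "second", "third")
m <- as.matrix(foo[code_cols])

# %in% flattens the matrix, so restore the row layout afterwards;
# NA entries never match and therefore count as FALSE
is_all <- matrix(m %in% c(high_vector, all_vector), nrow = nrow(m))
is_high <- matrix(m %in% high_vector, nrow = nrow(m))

# Count matches per row, then add the row counts up within each ID
res <- as.data.frame(
  rowsum(cbind(all = rowSums(is_all), high = rowSums(is_high)),
         group = foo$ID)
)
res
```

On the sample data, res has row names one and two with the same counts as bar. On a much larger dataset the pivot_longer and group_by version may still be more convenient, since it does not require listing the code columns by name.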