My input:
df <- data.frame("Foo"=c("a","c","NG-c","d","e","f"), "Bar"=c("b","b","c","d","e","f"), "Baz" = c("a","a","c","NG-c","NG-c","d"),
"Gaz" = c("NG-c","NG-c","NG-c", "NG-a","NG-a","NG-a"))
patern <- c("a","c")
The problem looks a little complicated. I am trying to find, count and compare the matches of each pattern in every column of the data frame. For example, I want to find all matches of NG-c and output the column with the biggest percentage of NG-c relative to the total in that column.
This is my code:
bg <- c()
for (i in ncol(df)) {
  for (pt in length(patern)) {
    tot <- sum(str_count(df[i], patern[pt]))
    ng <- sum(str_count(df[i], paste0("NG-", patern[pt])))
    res <- round((ng/tot*100), 1)
    bg <- c(bg, res)
  }
  if (bg[pt] >= res) {
    print(colnames(df[i]))
  }
}
So I expect to see the column names Baz and Gaz, but I run into some trouble.
First, I get this error:
Error in if (bg[pt] >= res) { : missing value where TRUE/FALSE needed
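A likely cause of this error, sketched below with illustrative values (the `50` is a placeholder, not output of the original code): `for (i in ncol(df))` and `for (pt in length(patern))` each iterate over a single number (4 and 2), not over a sequence of indices. After the inner loop runs once, `bg` has only one element, so `bg[pt]` with `pt = 2` is `NA`, and `if()` cannot evaluate a comparison with `NA`.

```r
# A for loop over a single number runs exactly once
visited <- integer(0)
for (i in 4) visited <- c(visited, i)
print(visited)                       # 4 only

# seq_len() produces the full index sequence 1..n
visited <- integer(0)
for (i in seq_len(4)) visited <- c(visited, i)
print(visited)                       # 1 2 3 4

# After one inner iteration bg holds a single element (50 is a placeholder)
bg <- c(50)
pt <- 2                              # length(patern) is 2, so pt is 2
print(bg[pt])                        # NA: the index is past the end of bg

# Comparing NA with a number yields NA, which if() rejects
msg <- tryCatch(if (bg[pt] >= 50) "ok",
                error = function(e) conditionMessage(e))
print(msg)                           # "missing value where TRUE/FALSE needed"
```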
Second, I get these warnings:
Warning messages:
1: In stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  argument is not an atomic vector; coercing
2: In stri_count_regex(string, pattern, opts_regex = opts(pattern)) :
  argument is not an atomic vector; coercing
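The warnings most likely come from `df[i]`: single-bracket indexing returns a one-column data frame (a list), not the character vector inside it, so `str_count()` has to coerce it. Double brackets, `df[[i]]`, extract the column itself. A small check on a hypothetical one-column data frame `demo`; the last line assumes the coercion behaves like `as.character()`, which is a guess about the internals rather than something the warning states:

```r
demo <- data.frame(Foo = c("a", "c", "NG-c"), stringsAsFactors = FALSE)

class(demo[1])         # "data.frame": [ ] keeps the list structure
class(demo[[1]])       # "character": [[ ]] extracts the column vector
is.atomic(demo[1])     # FALSE, what "not an atomic vector" refers to
is.atomic(demo[[1]])   # TRUE

# as.character() on the list deparses the whole column into one string,
# which contains an extra "c" (from "c(") when counting the pattern "c"
as.character(demo[1])
```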
Is there a better or cleverer way to do this?
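One way to make the loop itself work, sketched in base R under a few assumptions: the metric is read as "percentage of a pattern's matches that are prefixed with NG-", the unclear `if (bg[pt] >= res)` step is replaced by picking the column with the largest percentage per pattern, and the hypothetical helper `count_matches()` stands in for `sum(str_count(x, p))` so the sketch runs without stringr. The essential fixes are `seq_len()`/`seq_along()` for the loop ranges and `df[[i]]` for column access.

```r
# Input data from the question, with the missing comma restored
df <- data.frame(Foo = c("a", "c", "NG-c", "d", "e", "f"),
                 Bar = c("b", "b", "c", "d", "e", "f"),
                 Baz = c("a", "a", "c", "NG-c", "NG-c", "d"),
                 Gaz = c("NG-c", "NG-c", "NG-c", "NG-a", "NG-a", "NG-a"),
                 stringsAsFactors = FALSE)
patern <- c("a", "c")

# Hypothetical helper: total number of regex matches of p in character vector x
count_matches <- function(x, p) sum(lengths(regmatches(x, gregexpr(p, x))))

# Rows = patterns, columns = data frame columns
res <- matrix(NA_real_, nrow = length(patern), ncol = ncol(df),
              dimnames = list(patern, colnames(df)))

for (i in seq_len(ncol(df))) {       # every column, not only the last one
  col <- df[[i]]                     # [[ ]] gives the character vector
  for (pt in seq_along(patern)) {    # every pattern, not only the last one
    tot <- count_matches(col, patern[pt])
    ng  <- count_matches(col, paste0("NG-", patern[pt]))
    # NA when the pattern never occurs in the column (avoids 0/0)
    res[pt, i] <- if (tot > 0) round(ng / tot * 100, 1) else NA_real_
  }
}

print(res)

# For each pattern, the column with the largest NG- percentage (NAs skipped)
best <- apply(res, 1, function(r) colnames(res)[which.max(r)])
print(best)
```

Under this reading Gaz has the largest percentage for both patterns (the `NA` marks Bar, which contains no "a" at all), so if Baz is expected, the intended metric is probably a different one.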
Answer:
Not sure if this is what you are looking for, or how your question text relates to the patern vector. If you want to count the occurrences of NG-c per column and calculate the proportion of NG-c values per column (as a fraction of the rows), you could use
library(dplyr)
library(stringr)
df %>%
  summarise(across(everything(),
                   ~sum(str_count(.x, "NG-c"))/n()))
This returns:
        Foo Bar       Baz Gaz
1 0.1666667   0 0.3333333 0.5
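The values above are fractions of the rows. To get percentages and the column name the question asks for, a base-R variant might look like the sketch below. It assumes each cell holds a single value, so exact comparison with `==` can stand in for substring counting with `str_count()`, and it applies the idea to every entry of `patern`.

```r
df <- data.frame(Foo = c("a", "c", "NG-c", "d", "e", "f"),
                 Bar = c("b", "b", "c", "d", "e", "f"),
                 Baz = c("a", "a", "c", "NG-c", "NG-c", "d"),
                 Gaz = c("NG-c", "NG-c", "NG-c", "NG-a", "NG-a", "NG-a"),
                 stringsAsFactors = FALSE)
patern <- c("a", "c")

# Percentage of rows equal to "NG-<pattern>": rows = columns of df, cols = patterns
pct <- sapply(patern, function(p) colMeans(df == paste0("NG-", p)) * 100)
print(round(pct, 1))

# Column with the largest percentage for each pattern
best <- apply(pct, 2, function(v) rownames(pct)[which.max(v)])
print(best)
```

`which.max()` returns only the first maximum, so ties between columns would need `which(v == max(v))` instead.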
Data
df <- structure(list(Foo = c("a", "c", "NG-c", "d", "e", "f"), Bar = c("b",
"b", "c", "d", "e", "f"), Baz = c("a", "a", "c", "NG-c", "NG-c",
"d"), Gaz = c("NG-c", "NG-c", "NG-c", "NG-a", "NG-a", "NG-a")), class = "data.frame", row.names = c(NA,
-6L))