How to search for words with asterisks and wildcards (e.g., exampl*) in R (word appearance in a data frame)

Time:10-07

I wrote code to count the occurrences of words in a data frame:

Items <- c('decid*', 'head', 'heads', 'decid*')
df1 <- data.frame(Items)
words <- c('head', 'heads', 'decided', 'decides', 'top')
df_main <- data.frame(words)
item <- vector()
count <- vector()
for (i in 1:length(unique(Items))) {
  item[i] <- Items[i]
  count[i] <- sum(df_main$words == item[i])
}
word_freq <- data.frame(cbind(item, count))
word_freq

However, the result looks like this:

    item count
1 decid*     0
2   head     1
3  heads     1

As you can see, it does not count "decid*" correctly. The result I expect is:

    item count
1 decid*     2
2   head     1
3  heads     1

I think I need to change the format of the item (decid*), but I could not figure out how. Any help is much appreciated!

Answer:

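I think you want to use decid* as a pattern. == tests for an exact match, so it never matches "decided" or "decides"; grepl() tests whether each string matches a pattern instead. Be aware that grepl() reads the pattern as a regular expression, where * means "zero or more of the preceding character": decid* therefore matches any word containing "deci", and it gives the right count here only because both target words contain "decid".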
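The difference is easy to check in base R. In this sketch the word "decimal" is a hypothetical extra example that is not in the question's data, and glob2rx() (from the utils package, attached by default) is used to turn the shell-style wildcard into an anchored regular expression.

```r
words <- c("head", "heads", "decided", "decides", "top", "decimal")  # "decimal" is a hypothetical extra word

# == compares whole strings, so the literal text "decid*" never matches
sum(words == "decid*")        # [1] 0

# As a regex, "decid*" means "deci" followed by zero or more "d",
# so it also matches "decimal"
sum(grepl("decid*", words))   # [1] 3

# glob2rx() converts the wildcard into an anchored regex ("starts with decid")
pattern <- glob2rx("decid*")
pattern                       # [1] "^decid"
sum(grepl(pattern, words))    # [1] 2
```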

I have used sapply() as an alternative to the for loop.

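result <- stack(sapply(unique(df1$Items), function(x) {
  if (grepl('*', x, fixed = TRUE)) {
    # item contains a literal '*': use it as a regex and count matching words
    sum(grepl(x, df_main$words))
  } else {
    # no '*': count exact matches only
    sum(x == df_main$words)
  }
}))

result
#   values    ind
# 1      2 decid*
# 2      1   head
# 3      1  heads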
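If you would rather keep the for loop from the question, a minimal sketch of the same idea follows. It assumes a * in an item should behave like a shell wildcard, so such items are converted with glob2rx() (for example decid* becomes ^decid, i.e. "starts with decid"); items without * are still matched exactly. It also loops over unique(Items) directly (the original indexes Items[i], which only works because the duplicate happens to be last) and builds the data frame without cbind(), so count stays numeric instead of being turned into character.

```r
Items <- c('decid*', 'head', 'heads', 'decid*')
words <- c('head', 'heads', 'decided', 'decides', 'top')
df_main <- data.frame(words)

item <- unique(Items)              # count each distinct item once
count <- integer(length(item))     # preallocate the result vector

for (i in seq_along(item)) {
  if (grepl('*', item[i], fixed = TRUE)) {
    # wildcard item: convert to an anchored regex, e.g. "decid*" -> "^decid"
    count[i] <- sum(grepl(glob2rx(item[i]), df_main$words))
  } else {
    # plain item: exact, whole-word comparison
    count[i] <- sum(df_main$words == item[i])
  }
}

word_freq <- data.frame(item, count)
word_freq
#     item count
# 1 decid*     2
# 2   head     1
# 3  heads     1
```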

Answer:

As an alternative approach altogether: instead of creating a new data frame word_freq, why not add a column to df_main (if that is your "main" data frame) that shows how many matches each word has against your (apparently key) Items:

library(stringr)
# for each row, sum str_count() over all Items (each item is used as a regex)
df_main$Items_count <- apply(df_main, 1, function(x) sum(str_count(x, Items)))

Result:

df_main
    words Items_count
1    head           1
2   heads           2
3 decided           2
4 decides           2
5     top           0
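Note that str_count() treats each item as a regular expression and the counts above are per word: "heads" scores 2 because the pattern head also matches inside it, and each "decid..." word scores 2 because decid* appears twice in Items. If you want each word to count each distinct item at most once, with * read as a wildcard and plain items matched as whole words, one possible sketch in base R (rather than stringr) is below; the helper name item_to_regex and the column name n_items are hypothetical.

```r
Items <- c('decid*', 'head', 'heads', 'decid*')
df_main <- data.frame(words = c('head', 'heads', 'decided', 'decides', 'top'))

# Hypothetical helper: "decid*" -> "^decid" (wildcard), "head" -> "^head$" (exact word).
# Assumes plain items contain no regex metacharacters.
item_to_regex <- function(item) {
  if (grepl('*', item, fixed = TRUE)) glob2rx(item) else paste0('^', item, '$')
}
patterns <- vapply(unique(Items), item_to_regex, character(1))

# lapply gives one logical vector per distinct item; adding them up
# (starting from 0L) counts how many items each word matches
df_main$n_items <- Reduce(`+`, lapply(patterns, grepl, x = df_main$words), 0L)
df_main
#     words n_items
# 1    head       1
# 2   heads       1
# 3 decided       1
# 4 decides       1
# 5     top       0
```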