I want to write a function that counts the unigrams (single words) in each row of a text column.
However, my current function does not work the way I want it to.
This is my function and example dataset:
library(ngram)
library(tidyverse)
# example data frame
df <- tribble(~text,
              "This sentence",
              "I am going to luch",
              "This is a really nice and sunny day")
# function
get_unigrams <- function(text) {
  unigram <- ngram(text, n = 1) %>% get.ngrams() %>% length()
  return(unigram)
}
However, calling it inside mutate() gives me a very strange result:
df %>% mutate(n = get_unigrams(text))
# A tibble: 3 x 2
  text                                    n
  <chr>                               <int>
1 This sentence                          14
2 I am going to luch                     14
3 This is a really nice and sunny day    14
Every row gets the same count.
I think this is because all three rows of text are combined and treated as a single text.
Instead, I would like to get this result:
# A tibble: 3 x 2
  text                                    n
  <chr>                               <int>
1 This sentence                           2
2 I am going to luch                      5
3 This is a really nice and sunny day     8
Can someone help me?
I do not see the error in my function.
Many thanks in advance!
Update:
I have found an (interim) solution:
get_unigrams <- function(text) {
  sapply(text, function(text) {
    unigram <- ngram(text, n = 1) %>% get.ngrams() %>% length()
    return(unigram)
  })
}
However, the sapply solution is very slow because it processes each row individually, and I have a data frame with more than 100k rows.
Can someone help me increase the speed, for example with a vectorised function?
CodePudding user response:
Use rowwise. Look into ?rowwise for more info.
df %>% rowwise() %>%
  mutate(n = get_unigrams(text))
  text                                    n
  <chr>                               <int>
1 This sentence                           2
2 I am going to luch                      5
3 This is a really nice and sunny day     8
Another solution (using base R), which works here because df contains only the text column, is:
df$n <- apply(df, 1, get_unigrams)
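To see why the ungrouped mutate() call returned 14 for every row, here is a minimal base R sketch. It does not call the ngram package; it only assumes that ngram() pools all elements of the vector into one text and that the count is over distinct words, which is consistent with the 14 shown in the question (15 words in total, with "This" appearing twice).

```r
# Example sentences from the question.
text <- c("This sentence",
          "I am going to luch",
          "This is a really nice and sunny day")

# Whole column treated as one text: collapse, split on whitespace,
# and count the distinct words.
all_words <- strsplit(paste(text, collapse = " "), "\\s+")[[1]]
n_collapsed <- length(unique(all_words))

# Each row treated separately: split every element and count its distinct words.
n_per_row <- vapply(strsplit(text, "\\s+"),
                    function(w) length(unique(w)),
                    integer(1))

print(n_collapsed)  # 14
print(n_per_row)    # 2 5 8
```

Both rowwise() and apply() fix the result by handing the function one row at a time, so they still make one call per row.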
CodePudding user response:
Another solution, based on stringr::str_count:
library(tidyverse)
df <- tribble(~text,
              "This sentence",
              "I am going to luch",
              "This is a really nice and sunny day")
df %>%
  mutate(n = str_count(text, "\\w+"))
#> # A tibble: 3 × 2
#>   text                                    n
#>   <chr>                               <int>
#> 1 This sentence                           2
#> 2 I am going to luch                      5
#> 3 This is a really nice and sunny day     8
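One thing to keep in mind: str_count() counts every word, while the ngram-based function in the question appears to count distinct unigrams (it returned 14, not 15, for the pooled text). The two agree on the example data because no row repeats a word. A rough base R sketch of both variants follows; count_words and count_distinct_words are hypothetical helper names, not functions from any package.

```r
# Hypothetical helpers, base R only.

# Total number of whitespace-separated words per element (vectorised).
count_words <- function(text) {
  lengths(strsplit(trimws(text), "\\s+"))
}

# Number of distinct words per element; still loops over rows internally,
# but avoids building an n-gram object for every row.
count_distinct_words <- function(text) {
  vapply(strsplit(trimws(text), "\\s+"),
         function(w) length(unique(w)),
         integer(1))
}

x <- c("This sentence", "the cat and the dog")
print(count_words(x))           # 2 5
print(count_distinct_words(x))  # 2 4
```

Either helper can be used directly inside mutate(), for example mutate(n = count_words(text)), without rowwise().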