I have a list of names (e.g. authors) and a PDF file that includes those names. My task is to count how many times each author is mentioned in the PDF file.
Let's say my table of authors is named "author" and my PDF text is named "pdf" (I have already converted the file and stored it in R using pdf_text). I tried the following:
author$count <- 0
author$count <- for (i in author$name) { sum(str_count(pdf, i))}
But it didn't work. When I print author$count, the result is NULL. Is there any way to fix this? Thank you all so much!
CodePudding user response:
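The assignment needs to happen inside the loop, since the loop itself does not return the computed values. Also, loop over the index sequence so that each result is assigned to its own position: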
for (i in seq_along(author$name)) {
  author$count[i] <- sum(str_count(pdf, author$name[i]))
}
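A minimal sketch of this loop on made-up data, assuming the stringr package is installed. The names and page texts are hypothetical stand-ins for the real author table and the output of pdf_text:

```r
library(stringr)

# Hypothetical stand-ins: pdf_text() returns one string per page
pdf <- c("Smith and Jones wrote this. Smith agrees.",
         "Jones disagrees with Smith.")
author <- data.frame(name = c("Smith", "Jones", "Brown"),
                     stringsAsFactors = FALSE)

# Pre-allocate the column, then fill it one element at a time
author$count <- integer(nrow(author))
for (i in seq_along(author$name)) {
  # str_count() gives one count per page; sum() adds them up
  author$count[i] <- sum(str_count(pdf, author$name[i]))
}

print(author)
```

Here the pre-allocation is what makes the indexed assignment inside the loop well defined from the first iteration.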
CodePudding user response:
Unlike most other functions, for does not return a meaningful value in R (a for loop always evaluates to NULL), which unfortunately makes it much less useful. Instead, in most situations one of the vector mapping functions (lapply, vapply, etc.) is more suitable to the task.
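This can be checked directly in base R. A small sketch with made-up values:

```r
# Assigning the result of a for loop always yields NULL
result <- for (i in 1:3) i * 2
print(is.null(result))

# A mapping function returns the value computed for each element
doubled <- vapply(1:3, function(i) i * 2, numeric(1))
print(doubled)
```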
In your case, vapply does the trick:
author$count <- vapply(author$name, \(i) sum(str_count(pdf, i)), integer(1L))
(If you’re using a version of R older than 4.1, you need to replace \(i) with function (i).)
Note that you do not need to assign 0 to author$count beforehand. That value would be overwritten anyway.
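One caveat that may be worth checking: str_count treats its pattern as a regular expression by default, so a name containing a character such as "." can match more than intended. The following is a sketch on hypothetical data, assuming stringr is installed, that wraps each name in stringr's fixed() for literal matching. The helper count_name is made up for illustration:

```r
library(stringr)

# Hypothetical stand-ins for the pdf_text() output and the author table
pdf <- c("J. Smith cites JK Smith.", "J. Smith again.")
author <- data.frame(name = c("J. Smith", "JK Smith"),
                     stringsAsFactors = FALSE)

# fixed() makes the name match literally instead of as a regex
count_name <- function(nm) sum(str_count(pdf, fixed(nm)))
author$count <- vapply(author$name, count_name, integer(1L),
                       USE.NAMES = FALSE)

# For comparison: as a regex, "." also matches the "K" in "JK Smith"
regex_count <- sum(str_count(pdf, "J. Smith"))

print(author)
print(regex_count)
```

Whether literal matching is appropriate depends on how the names appear in the PDF; USE.NAMES = FALSE just keeps the resulting column free of element names.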
A note on vapply vs. sapply
vapply ensures that the result of the function call actually conforms to the expected format (here: integer(1L), i.e. every element is a single integer). sapply doesn’t do this, which makes using sapply risky in non-interactive code, since it won’t notify you if there’s an error with the data. purrr::map_* behaves similarly to vapply.
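To illustrate the difference, here is a small base R sketch. The function evens and the inputs are made up for the example:

```r
# Hypothetical function whose result length depends on the input
evens <- function(x) x[x %% 2 == 0]

inputs <- list(c(1, 2), c(3, 5), c(4, 6))

# sapply silently falls back to a list when the lengths differ
loose <- sapply(inputs, evens)
print(class(loose))

# vapply stops with an error instead of returning an unexpected shape
strict <- tryCatch(
  vapply(inputs, evens, numeric(1)),
  error = function(e) "vapply raised an error"
)
print(strict)
```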