I've read in two .jpg files using image_read (from the magick package) and then used cat and image_ocr to extract the text from both and combine it into one result. This does print the text, but with quite a few errors that will need to be rectified using regular expressions. However, I can't seem to apply the regular expressions and get the text back. The class of the vector originally showed as "NULL", so I wrapped the preceding code in as.character, hoping that would let me return the text after using regular expressions, but now it only returns "character(0)". I have also tried using lapply to convert the vector to character, and unlist(text), with similar results. Here is what I have currently:
library(magick)
#read in image files
text.1 <- image_read("dthw02_Olympics_1.jpg")
text.2 <- image_read("dthw02_Olympics_2.jpg")
#return ocr data from both images combined
text <- as.character(cat(image_ocr(text.1), (image_ocr(text.2))))
class(text) #returns "character"
#regular expressions test to replace lower case with upper case
text <- gsub("paris", "PARIS", text)
text #returns character(0) and doesn't show the text
Here is what I get after the line
text <- as.character(cat(image_ocr(text.1), (image_ocr(text.2))))
I don't actually have to call text; this is printed as soon as the line is run.
» LA 2028 5 BEIWING 2008 2 SEOUL 1988 &B MEXICO 1968 shy LONDON 1948
a PARIS 2024 a ATHENS 2004 cs LOS ANGELES 1984 @ TOKYO 1964 Al BERLIN 1936
© =
‘TOKYO 2020 =| SYDNEY 2000 MOSCOW 1980 mS ROME 1960 oe LOS ANGELES 1932 & | 6 MELBOURNE/ g
cose RIO 2016 ATLANTA 1996 ae MONTREAL 1976 | STOGKHOLM‘656 AMSTERDAM 1928 D4 LONDON 2012 a BARCELONA 1992 sm MUNICH 1972 1 -| HELSINKI 1952 "paris 1924 8 ANTWERP 1920 g ATHENS 1896 2 STOCKHOLM 1912
8 LONDON 1908
ST. LOUIS 1904
4s PARIS 1900
My test of regular expressions is meant to just replace paris with PARIS, but then I'll have a lot of work to do with regular expressions after that.
Answer:
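Solved in comments:
You need to use either c(image_ocr(text.1), image_ocr(text.2)) or paste(image_ocr(text.1), image_ocr(text.2)), depending on whether you want a vector or one long character string.
The cat(...) function is Concatenate and Print: it writes its arguments to the console and returns NULL invisibly, so it will not return your string. That is why class(text) was "NULL" at first, and why as.character() around it gives character(0). (See ?cat for details.)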
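To see the difference without needing the image files, here is a minimal sketch. The two strings ocr.1 and ocr.2 are made-up stand-ins for what image_ocr(text.1) and image_ocr(text.2) might return; they are assumptions for illustration, not the real OCR output.

```r
# Hypothetical stand-ins for image_ocr(text.1) and image_ocr(text.2)
ocr.1 <- "paris 1924\nLONDON 1908\n"
ocr.2 <- "ST. LOUIS 1904\nPARIS 1900\n"

# cat() prints to the console and returns NULL invisibly,
# so nothing useful is stored in 'printed'
printed <- cat(ocr.1, ocr.2)
is.null(printed)        # TRUE
as.character(printed)   # character(0)

# c() keeps the results as a character vector, one element per image
text.vec <- c(ocr.1, ocr.2)

# paste() joins them into one long string (separated by a space by default)
text.one <- paste(ocr.1, ocr.2)

# gsub() now has actual text to work on
text.vec <- gsub("paris", "PARIS", text.vec)
text.one <- gsub("paris", "PARIS", text.one)
```

With the real data you would swap the stand-ins for the image_ocr() calls. If you want to print the cleaned text with its line breaks rendered, call cat(text.one) at the end, just don't assign its result.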