How can I convert a vector created using image_ocr and cat to a character string that allows me to u

Time:11-15

I've read in two .jpg files using image_read (from the magick package) and then used cat and image_ocr to extract the text from both and combine it into one result. This does print the text, but with quite a few errors that will need to be rectified using regular expressions. However, I can't seem to apply the regular expressions and get the text back afterwards. The class of the object was originally "NULL", so I wrapped the preceding code in as.character, hoping that would let me return the text after using regular expressions, but now it only returns "character(0)". I have also tried using lapply to convert the vector to character, and unlist(text), but got similar results. Here is what I have currently:

#read in image files
text.1 <- image_read("dthw02_Olympics_1.jpg")
text.2 <- image_read("dthw02_Olympics_2.jpg")

#return OCR data from both images combined
text <- as.character(cat(image_ocr(text.1), (image_ocr(text.2))))
class(text) #returns "character"

#regular expressions test to replace lower case with upper case

text <- gsub("paris", "PARIS", text)
text #returns character(0) and doesn't show the text

Here is what I get after the line

text <- as.character(cat(image_ocr(text.1), (image_ocr(text.2))))

I don't actually have to call text; this is simply printed when the line is run.

» LA 2028 5 BEIWING 2008 2 SEOUL 1988 &B MEXICO 1968 shy LONDON 1948

a PARIS 2024 a ATHENS 2004 cs LOS ANGELES 1984 @ TOKYO 1964 Al BERLIN 1936

© =

‘TOKYO 2020 =| SYDNEY 2000 MOSCOW 1980 mS ROME 1960 oe LOS ANGELES 1932 & | 6 MELBOURNE/ g

cose RIO 2016 ATLANTA 1996 ae MONTREAL 1976 | STOGKHOLM‘656 AMSTERDAM 1928 D4 LONDON 2012 a BARCELONA 1992 sm MUNICH 1972 1 -| HELSINKI 1952 "paris 1924 8 ANTWERP 1920 g ATHENS 1896 2 STOCKHOLM 1912

8 LONDON 1908

ST. LOUIS 1904

4s PARIS 1900

My test of regular expressions is meant to just replace paris with PARIS, but then I'll have a lot of work to do with regular expressions after that.

CodePudding user response:

Solved in comments:

You need to use either c(image_ocr(text.1), image_ocr(text.2)) or paste(image_ocr(text.1), image_ocr(text.2)), depending on whether you want a vector with one element per image or one long character string.

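The cat(...) function is "Concatenate and Print": it writes its arguments to the console and returns NULL (invisibly), so it will not return your string. That is why class(text) was originally "NULL", and why as.character(cat(...)) gives character(0): there is nothing left for gsub to work on. (See ?cat for details.)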
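
To make the difference concrete, here is a minimal sketch that runs without the images. The two strings ocr.1 and ocr.2 are hypothetical stand-ins for image_ocr(text.1) and image_ocr(text.2), shortened from the output in the question, so the exact text you get from the real calls will differ:

```r
# Hypothetical stand-ins for image_ocr(text.1) and image_ocr(text.2);
# the real calls need the magick package and the two .jpg files.
ocr.1 <- "a PARIS 2024 a ATHENS 2004"
ocr.2 <- "\"paris 1924 8 ANTWERP 1920"

# cat() prints to the console and returns NULL invisibly,
# so nothing useful is stored in 'printed'
printed <- cat(ocr.1, ocr.2, "\n")
is.null(printed)        # TRUE
as.character(printed)   # character(0)

# c() keeps one element per image
text.vec <- c(ocr.1, ocr.2)
text.vec <- gsub("paris", "PARIS", text.vec)
length(text.vec)        # 2

# paste() joins both into one long string (separated by a space by default)
text.one <- paste(ocr.1, ocr.2)
text.one <- gsub("paris", "PARIS", text.one)
text.one
```

If you still want to see the cleaned text in the console the way cat showed it, call cat(text.one) after the gsub step, rather than assigning from cat.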
