Is there a way to pass a PIL Image to Google Cloud Vision?
I tried to use io.Bytes, io.String and Image.tobytes(), but I always get:
Traceback (most recent call last):
File "C:\Users\...\vision_api.py", line 20, in get_text
image = vision.Image(content)
File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...
or this if I pass the PIL-Image directly:
TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>
This is my code:
image = Image.open(path).convert('RGB') # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200)) # Cropping the image
vision_image = vision.Image(...) # Here I need to pass the image (I tried the different options), but I don't know how
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image) # Text detection using google-vision-api
FOR CLARITY:
I want Google text detection to analyse only a certain part of an image saved on my disk. So my idea was to crop the image using PIL and then pass the cropped image to google-vision. But it is not possible to pass a PIL Image to vision.Image, as I get the error above.
The documentation from Google, which can be found in the vision.Image class:
Attributes:
content (bytes):
Image content, represented as a stream of bytes. Note: As
with all ``bytes`` fields, protobuffers use a pure binary
representation, whereas JSON representations use base64.
Currently, this field only works for BatchAnnotateImages
requests. It does not work for AsyncBatchAnnotateImages
requests.
A working option is to save the PIL-Image as a PNG/JPG on my disk and load it using:
with io.open(file_name, 'rb') as image_file:
    content = image_file.read()
vision_image = vision.Image(content=content)
But this is slow and seems unnecessary. For me, the whole point of using google-vision-api is the speed compared to open-cv.
CodePudding user response:
It would be good to have the whole error stack and a more accurate code snippet. But from the information presented, this seems to be a confusion of two different "Images". It is probably a copy/paste error, as the tutorials have exactly this line:
response = client.text_detection(image=image)
But in the tutorials mentioned, image is created by vision.Image(), so I think in the presented code this should be:
response = client.text_detection(image=vision_image)
At least if I understand the code snippet correctly, image is the PIL Image, while vision_image is the Vision Image that should be passed to the text_detection method. So whatever is done in vision.Image() does not have an effect on the error message.
CodePudding user response:
As far as I can tell, you start off with a PIL Image and you want to obtain a PNG image in memory without going to disk. So you need this:
#!/usr/bin/env python3
from PIL import Image
from io import BytesIO
# Create PIL Image like you have - filled with red
im = Image.new('RGB', (320,240), (255,0,0))
# Create in-memory PNG - like you want for Google Cloud Vision
buffer = BytesIO()
im.save(buffer, format="PNG")
# Look at first few bytes
PNG = buffer.getvalue()
print(PNG[:20])
It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and then read it back as binary - except this does it in memory without going to disk:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
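Putting that together with the cropping code from the question, a minimal sketch might look like the following. The helper names crop_to_png_bytes and detect_text are made up for illustration, and the red test image only stands in for Image.open(path).convert('RGB'). One thing worth noting: the traceback in the question shows vision.Image(content) with a positional argument, while the version that works from disk uses vision.Image(content=content). As far as I know, the message classes in the client library expect their fields as keyword arguments, so the sketch passes the bytes as content=. I have not run detect_text against the real service here, since that needs the google-cloud-vision package and valid credentials.

```python
from io import BytesIO

from PIL import Image


def crop_to_png_bytes(image, box):
    """Crop a PIL image to box (left, upper, right, lower) and return PNG bytes."""
    buffer = BytesIO()
    # Encode the cropped region as PNG into memory instead of a file on disk.
    image.crop(box).save(buffer, format="PNG")
    return buffer.getvalue()


def detect_text(png_bytes):
    """Send encoded image bytes to Cloud Vision text detection (needs credentials)."""
    # Imported here so the encoding helper above works without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    # Pass the bytes by keyword, as in the question's working disk-based snippet.
    vision_image = vision.Image(content=png_bytes)
    return client.text_detection(image=vision_image)


if __name__ == "__main__":
    # Stand-in for Image.open(path).convert('RGB'); crop box taken from the question.
    image = Image.new("RGB", (600, 1300), (255, 0, 0))
    content = crop_to_png_bytes(image, (30, 900, 510, 1200))
    print(len(content), content[:8])
    # response = detect_text(content)  # uncomment once credentials are configured
```

The important difference from Image.tobytes() is that tobytes() returns raw pixel data without any file header, whereas save() with format="PNG" produces an encoded image file, just held in memory.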