Pass PIL Image to google cloud vision without saving and reading

Is there a way to pass a PIL Image to google cloud vision?

I tried io.BytesIO, io.StringIO and Image.tobytes(), but I always get:

Traceback (most recent call last):
  File "C:\Users\...\vision_api.py", line 20, in get_text
    image = vision.Image(content)
  File "C:\...\venv\lib\site-packages\proto\message.py", line 494, in __init__
    raise TypeError(
TypeError: Invalid constructor input for Image:b'Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81Ma\x81La\x81Ma\x81Ma\x81Ma\x81Ma\x80Ma\x81La\x81Ma\x81Ma\x81Ma\x80Ma\x81Ma\x81Ma\x81Ma\x8 ...

or this if I pass the PIL-Image directly:

TypeError: Invalid constructor input for Image: <PIL.Image.Image image mode=RGB size=480x300 at 0x1D707131DC0>

This is my code:

image = Image.open(path).convert('RGB')   # Opening the saved image
cropped_image = image.crop((30, 900, 510, 1200))   # Cropping the image

vision_image = vision.Image(...)   # I tried the different options here; I need to pass the image, but I don't know how
client = vision.ImageAnnotatorClient()
response = client.text_detection(image=vision_image)   # Text detection using google-vision-api

FOR CLARITY:

I want Google text detection to analyse only a certain part of an image saved on my disk. My idea was to crop the image using PIL and then pass the cropped image to google-vision. But it is not possible to pass a PIL Image to vision.Image; I get the error above.

The documentation from Google, found in the vision.Image class:

Attributes:
        content (bytes):
            Image content, represented as a stream of bytes. Note: As
            with all ``bytes`` fields, protobuffers use a pure binary
            representation, whereas JSON representations use base64.

            Currently, this field only works for BatchAnnotateImages
            requests. It does not work for AsyncBatchAnnotateImages
            requests.

A working option is to save the PIL-Image as a PNG/JPG on my disk and load it using:

with io.open(file_name, 'rb') as image_file:
    content = image_file.read()

vision_image = vision.Image(content=content)

But this is slow and seems unnecessary, and the whole point of using google-vision-api for me is its speed compared to open-cv.

CodePudding user response：

It would be good to have the whole error stack and a more accurate code snippet. But from the information presented, this seems to be a confusion of two different "Images". It is probably a copy/paste error, as the tutorials have exactly the same line:

response = client.text_detection(image=image)

But in those tutorials image is created by vision.Image(), so I think in the presented code this should be:

response = client.text_detection(image=vision_image)

If I understand the code snippet correctly, image is the PIL Image, while vision_image is the Vision Image that should be passed to the text_detection method. So whatever is done in vision.Image() has no effect on the error message.

CodePudding user response：

As far as I can tell, you start off with a PIL Image and you want to obtain a PNG image in memory without going to disk. So you need this:

#!/usr/bin/env python3

from PIL import Image
from io import BytesIO

# Create PIL Image like you have - filled with red
im = Image.new('RGB', (320,240), (255,0,0))

# Create in-memory PNG - like you want for Google Cloud Vision
buffer = BytesIO()
im.save(buffer, format="PNG")

# Look at first few bytes
PNG = buffer.getvalue()
print(PNG[:20])

It prints this, which is exactly what you would get if you wrote the image to disk as a PNG and then read it back as binary - except this does it in memory without going to disk:

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
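
Putting this together with the code from the question, a minimal sketch might look like the following. The helper names pil_to_encoded_bytes and detect_text are made up for illustration, and the blank 1080x1920 image only stands in for the file opened from disk. Two things seem to matter: Image.tobytes() returns raw pixel data rather than an encoded image file, and the traceback in the question shows vision.Image(content) called positionally, whereas content has to be passed as a keyword argument. detect_text is defined but not called here, since it needs the google-cloud-vision package and valid credentials.

```python
from io import BytesIO

from PIL import Image


def pil_to_encoded_bytes(pil_image, fmt="PNG"):
    """Encode a PIL image as an in-memory image file and return its bytes.

    Hypothetical helper name, not part of PIL or google-cloud-vision.
    """
    buffer = BytesIO()
    pil_image.save(buffer, format=fmt)
    return buffer.getvalue()


def detect_text(pil_image):
    """Send a PIL image to Cloud Vision text detection without touching disk.

    Hypothetical helper; requires google-cloud-vision and credentials.
    """
    # Imported lazily so the encoding helper works without the client library.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient()
    # Pass the encoded bytes via the `content` keyword, not positionally.
    vision_image = vision.Image(content=pil_to_encoded_bytes(pil_image))
    return client.text_detection(image=vision_image)


# Stand-in for Image.open(path).convert('RGB'); the size is an assumption.
image = Image.new("RGB", (1080, 1920), (255, 255, 255))
cropped_image = image.crop((30, 900, 510, 1200))  # 480x300, as in the question

png_bytes = pil_to_encoded_bytes(cropped_image)  # encoded PNG file in memory
raw_bytes = cropped_image.tobytes()              # raw pixels, no file header

print(png_bytes[:8])   # the PNG file signature
print(len(raw_bytes))  # width * height * 3 bytes for an RGB image
```

With credentials configured, the call would be response = detect_text(cropped_image). If encoding time matters, fmt="JPEG" may be faster than PNG, at the cost of lossy compression.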