Getting metadata from JPG fails on some of my files using Python?

Time:12-06

I want to get the time a picture was taken, and on most of my files it works. For only 6 of the 1800 files I am not getting the date of the picture, but when I look at the metadata with Windows, it does show the date the picture was taken.

If I run the code to get all metadata instead of only the date the picture was taken, it gives this error:

Traceback (most recent call last):
  File "G:\My Drive\school\programeren\fotos_orderer\main.py", line 43, in <module>
    get_date_of_JPG(path i)
  File "G:\My Drive\school\programeren\fotos_orderer\main.py", line 36, in get_date_of_JPG
    data = data.decode()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xea in position 1: invalid continuation byte

and when I run the code without decoding I get:

NIKON D3100
2152
b'\x1c\xea\x00\x00\x00\x08\x00\x00\x00 ...' (repeats for 2052 times)
NIKON CORPORATION

But when I look at the same file with Windows, I get:

[Screenshot: metadata of the non-working picture]

Code I am using:

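from PIL import Image
from PIL.ExifTags import TAGS

def get_date_of_JPG(path):
    image = Image.open(path)
    exifdata = image.getexif()
    for tag_id in exifdata:
        # Map the numeric tag ID to a readable name where one is known
        tag = TAGS.get(tag_id, tag_id)
        data = exifdata.get(tag_id)
        print(data)
    return "no Date"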

Here is the metadata of a working picture:

37336
2
228
NIKON CORPORATION
NIKON D3100
Ver.1.01 
1
2022:10:19 11:43:18
2
300.0
300.0

Process finished with exit code 0

[Screenshot: metadata of the working picture]

All photos were taken with the same camera, in the same month, and stored on the same SD card.

Here is a link to the photos that aren't working and one working photo:

https://drive.google.com/drive/folders/1ndeCCcbR6t-wiHYYOUqHfeo2-WaWyz1X?usp=sharing

Why am I not getting the correct data, and how can I fix it?

CodePudding user response:

I don't know Python, so I can only assume that

  File "G:\My Drive\school\programeren\fotos_orderer\main.py", line 36, in get_date_of_JPG
    data = data.decode()

refers to this line (whose line number I don't know):

        print(data)

because wanting to print a variable's content implies making text of it, and interpreting bytes as text may default to UTF-8 as the encoding, which then fails when the bytes are not valid UTF-8. You cannot expect every Exif tag to have human-readable content, and even if it is text, you cannot expect it to be UTF-8.
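
For reference, a small sketch of the failure mode in Python. Note that `print()` on a `bytes` object shows its repr without decoding anything; it is an explicit `decode()` call, which defaults to UTF-8, that raises. The byte string below is just the start of the value shown in the question's output, and the variable names are illustrative.

```python
# The first bytes of the padding value shown in the question's output.
raw = b"\x1c\xea\x00\x00\x00\x08\x00\x00"

# Printing bytes never decodes them; Python shows their repr instead.
print(raw)

# bytes.decode() defaults to UTF-8 and raises on this input.
try:
    raw.decode()
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc
    print("decode failed:", exc)

# Latin-1 maps every byte to a code point, so decoding cannot fail,
# but the result is not meaningful text for binary padding.
as_latin1 = raw.decode("latin-1")
```

Decoding as Latin-1 only silences the error; it does not turn binary data into something worth printing.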

If "your" code is a sloppy copy of How to Extract Image Metadata in Python and you didn't even post your actual/complete code, then

  1. figure out how to get ahold of each Exif tag's data type (because only ASCII should be treated as text, and only numeric data types are reasonably safe to convert to text)
  2. tell Python not to treat that text as UTF-8 encoded
  3. use exception handling around text-related code, so a bad value doesn't crash the loop you're currently in.
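
The three steps above could be sketched roughly as follows. This is a minimal illustration, not the asker's code: `exif_value_to_text` is a hypothetical helper name, and it checks the Python type of each value (as a stand-in for the Exif data type) rather than reading the type field from the file.

```python
import numbers


def exif_value_to_text(value):
    """Return a printable string for an Exif value, or None if it is not text.

    Hypothetical helper: inspects the Python type of the value instead of
    the Exif data type stored in the file.
    """
    if isinstance(value, str):
        # Already text; drop trailing NULs and spaces.
        return value.strip("\x00 ")
    if isinstance(value, numbers.Real):
        # Numeric values (int, float, fractions) are safe to show as text.
        return str(value)
    if isinstance(value, bytes):
        try:
            # Decode strictly as ASCII instead of the UTF-8 default.
            text = value.decode("ascii").strip("\x00 ")
        except UnicodeDecodeError:
            # Binary data such as padding: skip it instead of crashing.
            return None
        return text if text.isprintable() else None
    return None
```

Inside the loop, a value for which the helper returns `None` can simply be skipped, so one odd tag no longer aborts the whole file.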

In your case the file not working (2).JPG has the Exif tag 0xEA1C (or 59932), which is some proprietary padding without further meaning in the Exif IFD (tag 0x8769 or 34665). Either only print tag content as per a whitelist (e.g. only tags 0x9003 = 36867 and 0x9004 = 36868 if you're only interested in datetimes as text), or look at each tag's data type so that you don't output non-text values.
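
A whitelist approach might look like the sketch below, using Pillow's `getexif()` and `get_ifd()`. The function names `pick_date` and `get_date_of_jpg` are illustrative, and falling back to the IFD0 DateTime tag (0x0132 = 306) is an assumption added here, not something the tags above require.

```python
EXIF_IFD_POINTER = 0x8769    # 34665, points to the Exif IFD
DATETIME_ORIGINAL = 0x9003   # 36867
DATETIME_DIGITIZED = 0x9004  # 36868
DATETIME = 0x0132            # 306, IFD0 DateTime (fallback is an assumption)


def pick_date(ifd0, exif_ifd):
    """Return the first whitelisted date string found, else None."""
    candidates = (
        (exif_ifd, DATETIME_ORIGINAL),
        (exif_ifd, DATETIME_DIGITIZED),
        (ifd0, DATETIME),
    )
    for source, tag_id in candidates:
        value = source.get(tag_id)
        # Only accept real, non-empty text; ignore bytes and other types.
        if isinstance(value, str) and value.strip("\x00 "):
            return value.strip("\x00 ")
    return None


def get_date_of_jpg(path):
    """Read only the whitelisted date tags from the file at path."""
    # Imported lazily so pick_date can be used without Pillow installed.
    from PIL import Image

    with Image.open(path) as image:
        exif = image.getexif()
        return pick_date(exif, exif.get_ifd(EXIF_IFD_POINTER))
```

Because only the whitelisted tags are ever read, the padding tag is never touched and nothing needs to be decoded.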


TL;DR: your code is too sloppy/optimistic. Not Python's or Pillow's fault.
