How to find .jpg files whose extensions were accidentally changed to .png?

Time:08-10

First of all, I apologize if this question does not belong here. I am using YOLOv5 on a custom dataset, and I encountered the error: Memory Error Corrupt JPEG data: 2 extraneous bytes before marker 0xd9

The image shows the error.


The reason for this error might be that some PNG files were given a .jpg extension while renaming. This link says so.

Thus, is there a way to find the real file format of the images? I have a lot of images, so checking each image manually is not a good idea. Thank you.

Alternatively, any other way to get rid of this error, so that I can successfully train on the dataset using YOLOv5, would also help.

CodePudding user response:

You can find images with a specific file extension using glob, e.g.

import glob

jpg_files = glob.glob('*.jpg')

or using a list comprehension:

jpg_files = [file for file in all_files_list if file.endswith('.jpg')]

with all_files_list being a list containing all file names.

Alternatively:

import os

jpg_files = list(filter(lambda x: x.endswith('.jpg'), os.listdir(r'/your/desired/directory')))

You could then use e.g. OpenCV to load the files and store them in the format you need.

CodePudding user response:

Most image formats have a "signature", or "magic number" at the start.

Check out PNG here and you'll see PNG images start with:

89 50 4e 47 0d 0a 1a 0a

and JPEG images start with:

ff d8 ff

So your best bet would be to test the first few bytes.
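As a rough illustration of that test, here is a minimal Python sketch that compares each file's extension with the signature quoted above. The helper names `sniff_format` and `find_mismatched` are made up for this example, and it assumes all images sit directly in one directory and are either PNG or JPEG.

```python
from pathlib import Path

# Leading-byte signatures quoted above.
SIGNATURES = {
    "png": bytes.fromhex("89504e470d0a1a0a"),
    "jpg": bytes.fromhex("ffd8ff"),
}


def sniff_format(path):
    """Return 'png', 'jpg', or None based on the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(8)  # the longest signature is 8 bytes
    for name, signature in SIGNATURES.items():
        if head.startswith(signature):
            return name
    return None


def find_mismatched(directory):
    """Yield (path, actual_format) where the extension disagrees with the content."""
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        ext = path.suffix.lower().lstrip(".")
        if ext == "jpeg":
            ext = "jpg"
        if ext not in SIGNATURES:
            continue  # not a .png/.jpg/.jpeg file, skip it
        actual = sniff_format(path)
        if actual != ext:
            yield path, actual
```

Calling `list(find_mismatched('/your/desired/directory'))` would give the files to rename or re-save. An `actual_format` of `None` means the file matched neither signature, so it is probably some other format or damaged.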


Note that you can also check the last few bytes to ensure an image is not corrupt or truncated.
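A sketch of that trailing-byte check in Python follows. It relies on two facts not spelled out above: a JPEG stream ends with the end-of-image marker ff d9, and a PNG file ends with its IEND chunk (00 00 00 00 49 45 4e 44 ae 42 60 82). The function name `looks_complete` is hypothetical. Some tools append extra bytes after the end marker, so treat a failed check as a hint to inspect the file, not as proof of corruption.

```python
import os

JPEG_START = bytes.fromhex("ffd8ff")
JPEG_END = bytes.fromhex("ffd9")  # end-of-image marker
PNG_START = bytes.fromhex("89504e470d0a1a0a")
PNG_END = bytes.fromhex("0000000049454e44ae426082")  # IEND chunk with its CRC


def looks_complete(path):
    """Return True/False for JPEG or PNG files, None for anything else."""
    with open(path, "rb") as f:
        head = f.read(8)
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - len(PNG_END)))  # read only the last 12 bytes
        tail = f.read()
    if head.startswith(JPEG_START):
        return tail.endswith(JPEG_END)
    if head.startswith(PNG_START):
        return tail.endswith(PNG_END)
    return None
```

Running this over the dataset before training would flag truncated files, which can then be re-downloaded or re-saved.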


If you don't have a hex editor on Windows, you can use Hexed.it online to get some experience of correct and broken images before you start coding.
