Background
I am working on a neural network and want to use the EMNIST (Extended MNIST) dataset, available at https://www.kaggle.com/datasets/crawford/emnist
However, my program is built to read its dataset from a fixed directory layout: {program's dir.} > {dataset name} > {train or test} > {class_label, e.g. 5} > {filename}.png
The Problem
The EMNIST dataset comes in CSV format. Each file is laid out as follows:
- Each row is a separate image
- 785 columns
- First column = class_label
- Each remaining column holds one pixel value (28 x 28 pixels, so 784 columns)
I want to turn every row into a PNG file inside a folder named after its class_label, so that all images with the same class_label end up in the same folder.
The problem is that I have no idea how to do this or where to begin, since I have never worked with CSV files.
So I am looking for somebody willing to help me do this in Python so I can continue working on my project!
I have been searching the internet for a way to do this row by row, but I have yet to find a good solution.
CodePudding user response:
You can use PIL (Pillow) to convert each row of numerical values into an image. I hope the code below helps:
import csv
import os
from PIL import Image

# Open the CSV file and stream its rows one at a time
with open('emnist.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    # enumerate gives every row a unique index for its file name
    for index, row in enumerate(reader):
        class_label = row[0]  # first column: class label
        # remaining 784 columns: pixel values, converted from str to int
        pixel_values = [int(value) for value in row[1:]]
        # Create the class folder if it does not exist yet
        os.makedirs(class_label, exist_ok=True)
        # Create a 28x28 grayscale image from the pixel values
        img = Image.new('L', (28, 28))
        img.putdata(pixel_values)
        # Save the image into its class folder
        img.save(f'{class_label}/{class_label}_{index}.png')
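To match the directory layout from the question ({dataset name} > {train or test} > {class_label} > {filename}.png), the same idea can be wrapped in a function. This is only a minimal sketch: the function name csv_to_png_folders, its parameters, and the per-class file numbering are assumptions made for illustration, and it assumes the CSV has no header row and that every pixel value is an integer from 0 to 255.

```python
import csv
from pathlib import Path

from PIL import Image


def csv_to_png_folders(csv_path, dataset_dir, split, size=28):
    """Write each CSV row as <dataset_dir>/<split>/<class_label>/<n>.png.

    Hypothetical helper: assumes no header row, the class label in the
    first column, and size*size pixel values in the remaining columns.
    Returns a dict mapping each class label to the number of images written.
    """
    counts = {}  # images written so far, per class label
    with open(csv_path, newline='') as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            label = row[0]
            # csv.reader yields strings, so convert the pixels to integers
            pixels = [int(value) for value in row[1:]]
            if len(pixels) != size * size:
                raise ValueError(
                    f'expected {size * size} pixel values, got {len(pixels)}'
                )
            # Create <dataset_dir>/<split>/<label>/ if it does not exist yet
            class_dir = Path(dataset_dir) / split / label
            class_dir.mkdir(parents=True, exist_ok=True)
            # Build a grayscale image; putdata fills it row by row
            img = Image.new('L', (size, size))
            img.putdata(pixels)
            # Number the files per class: 0.png, 1.png, ...
            n = counts.get(label, 0)
            img.save(class_dir / f'{n}.png')
            counts[label] = n + 1
    return counts
```

It could then be called once per split, for example csv_to_png_folders('emnist_train.csv', 'emnist', 'train') and csv_to_png_folders('emnist_test.csv', 'emnist', 'test'), where both CSV file names are placeholders for whichever files you downloaded. If the saved images look rotated or mirrored, which is often reported for EMNIST, adding img = img.transpose(Image.TRANSPOSE) before img.save may fix the orientation; check a few output files to be sure.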