My code looks like this:
import pandas as pd
import numpy as np
from skimage.io import imread

df = pd.DataFrame()
for i in range(1000):
    try:
        image = imread(f"Images/{i}.jpg")
        featureMatrix = np.zeros((image.shape[0], image.shape[1]))
        for j in range(0, image.shape[0]):
            for k in range(0, image.shape[1]):
                featureMatrix[j][k] = ((int(image[j, k, 0]) + int(image[j, k, 1]) + int(image[j, k, 2])) / 3)
        features = pd.Series(np.reshape(featureMatrix, (image.shape[0] * image.shape[1])))
        df[f"{i}"] = features
    except:
        pass
df.to_csv("Features.csv")
When I run it, the line df[f"{i}"] = features triggers this warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()
I have tried using pd.concat, but I cannot get it to work. Any ideas on how I should replace that line?
CodePudding user response:
To improve performance, avoid inserting a new Series into the DataFrame on each of the 1000 iterations.
Instead, yield each Series (with its name set to the column label) from a generator function and concatenate them all at once with pd.concat:
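import pandas as pd
import numpy as np
from skimage.io import imread

def collect_features():
    for i in range(1000):
        try:
            image = imread(f"Images/{i}.jpg")
            featureMatrix = np.zeros((image.shape[0], image.shape[1]))
            for j in range(0, image.shape[0]):
                for k in range(0, image.shape[1]):
                    # average of the three color channels for this pixel
                    featureMatrix[j][k] = ((int(image[j, k, 0]) + int(image[j, k, 1]) + int(image[j, k, 2])) / 3)
            # the Series name becomes the column label after concat
            yield pd.Series(np.reshape(featureMatrix, (image.shape[0] * image.shape[1])), name=f"{i}")
        except Exception:
            # skip images that are missing or cannot be processed
            pass

pd.concat(list(collect_features()), axis=1).to_csv("Features.csv")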
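One behavior worth knowing when switching to pd.concat: with axis=1 it aligns the Series on their index and, by default, keeps the union of all index labels, so a shorter Series is padded with NaN. Below is a minimal sketch with made-up data. The function make_series and its values are hypothetical stand-ins for the flattened images, not part of the code above.

```python
import numpy as np
import pandas as pd

def make_series():
    # Hypothetical stand-ins for two flattened images of different sizes.
    yield pd.Series(np.array([1.0, 2.0]), name="0")
    yield pd.Series(np.array([3.0, 4.0, 5.0]), name="1")

# One concat call builds the whole frame; each Series name becomes a column label.
combined = pd.concat(list(make_series()), axis=1)
print(combined)
```

By contrast, assigning a Series to a column of an existing frame reindexes it to that frame's index, so if your images differ in size the two approaches may not produce the same number of rows.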
CodePudding user response:
What about this alternative approach? Append each Series to a list and call pd.concat once after the loop:
import pandas as pd
import numpy as np
from skimage.io import imread

df_list = []
for i in range(1000):
    try:
        image = imread(f"Images/{i}.jpg")
        featureMatrix = np.zeros((image.shape[0], image.shape[1]))
        for j in range(0, image.shape[0]):
            for k in range(0, image.shape[1]):
                # average of the three color channels for this pixel
                featureMatrix[j][k] = ((int(image[j, k, 0]) + int(image[j, k, 1]) + int(image[j, k, 2])) / 3)
        # name the Series so the column keeps the image number even if some images are skipped
        features = pd.Series(np.reshape(featureMatrix, (image.shape[0] * image.shape[1])), name=f"{i}")
        df_list.append(features)
    except Exception:
        # skip images that are missing or cannot be processed
        pass

df = pd.concat(df_list, axis=1)
df.to_csv("Features.csv")
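Separately from the warning, the nested per-pixel loops can probably be replaced by a single NumPy call. The sketch below assumes every image is an (H, W, 3) array, and it uses randomly generated arrays in place of the files in Images/ so that it runs on its own. The helper channel_mean_features and the images dictionary are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

def channel_mean_features(image):
    """Return the per-pixel mean of the three channels, flattened to 1-D."""
    # Convert to float first so uint8 pixel values cannot overflow.
    return image.astype(np.float64).mean(axis=2).ravel()

# Hypothetical stand-ins for images loaded with imread.
rng = np.random.default_rng(0)
images = {str(i): rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8) for i in range(3)}

# Build one named Series per image, then concatenate once.
columns = [pd.Series(channel_mean_features(img), name=name) for name, img in images.items()]
df = pd.concat(columns, axis=1)
print(df.shape)
```

If some images have an alpha channel or are grayscale, the axis=2 mean would need adjusting, so treat this as a starting point.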