I am getting face landmarks for each frame in a video. There are 477 landmarks, and each one is a (3,) vector.
I have a 10 minute video at 30 fps, which means I have 18000 arrays of shape (477, 3). I want to store all this info in a pandas DataFrame where each row is a frame and has 477 columns, one for each (3,) array.
Currently, I am doing this:
import numpy as np
import pandas as pd

frame_lms = []
for frame in video:
    landmark_dict = {}
    lm_count = 0
    for landmark in frame:
        x = landmark.x
        y = landmark.y
        xy = np.array([x, y])
        landmark_dict[f"lm_{lm_count}"] = xy
        lm_count += 1
    frame_lms.append(landmark_dict)
df = pd.DataFrame.from_dict(frame_lms)
df.to_csv('save.csv')
I got the idea to store everything in a list of dicts and build the DataFrame at the end from research showing that from_dict is the fastest way to create a pandas DataFrame. However, this process is still slow because I have to hold frame_lms in memory, and it gets huge as I append (477, 3) arrays into it.
What is the most computationally efficient way to solve a problem like this?
CodePudding user response:
It is better to avoid creating many numpy.array objects in the inner portion of a nested loop. Your code is much faster if you change xy = np.array([x, y]) in the inner loop to xy = (x, y). In the following code I left the conversion to numpy.ndarray out, since I understand that is OK for the OP.
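If you want to quantify the difference on your own machine, here is a minimal timeit sketch (the iteration counts are arbitrary) comparing tuple creation against np.array creation for 477 two-element pairs per pass:

import timeit

# Build 477 (x, y) pairs per pass, as in one frame of the OP's inner loop.
tuple_time = timeit.timeit("[(i, i) for i in range(477)]", number=10_000)
array_time = timeit.timeit(
    "[np.array([i, i]) for i in range(477)]",
    setup="import numpy as np",
    number=10_000,
)
print(f"tuples:   {tuple_time:.2f} s")
print(f"np.array: {array_time:.2f} s")

The tuple version is typically an order of magnitude faster, because np.array has a fixed per-call overhead that dominates for two-element inputs.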
Since Python is very efficient at managing lists, you can create a list of lists with the data and assign the column names when creating the DataFrame.
The faster, Pythonic way of creating the list is:
rv = [[(lm.x, lm.y) for lm in f] for f in video]
It is equivalent to the following, slightly slower code (not recommended):
import numpy as np
# load video here
rv = []
for frame in video:
    internal = []
    for landmark in frame:
        internal.append((landmark.x, landmark.y))
    rv.append(internal)
You can create the DataFrame from the lists using
df = pd.DataFrame(rv, columns=[f"lm_{count}" for count in range(477)])
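Putting it all together, here is a sketch of the complete pipeline, keeping all three coordinates since the question describes (3,) vectors per landmark (the .z attribute and the video iterable are assumptions carried over from the question; adjust them to your landmark library):

import pandas as pd

# List of lists of plain tuples: one inner list per frame,
# one 3-tuple per landmark. The .x/.y/.z attribute names are assumed.
rv = [[(lm.x, lm.y, lm.z) for lm in f] for f in video]

# One row per frame, one column per landmark; each cell holds a 3-tuple.
df = pd.DataFrame(rv, columns=[f"lm_{count}" for count in range(477)])
df.to_csv('save.csv')

Note that to_csv serializes each tuple cell as its string representation, so reading the values back as floats requires parsing. If you only need the raw numbers, saving a single NumPy array of shape (18000, 477, 3) with np.save is a leaner alternative.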