import multiprocessing

import librosa
import numpy as np


class MyClass:
    def __init__(self, audio_file_path):
        self.audio_file_path = audio_file_path
        # ...other variables (wanted_window_length, wanted_quantile_threshold, ...)

    def sliding_window_function(self, audio_file_path):
        y, sr = librosa.load(audio_file_path)
        timestamps = np.arange(0, len(y)) / sr
        aby = np.abs(y)
        del y
        stamps_in_a_second = timestamps.shape[0] / librosa.get_duration(y=aby, sr=sr)
        del timestamps
        scan_window_size = int(self.wanted_window_length * stamps_in_a_second)
        qth_amp = np.quantile(aby, self.wanted_quantile_threshold)
        adj_qth_amp = qth_amp * scan_window_size
        # running sum: slide the window one sample at a time,
        # subtracting the sample that leaves and adding the one that enters
        window_sum = sum(aby[:scan_window_size])
        wanted_time_stamps = [[i, window_sum]
                              for i in range(len(aby) - scan_window_size)
                              if (window_sum := window_sum - aby[i] + aby[scan_window_size + i]) > adj_qth_amp]
        del aby
        return wanted_time_stamps

    def process_audio(self):
        wanted_time_stamps = self.sliding_window_function(self.audio_file_path)
        for time_stamp in wanted_time_stamps:
            # do something
            pass


def main(file_path):
    myclass = MyClass(file_path)
    myclass.process_audio()


if __name__ == "__main__":
    pool = multiprocessing.Pool()
    for file in file_path:  # file_path: iterable of audio files, defined elsewhere
        try:
            pool.apply_async(main, args=(file,))
        except:
            pass
    pool.close()
    pool.join()
I have multiple audio files that need to be processed. For each file, the code:

1. reads the audio file with librosa (an audio library),
2. does some numpy array computation with sliding windows to find regional highs,
3. uses the result from 2 to modify the audio file and a video file (which I did not include in the code because it is not relevant to the issue I'm facing).

There are two reasons why I chose multiprocessing.Pool:

- I believe that sliding windows and audio processing (mainly audio/video processing) are CPU-bound work, which would benefit from multiprocessing.
- Pool allows me to limit how many workers run concurrently, which lets me allocate my computational power according to my needs.
My problem:

1. After initializing the Pool and running the code, everything runs for an extremely long time, stuck on the sliding window function, specifically these two lines:

    for i in range(len(aby) - scan_window_size)
    if (window_sum := window_sum - aby[i] + aby[scan_window_size + i]) > adj_qth_amp]

My CPU and power usage are still up, but every process in the pool appears to hang on these two lines when I send a KeyboardInterrupt. (It's the same when using a for loop with append instead of a list comprehension.)

2. Something strange I noticed is that my memory usage starts high, but after a few hours it drops much lower than it theoretically should be. aby should be an array close to 1GB, but each process is only using 500MB of RAM, even though my code hasn't reached del aby yet.
CodePudding user response:
I'd encourage you to use numpy intrinsics where possible, they're much faster than multiprocessing and will already be multithreaded where sensible.
For example:
import numpy as np
# 100M values
x = np.random.uniform(size=100_000_000)
windowed_sum = np.convolve(x, np.ones_like(x, shape=10), 'valid')
ix, = np.where(windowed_sum > 9)
will likely be >100 times faster than doing the work as you were, pulling individual values out of Numpy and into Python. I'd also be tempted to do the parallelism outside of Python where possible; it tends to make things easier to reason about and debug.
See the question "Python running cumulative sum with a given window" for other ways of calculating a running sum efficiently.
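One such cumulative-sum approach can be sketched as follows; the helper name windowed_sums is illustrative, not from any library. Padding the cumulative sum with a leading zero lets every length-w window sum be computed as one vectorized subtraction, in O(n) total:

```python
import numpy as np

def windowed_sums(x, w):
    # prepend 0 so c[i] is the sum of the first i elements;
    # each window sum is then a single subtraction of two prefix sums
    c = np.cumsum(np.concatenate(([0.0], x)))
    return c[w:] - c[:-w]

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sums = windowed_sums(x, 3)  # [6., 9., 12.]
```

This gives the same result as np.convolve(x, np.ones(3), 'valid'), and the threshold test from the question becomes a single vectorized comparison, e.g. np.where(sums > adj_qth_amp).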