Parallel For Loop Over Numpy Matrix
I am looking at the joblib examples but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix. So I
Solution 1:
This can be done as follows using the multiprocessing module:
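import multiprocessing as mp
from functools import partial

import numpy as np
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean, squareform

# The guard is needed on platforms where worker processes re-import this module
if __name__ == "__main__":
    # Create simulated data matrix: 33 rows of length 300
    data = np.random.random((33, 300))
    N, _ = data.shape
    # Index pairs (i, j) with i < j, in the order squareform expects
    upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]
    # Each task runs fastdtw on one pair of rows and returns (distance, path)
    with mp.Pool(processes=4) as pool:
        result = pool.starmap(partial(fastdtw, dist=euclidean),
                              [(data[i], data[j]) for (i, j) in upper_triangle])
    # Keep only the distances and expand them into a symmetric N x N matrix
    dist_mat = squareform([item[0] for item in result])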
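To see why the list of (i, j) pairs can be passed straight to squareform, here is a small sketch that runs without fastdtw or a process pool. It assumes plain euclidean distance as a stand-in metric and a made-up 5 x 8 matrix. The point is only that pairs generated with i < j, row by row, are already in the condensed order that squareform and pdist use.

```python
import numpy as np
from scipy.spatial.distance import euclidean, pdist, squareform

# Small made-up matrix (assumption: 5 rows of length 8, fixed seed)
rng = np.random.default_rng(0)
data = rng.random((5, 8))
N = data.shape[0]

# Same pair ordering as in the answer: (0,1), (0,2), ..., (N-2, N-1)
pairs = [(i, j) for i in range(N) for j in range(i + 1, N)]

# Stand-in metric: euclidean instead of fastdtw, computed serially
condensed = np.array([euclidean(data[i], data[j]) for i, j in pairs])

# Expand the N*(N-1)/2 condensed distances into a symmetric N x N matrix
dist_mat = squareform(condensed)

# pdist produces the same condensed ordering, so the two matrices agree
reference = squareform(pdist(data, metric="euclidean"))
```

Swapping the list comprehension for pool.starmap keeps the same ordering, because starmap returns results in the order of its inputs.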
Timing result using timeit on an Ivy Bridge Core i5:
24.052 secs
which is half the time taken without explicit parallelization.
ALSO:
As a future reference for anyone using the fastdtw package: importing the distance functions from scipy.spatial.distance and calling fastdtw as shown in the example on the link is much slower than just using fastdtw(x, y, dist=2). The results seem similar to me, and the execution time using pdist (without resorting to parallelization) is under a second.
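The pdist route mentioned above might look roughly like the sketch below. pdist accepts any callable taking two 1-D arrays and returning a number, so the metric is passed in as a parameter. The helper name pairwise_matrix and the stand-in metric abs_diff_sum are hypothetical and exist only so the example runs without fastdtw installed.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def pairwise_matrix(data, metric):
    """Return the symmetric matrix of metric(row_i, row_j) over all row pairs."""
    # pdist calls metric once per pair (i < j) and returns the condensed vector
    return squareform(pdist(data, metric=metric))


def abs_diff_sum(x, y):
    """Stand-in metric (hypothetical): sum of absolute differences."""
    return float(np.abs(x - y).sum())


# Made-up data: 6 rows of length 20, fixed seed
data = np.random.default_rng(0).random((6, 20))
dist_mat = pairwise_matrix(data, abs_diff_sum)

# With fastdtw installed, the metric would keep only the distance
# from the (distance, path) tuple that fastdtw returns:
#   pairwise_matrix(data, lambda x, y: fastdtw(x, y, dist=2)[0])
```

This runs serially in one process, which matches the "without resorting to parallelization" case described above.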