
Parallel For Loop Over Numpy Matrix

I am looking at the joblib examples but I can't figure out how to do a parallel for loop over a matrix. I am computing a pairwise distance metric between the rows of a matrix. So I

Solution 1:

This can be done as follows using the multiprocessing module:

import numpy as np
from fastdtw import fastdtw
import multiprocessing as mp
from scipy.spatial.distance import squareform, euclidean
from functools import partial

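# Create simulated data matrix: 33 series of length 300
data = np.random.random((33, 300))

N, _ = data.shape
# Index pairs (i, j) with i < j, in the order squareform expects
upper_triangle = [(i, j) for i in range(N) for j in range(i + 1, N)]

# fastdtw returns (distance, path); partial fixes dist=euclidean for every call.
# On platforms that spawn workers (e.g. Windows), run this part under
# an `if __name__ == "__main__":` guard.
with mp.Pool(processes=4) as pool:
    result = pool.starmap(partial(fastdtw, dist=euclidean), [(data[i], data[j]) for (i, j) in upper_triangle])

# Keep only the distances and expand them into a symmetric N x N matrix
dist_mat = squareform([item[0] for item in result])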
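
To see why the order of the index pairs matters for the final squareform call, here is a pure-Python sketch of the same expansion. The helper name condensed_to_square is made up for illustration and is not part of scipy; it only mimics what squareform does with a flat list of pairwise values.

```python
from itertools import combinations


def condensed_to_square(values, n):
    """Expand n*(n-1)/2 pairwise values into a symmetric n x n list of lists.

    Hypothetical helper: mirrors the layout squareform produces from a
    flat vector ordered as (0,1), (0,2), ..., (1,2), ...
    """
    # combinations(range(n), 2) yields pairs in the same order as the
    # upper_triangle list comprehension: all (i, j) with i < j, row by row.
    pairs = list(combinations(range(n), 2))
    if len(values) != len(pairs):
        raise ValueError("expected %d values, got %d" % (len(pairs), len(values)))

    square = [[0.0] * n for _ in range(n)]
    for (i, j), value in zip(pairs, values):
        square[i][j] = value  # upper triangle
        square[j][i] = value  # mirrored into the lower triangle
    return square
```

If the results came back in a different order than the pairs were submitted, the values would land in the wrong cells, which is why an order-preserving call such as starmap is convenient here.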

Timing with timeit on an Ivy Bridge Core i5:

24.052 secs

which is half the time it takes without explicit parallelization.
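
Since the question asked about joblib, the same list of pairs can probably be handed to joblib's Parallel and delayed instead of a multiprocessing pool. This is only a sketch and not the method timed above: pairwise_parallel is a hypothetical helper, and math.dist (plain Euclidean distance, Python 3.8+) stands in for the real metric.

```python
import math
from itertools import combinations

from joblib import Parallel, delayed


def pairwise_parallel(rows, metric, n_jobs=4):
    """Return metric(rows[i], rows[j]) for every i < j, in upper-triangle order.

    Hypothetical helper for illustration; `metric` is any two-argument
    callable that returns a single number.
    """
    pairs = combinations(range(len(rows)), 2)
    # Parallel returns the results as a list in the order the tasks were submitted.
    return Parallel(n_jobs=n_jobs)(
        delayed(metric)(rows[i], rows[j]) for i, j in pairs
    )


if __name__ == "__main__":
    rows = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
    print(pairwise_parallel(rows, math.dist, n_jobs=2))
```

The returned list can be passed to squareform just like the multiprocessing result. For DTW, the metric would need to be a small wrapper that returns only the distance part of what fastdtw gives back.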

ALSO:

For future reference, for anyone using the fastdtw package: importing a distance function from scipy.spatial.distance and passing it to fastdtw, as in the example on the link, is much slower than calling fastdtw(x, y, dist=2). The results look similar to me, and with pdist (without resorting to parallelization) the execution time is under a second.
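
The serial pdist variant mentioned above might look roughly like the sketch below. The names dtw_distance and pairwise_matrix are made up for illustration. fastdtw is imported inside the wrapper so the rest of the sketch loads without it, and the wrapper assumes, as in the code above, that fastdtw returns a (distance, path) pair.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def dtw_distance(x, y):
    """Hypothetical wrapper: DTW distance between two 1-D series."""
    # Imported lazily so the module can be loaded without fastdtw installed.
    from fastdtw import fastdtw

    # dist=2 is the numeric form from the note above; [0] keeps the
    # distance and drops the warping path.
    return fastdtw(x, y, dist=2)[0]


def pairwise_matrix(data, metric):
    """Apply `metric` to every pair of rows and return the square matrix."""
    # pdist calls metric(row_i, row_j) for each i < j and returns the
    # condensed vector; squareform expands it to a symmetric matrix.
    return squareform(pdist(np.asarray(data, dtype=float), metric=metric))
```

With the data matrix from the answer, pairwise_matrix(data, dtw_distance) would produce the same kind of N x N matrix as dist_mat, computed in a single process.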
