
Numpy - Faster Operations On Masked Array?

I have a numpy array:

import numpy as np
arr = np.random.rand(100)

If I want to find its maximum value, I run np.amax which runs 155,357 times a second on my machine. However, for

Solution 1:

MaskedArray is a subclass of the base numpy ndarray. It does not have compiled code of its own. Look at the numpy/ma/ directory for details, or the main file:

/usr/local/lib/python3.6/dist-packages/numpy/ma/core.py

A masked array has two key attributes, data and mask: one is the data array you used to create it, the other is a boolean array of the same size.

So all operations have to take those two arrays into account. Not only does it calculate new data, it also has to calculate a new mask.
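As a minimal sketch of that bookkeeping (the arrays a and b below are made up for illustration), adding two masked arrays produces a result whose mask is the union of the two input masks:

```python
import numpy as np

# Two small masked arrays; values and masks are invented for this example.
a = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
b = np.ma.masked_array([10.0, 20.0, 30.0], mask=[False, False, True])

c = a + b

# The result carries both a data array and a mask array.
print(c.data)  # underlying values; entries at masked positions are not meaningful
print(c.mask)  # an element is masked if it was masked in either input
```

So even a simple addition does work on four arrays (two data, two masks) rather than two.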

It can take several approaches (depending on the operation):

  • use the data as is

  • use compressed data - a new array with the masked values removed

  • use filled data, where the masked values are replaced by the fill_value or some innocuous value (e.g. 0 when doing addition, 1 when doing multiplication).
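The three approaches map onto attributes and methods you can call yourself. A small sketch, with an array m made up for illustration:

```python
import numpy as np

m = np.ma.masked_array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])

raw = m.data             # the data as is, masked entry included
packed = m.compressed()  # new 1-D array with the masked entry removed
for_sum = m.filled(0)    # masked entry replaced by 0 (neutral for addition)
for_prod = m.filled(1)   # masked entry replaced by 1 (neutral for multiplication)

print(raw, packed, for_sum, for_prod)
# With neutral fill values, plain ndarray reductions agree with the masked ones.
print(for_sum.sum() == m.sum(), for_prod.prod() == m.prod())
```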

The number of masked values, whether none or all, makes little, if any, difference in speed.

So the speed differences that you see are not surprising. There's a lot of extra calculation going on. The numpy/ma/core.py file says this package was first developed in pre-numpy days, and incorporated into numpy around 2005. While there have been changes to keep it up to date, I don't think it has been significantly reworked.
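If you want to check this on your own machine, something like the following rough timing sketch should do. The two masks are assumptions chosen for illustration (nothing masked, and every other value masked), and the absolute numbers will vary with hardware and numpy version:

```python
import timeit
import numpy as np

arr = np.random.rand(100)

# Hypothetical masks for illustration: nothing masked vs. every other value masked.
none_masked = np.ma.masked_array(arr, mask=np.zeros(arr.shape, dtype=bool))
half_masked = np.ma.masked_array(arr, mask=np.arange(arr.size) % 2 == 0)

n = 10_000
timings = {
    "np.amax(arr)": timeit.timeit(lambda: np.amax(arr), number=n),
    "np.ma.max(none_masked)": timeit.timeit(lambda: np.ma.max(none_masked), number=n),
    "np.ma.max(half_masked)": timeit.timeit(lambda: np.ma.max(half_masked), number=n),
}

for label, seconds in timings.items():
    print(f"{label}: {seconds / n * 1e6:.2f} microseconds per call")
```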

Here's the code for the np.ma.MaskedArray.max method:

def max(self, axis=None, out=None, fill_value=None, keepdims=np._NoValue):

    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}

    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    if fill_value is None:
        fill_value = maximum_fill_value(self)
    # No explicit output
    if out is None:
        result = self.filled(fill_value).max(
            axis=axis, out=out, **kwargs).view(type(self))
        if result.ndim:
            # Set the mask
            result.__setmask__(newmask)
            # Get rid of Infs
            if newmask.ndim:
                np.copyto(result, result.fill_value, where=newmask)
        elif newmask:
            result = masked
        return result
    # Explicit output
    ....

The key steps are

fill_value = maximum_fill_value(self)  # depends on dtype
self.filled(fill_value).max(
            axis=axis, out=out, **kwargs).view(type(self))

You can experiment with filled to see what happens with your array.

In [40]: arr = np.arange(10.)                                                                                        
In [41]: arr                                                                                                         
Out[41]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [42]: Marr = np.ma.masked_array(arr, mask=[0]*9+ [1])                                                            
In [43]: Marr                                                                                                        
Out[43]: 
masked_array(data=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, --],
             mask=[False, False, False, False, False, False, False, False,
                   False,  True],
       fill_value=1e+20)
In [44]: np.ma.maximum_fill_value(Marr)                                                                              
Out[44]: -inf
In [45]: Marr.filled()                                                                                               
Out[45]: 
array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
       8.e+00, 1.e+20])
In [46]: Marr.filled(_44)                                                                                            
Out[46]: array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8., -inf])
In [47]: arr.max()                                                                                                   
Out[47]: 9.0
In [48]: Marr.max()
Out[48]: 8.0
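
The session above can be condensed into a standalone script. The last step, indexing the raw data with the inverted mask, is not something numpy.ma does for you; it is just one possible shortcut, sketched here on the assumption that you only need the scalar maximum and that at least one value is unmasked:

```python
import numpy as np

arr = np.arange(10.0)
marr = np.ma.masked_array(arr, mask=[0] * 9 + [1])

# Step 1: pick a fill value that can never win a max comparison (-inf for floats).
fill_value = np.ma.maximum_fill_value(marr)

# Step 2: replace masked entries with it and take an ordinary ndarray max.
by_hand = marr.filled(fill_value).max()

# Possible shortcut (an assumption, not part of numpy.ma): use the raw arrays directly.
shortcut = marr.data[~marr.mask].max()

print(by_hand, shortcut, marr.max())
```

The shortcut skips the mask bookkeeping for the result, so it raises an error on a fully masked array instead of returning the masked constant.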
