Numpy - Faster Operations On Masked Array?
Solution 1:
MaskedArray is a subclass of the base numpy ndarray. It does not have compiled code of its own. Look at the numpy/ma/ directory for details, or the main file:
/usr/local/lib/python3.6/dist-packages/numpy/ma/core.py
A masked array has two key attributes, data and mask. One is the data array you used to create it; the other is a boolean array of the same size.
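As a quick illustration of those two attributes, here is a minimal sketch. The array values and the names marr, data and mask are made up for the example.

```python
import numpy as np

arr = np.arange(5.)
marr = np.ma.masked_array(arr, mask=[0, 0, 1, 0, 0])

data = marr.data   # plain ndarray holding every value, masked or not
mask = marr.mask   # boolean ndarray, True where a value is masked

print(type(data).__name__, data)
print(mask.dtype, mask)
```

Both arrays have the same shape, and the masked value is still present in data; only mask marks it as invalid.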
So all operations have to take those two arrays into account. Not only does it calculate new data, it also has to calculate a new mask.
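A small sketch of that double bookkeeping, using addition as the example operation (the arrays a and b are invented for illustration): the result carries both a new data array and a new mask, and for a binary operation like this the new mask is the element-wise OR of the two input masks.

```python
import numpy as np

a = np.ma.masked_array([1., 2., 3., 4.], mask=[0, 1, 0, 0])
b = np.ma.masked_array([10., 20., 30., 40.], mask=[0, 0, 1, 0])

c = a + b          # computes new data AND a new mask

print(c)
print(c.mask)      # masked wherever either input was masked
```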
It can take several approaches (depending on the operation):
- use the data as is
- use compressed data - a new array with the masked values removed
- use filled data, where the masked values are replaced by the fill value or some innocuous value (e.g. 0 when doing addition, 1 when doing multiplication).
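The three approaches above can be tried by hand. This is only a sketch on a made-up array; which approach a given numpy.ma operation actually uses depends on the operation.

```python
import numpy as np

marr = np.ma.masked_array([1., 2., 3., 4.], mask=[0, 1, 0, 0])

as_is = marr.data                # all four values, mask ignored
compressed = marr.compressed()   # 1-D array with the masked value removed
filled_add = marr.filled(0)      # masked slot replaced by 0, neutral for a sum
filled_mul = marr.filled(1)      # masked slot replaced by 1, neutral for a product

# Filling with the neutral element gives the same answer as the masked reduction.
print(filled_add.sum(), marr.sum())
print(filled_mul.prod(), marr.prod())
```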
The number of masked values, whether none or all, makes little, if any, difference in speed.
So the speed differences that you see are not surprising. There's a lot of extra calculation going on. The numpy/ma/core.py file says this package was first developed in pre-numpy days, and incorporated into numpy around 2005. While there have been changes to keep it up to date, I don't think it has been significantly reworked.
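To check the timing claims on your own machine, something like the following sketch could be used. The array size and repeat count are arbitrary choices, and the numbers will vary between systems and numpy versions, so no expected timings are shown.

```python
import timeit
import numpy as np

n = 100_000
repeats = 50
arr = np.random.default_rng(0).random(n)

# Explicit full-size masks, so both masked arrays carry a real boolean array.
none_masked = np.ma.masked_array(arr, mask=np.zeros(n, dtype=bool))
all_masked = np.ma.masked_array(arr, mask=np.ones(n, dtype=bool))

cases = {"ndarray": arr, "none masked": none_masked, "all masked": all_masked}
timings = {name: timeit.timeit(a.max, number=repeats) for name, a in cases.items()}

for name, seconds in timings.items():
    print(f"{name:>12}: {seconds / repeats * 1e6:8.1f} us per call")
```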
Here's the code for the np.ma.MaskedArray.max method:
def max(self, axis=None, out=None, fill_value=None, keepdims=np._NoValue):
    kwargs = {} if keepdims is np._NoValue else {'keepdims': keepdims}
    _mask = self._mask
    newmask = _check_mask_axis(_mask, axis, **kwargs)
    if fill_value is None:
        fill_value = maximum_fill_value(self)
    # No explicit output
    if out is None:
        result = self.filled(fill_value).max(
            axis=axis, out=out, **kwargs).view(type(self))
        if result.ndim:
            # Set the mask
            result.__setmask__(newmask)
            # Get rid of Infs
            if newmask.ndim:
                np.copyto(result, result.fill_value, where=newmask)
        elif newmask:
            result = masked
        return result
    # Explicit output
    ....
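The mask-handling branches in that method are easiest to see with a reduction along an axis. In this sketch (the 2-D array is invented for the example) the second row is fully masked, so its entry in the result is masked too, while a fully masked array reduced to a scalar gives the masked constant.

```python
import numpy as np

m = np.ma.masked_array([[1., 5.], [2., 3.]], mask=[[0, 0], [1, 1]])

row_max = m.max(axis=1)   # second row has no valid values
print(row_max)
print(row_max.mask)

# Scalar result from a fully masked array: the `elif newmask` branch.
scalar = np.ma.masked_array([1., 2.], mask=[1, 1]).max()
print(scalar is np.ma.masked)
```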
The key steps are
    fill_value = maximum_fill_value(self)  # depends on dtype
    self.filled(fill_value).max(
        axis=axis, out=out, **kwargs).view(type(self))
You can experiment with filled to see what happens with your array.
In [40]: arr = np.arange(10.)
In [41]: arr
Out[41]: array([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
In [42]: Marr = np.ma.masked_array(arr, mask=[0]*9 + [1])
In [43]: Marr
Out[43]:
masked_array(data=[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, --],
             mask=[False, False, False, False, False, False, False, False,
                   False,  True],
       fill_value=1e+20)
In [44]: np.ma.maximum_fill_value(Marr)
Out[44]: -inf
In [45]: Marr.filled()
Out[45]:
array([0.e+00, 1.e+00, 2.e+00, 3.e+00, 4.e+00, 5.e+00, 6.e+00, 7.e+00,
       8.e+00, 1.e+20])
In [46]: Marr.filled(_44)
Out[46]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., -inf])
In [47]: arr.max()
Out[47]: 9.0
In [48]: Marr.max()
Out[48]: 8.0
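The same result can be reproduced outside the method, which also suggests one way to skip some of the overhead: work on the underlying data and mask arrays directly. This is a sketch, not something numpy.ma does for you, and whether it is actually faster for your array should be measured. The names by_fill and by_index are made up for the example.

```python
import numpy as np

arr = np.arange(10.)
marr = np.ma.masked_array(arr, mask=[0] * 9 + [1])

# Same two steps as in the method: dtype-dependent fill value, then reduce.
fill = np.ma.maximum_fill_value(marr)    # -inf for a float array
by_fill = marr.filled(fill).max()

# Alternative on the raw arrays: select the unmasked values, then reduce.
# getmaskarray always returns a full boolean array, even if no mask was set.
mask = np.ma.getmaskarray(marr)
by_index = marr.data[~mask].max()

print(by_fill, by_index, marr.max())
```

Note that the indexing variant raises an error if every value is masked, since the reduction then runs on an empty array, whereas Marr.max() returns the masked constant in that case.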