	==================================
	A guide to masked arrays in NumPy
	==================================

	.. Contents::

	See http://www.scipy.org/scipy/numpy/wiki/MaskedArray (dead link)
	for updates of this document.


	History
	-------

	As a regular user of MaskedArray, I (Pierre G.F. Gerard-Marchant) became
	increasingly frustrated with the subclassing of masked arrays (even if
	I can only blame my inexperience). I needed to develop a class of arrays
	that could store some additional information along with numerical values,
	while keeping the possibility for missing data (picture storing a series
	of dates along with measurements, which would later become the `TimeSeries
	Scikit <http://projects.scipy.org/scipy/scikits/wiki/TimeSeries>`__,
	now a dead link).

	I started to implement such a class, but then quickly realized that
	any additional information disappeared when processing these subarrays
	(for example, adding a constant value to a subarray would erase its
	dates). I ended up writing the equivalent of numpy.core.ma for my
	particular class, ufuncs included. Everything went fine until I needed to
	subclass my new class, when more problems showed up: some attributes of
	the new subclass were lost during processing. I identified the culprit as
	MaskedArray, which returns masked ndarrays when I expected masked
	arrays of my class. I was preparing myself to rewrite numpy.core.ma
	when I forced myself to learn how to subclass ndarrays. As I became more
	familiar with the __new__ and __array_finalize__ methods,
	I started to wonder why masked arrays were objects, and not ndarrays,
	and whether it wouldn't be more convenient for subclassing if they did
	behave like regular ndarrays.

	The new maskedarray is what I eventually came up with. The
	main differences with the initial numpy.core.ma package are
	that MaskedArray is now a subclass of ndarray and that the
	_data section can now be any subclass of ndarray. Apart from a
	couple of issues listed below, the behavior of the new MaskedArray
	class reproduces the old one. Initially the maskedarray
	implementation was marginally slower than numpy.ma in some areas,
	but work is underway to speed it up; the expectation is that it can be
	made substantially faster than the present numpy.ma.


	Note that if the subclass has some special methods and
	attributes, they are not propagated to the masked version:
	this would require a modification of the __getattribute__
	method (first trying ndarray.__getattribute__, then trying
	self._data.__getattribute__ if an exception is raised in the first
	place), which really slows things down.

	Main differences
	----------------

	* The _data part of the masked array can be any subclass of ndarray (but not recarray, cf. below).
	* fill_value is now a property, not a function.
	* in the majority of cases, the mask is forced to nomask when no value is actually masked. A notable exception is when a masked array (with no masked values) has just been unpickled.
	* I got rid of the share_mask flag; I never understood its purpose.
	* put, putmask and take now mimic the ndarray methods, to avoid unpleasant surprises. Moreover, put and putmask both update the mask when needed.
	* if a is a masked array, bool(a) raises a ValueError, as it does with ndarrays.
	* in the same way, the comparison of two masked arrays is a masked array, not a boolean.
	* filled(a) returns an array of the same subclass as a._data, and no test is performed on whether it is contiguous or not.
	* the mask is always printed, even if it's nomask, which makes things easy (for me at least) to remember that a masked array is used.
	* cumsum works as if the _data array was filled with 0. The mask is preserved, but not updated.
	* cumprod works as if the _data array was filled with 1. The mask is preserved, but not updated.
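Several of the points above can be checked interactively. The following is a minimal sketch, assuming a recent NumPy release in which this implementation ships as ``numpy.ma``; the sample values are made up for illustration.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1, 2, 3, 4], mask=[0, 1, 0, 0])

# bool() of a multi-element masked array raises ValueError, as for ndarrays.
try:
    bool(a)
    raised = False
except ValueError:
    raised = True

# Comparing masked arrays yields a masked array, not a single boolean.
eq = (a == ma.array([1, 0, 3, 5]))

# cumsum/cumprod treat masked entries as 0/1 and keep the input mask.
csum = a.cumsum()
cprod = a.cumprod()

# fill_value is a property: read it or assign to it, do not call it.
a.fill_value = -1
filled = a.filled()  # plain ndarray, masked entries replaced by -1

print(raised, type(eq).__name__)
print(csum, cprod, filled)
```

Note that the data stored under a masked entry of ``csum`` or ``cprod`` is simply whatever the filled computation produced there; only the unmasked entries are meaningful.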

	New features
	------------

	This list is non-exhaustive...

	* the mr_ function mimics r_ for masked arrays.
	* the anom method returns the anomalies (deviations from the average).
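As a quick illustration of these two features, here is a small sketch that assumes the ``numpy.ma`` namespace of a recent NumPy release; the input values are arbitrary.

```python
import numpy as np
import numpy.ma as ma

a = ma.array([1.0, 2.0, 3.0], mask=[0, 0, 1])

# mr_ concatenates its arguments like numpy.r_, but returns a masked array.
joined = ma.mr_[a, 4.0, 5.0]

# anom() subtracts the mean of the unmasked values (here 1.5) from the array.
deviations = a.anom()

print(joined)
print(deviations)
```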

	Using the new package with numpy.core.ma
	----------------------------------------

	I tried to make sure that the new package can understand old masked
	arrays. Unfortunately, there's no upward compatibility.

	For example:

	>>> import numpy.core.ma as old_ma
	>>> import maskedarray as new_ma
	>>> x = old_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
	>>> x
	array(data =
	[ 1 2 999999 4 5],
	mask =
	[False False True False False],
	fill_value=999999)
	>>> y = new_ma.array([1,2,3,4,5], mask=[0,0,1,0,0])
	>>> y
	array(data = [1 2 -- 4 5],
	mask = [False False True False False],
	fill_value=999999)
	>>> x==y
	array(data =
	[True True True True True],
	mask =
	[False False True False False],
	fill_value=?)
	>>> old_ma.getmask(x) == new_ma.getmask(x)
	array([True, True, True, True, True])
	>>> old_ma.getmask(y) == new_ma.getmask(y)
	array([True, True, False, True, True])
	>>> old_ma.getmask(y)
	False


	Using maskedarray with matplotlib
	---------------------------------

	Starting with matplotlib 0.91.2, the masked array importing will work with
	the maskedarray branch as well as with earlier versions.

	By default matplotlib still uses numpy.ma, but there is an rcParams setting
	that you can use to select maskedarray instead. In the matplotlibrc file
	you will find::

	    #maskedarray : False       # True to use external maskedarray module
	                               # instead of numpy.ma; this is a temporary
	                               # setting for testing maskedarray.


	Uncomment and set to True to select maskedarray everywhere.
	Alternatively, you can test a script with maskedarray by using a
	command-line option, e.g.::

	    python simple_plot.py --maskedarray


	Masked records
	--------------

	Like numpy.ma.core, the ndarray-based implementation
	of MaskedArray is limited when working with records: you can
	mask any record of the array, but not a field in a record. If you
	need this feature, you may want to give the mrecords package
	a try (available in the maskedarray directory in the scipy
	sandbox). This module defines a new class, MaskedRecord. An
	instance of this class accepts a recarray as data, and uses two
	masks: the fieldmask has as many entries as records in the array,
	each entry with the same fields as a record, but of boolean types:
	they indicate whether the field is masked or not; a record entry
	is flagged as masked in the mask array if all the fields are
	masked. A few examples in the file should give you an idea of what
	can be done. Note that mrecords is still experimental...
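For comparison, recent NumPy releases accept a per-field mask directly on a structured masked array. The sketch below assumes such a release; the field names ``day`` and ``value`` and the data are hypothetical, and it uses plain ``numpy.ma`` rather than the mrecords module described above.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical record layout: an integer day and a float measurement.
dtype = [("day", "i4"), ("value", "f8")]

records = ma.array(
    [(1, 10.5), (2, 11.0), (3, 9.75)],
    dtype=dtype,
    # One boolean per field and per record.
    mask=[(False, False), (False, True), (True, True)],
)

# Each field carries its own mask.
day_mask = ma.getmaskarray(records["day"])
value_mask = ma.getmaskarray(records["value"])

print(records)
print(day_mask, value_mask)
```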

	Optimizing maskedarray
	----------------------

	Should masked arrays be filled before processing or not?
	--------------------------------------------------------

	In the current implementation, most operations on masked arrays involve
	the following steps:

	* the input arrays are filled
	* the operation is performed on the filled arrays
	* the mask is set for the results, from the combination of the input masks and the mask corresponding to the domain of the operation.

	For example, consider the division of two masked arrays::

	    import numpy
	    import maskedarray as ma
	    x = ma.array([1, 2, 3, 4], mask=[1, 0, 0, 0], dtype=numpy.float64)
	    y = ma.array([-1, 0, 1, 2], mask=[0, 0, 0, 1], dtype=numpy.float64)

	The division of x by y is then computed as::

	    d1 = x.filled(0)    # d1 = array([0., 2., 3., 4.])
	    d2 = y.filled(1)    # d2 = array([-1., 0., 1., 1.])
	    m = ma.mask_or(ma.getmask(x), ma.getmask(y))    # m = array([True, False, False, True])
	    dm = ma.divide.domain(d1, d2)    # dm = array([False, True, False, False])
	    result = (d1 / d2).view(ma.MaskedArray)    # masked_array([-0., inf, 3., 4.])
	    result._mask = numpy.logical_or(m, dm)

	Note that a division by zero takes place. To avoid it, we can consider
	filling the input arrays while taking the domain mask into account, so that::

	    d1 = x._data.copy()    # d1 = array([1., 2., 3., 4.])
	    d2 = y._data.copy()    # d2 = array([-1., 0., 1., 2.])
	    dm = ma.divide.domain(d1, d2)    # dm = array([False, True, False, False])
	    numpy.putmask(d2, dm, 1)    # d2 = array([-1., 1., 1., 2.])
	    m = ma.mask_or(ma.getmask(x), ma.getmask(y))    # m = array([True, False, False, True])
	    result = (d1 / d2).view(ma.MaskedArray)    # masked_array([-1., 2., 3., 2.])
	    result._mask = numpy.logical_or(m, dm)

	Note that the .copy() is required to avoid updating the inputs with
	putmask. The .filled() method also involves a .copy().

	A third possibility consists in not filling the arrays at all::

	    d1 = x._data    # d1 = array([1., 2., 3., 4.])
	    d2 = y._data    # d2 = array([-1., 0., 1., 2.])
	    dm = ma.divide.domain(d1, d2)    # dm = array([False, True, False, False])
	    m = ma.mask_or(ma.getmask(x), ma.getmask(y))    # m = array([True, False, False, True])
	    result = (d1 / d2).view(ma.MaskedArray)    # masked_array([-1., inf, 3., 2.])
	    result._mask = numpy.logical_or(m, dm)

	Note that here again the division by zero takes place.
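The last two strategies can be put side by side in a self-contained form. This is only a sketch under stated assumptions: it uses ``numpy.ma`` from a recent NumPy release in place of the ``maskedarray`` package, the helper names ``divide_prefilled`` and ``divide_unfilled`` are hypothetical, and the domain test is simplified to "denominator equals zero" rather than the tolerance-based test of ``ma.divide.domain``.

```python
import numpy as np
import numpy.ma as ma

x = ma.array([1, 2, 3, 4], mask=[1, 0, 0, 0], dtype=np.float64)
y = ma.array([-1, 0, 1, 2], mask=[0, 0, 0, 1], dtype=np.float64)


def divide_prefilled(a, b):
    """Replace out-of-domain denominators by 1 before dividing."""
    d1 = ma.getdata(a).copy()
    d2 = ma.getdata(b).copy()   # copy so that putmask leaves b untouched
    dm = d2 == 0                # simplified domain mask (assumption)
    np.putmask(d2, dm, 1)
    m = ma.mask_or(ma.getmaskarray(a), ma.getmaskarray(b))
    return ma.array(d1 / d2, mask=np.logical_or(m, dm))


def divide_unfilled(a, b):
    """Divide the raw data and mask the out-of-domain results afterwards."""
    d1 = ma.getdata(a)
    d2 = ma.getdata(b)
    dm = d2 == 0                # simplified domain mask (assumption)
    m = ma.mask_or(ma.getmaskarray(a), ma.getmaskarray(b))
    # The division by zero still happens; only the warning is silenced.
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = d1 / d2
    return ma.array(raw, mask=np.logical_or(m, dm))


r1 = divide_prefilled(x, y)
r2 = divide_unfilled(x, y)
print(r1)
print(r2)
```

Both results carry the same mask; they differ only in the data hidden under the masked entries, which stays finite in the first case and may hold an inf in the second.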

	A quick benchmark gives the following results:

	* numpy.ma.divide : 2.69 ms per loop
	* classical division : 2.21 ms per loop
	* division w/ prefilling : 2.34 ms per loop
	* division w/o filling : 1.55 ms per loop

	So, is it worth filling the arrays beforehand? Yes, if we are interested
	in avoiding floating-point exceptions that may fill the result with infs
	and nans. No, if we are only interested in speed...
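A comparison of this kind can be repeated with the standard ``timeit`` module. The sketch below is not the benchmark that produced the figures above: the array size, the random inputs and the helper names are assumptions, it runs against ``numpy.ma`` from a recent NumPy release, and the absolute timings will depend on the machine and the NumPy version.

```python
import timeit

import numpy as np
import numpy.ma as ma

rng = np.random.default_rng(0)
n = 10_000  # hypothetical array size
x = ma.array(rng.normal(size=n), mask=rng.random(n) < 0.1)
y = ma.array(rng.integers(-2, 3, size=n).astype(float), mask=rng.random(n) < 0.1)


def with_prefilling(a, b):
    # Replace zero denominators by 1, then divide.
    d1 = ma.getdata(a).copy()
    d2 = ma.getdata(b).copy()
    dm = d2 == 0
    np.putmask(d2, dm, 1)
    m = ma.mask_or(ma.getmaskarray(a), ma.getmaskarray(b))
    return ma.array(d1 / d2, mask=np.logical_or(m, dm))


def without_filling(a, b):
    # Divide the raw data and mask the invalid results afterwards.
    d1 = ma.getdata(a)
    d2 = ma.getdata(b)
    dm = d2 == 0
    m = ma.mask_or(ma.getmaskarray(a), ma.getmaskarray(b))
    with np.errstate(divide="ignore", invalid="ignore"):
        raw = d1 / d2
    return ma.array(raw, mask=np.logical_or(m, dm))


candidates = [
    ("numpy.ma.divide", ma.divide),
    ("division w/ prefilling", with_prefilling),
    ("division w/o filling", without_filling),
]

loops = 20
timings = {}
for name, fn in candidates:
    best = min(timeit.repeat(lambda fn=fn: fn(x, y), number=loops, repeat=3))
    timings[name] = best / loops
    print(f"{name:<24}: {timings[name] * 1e3:.3f} ms per loop")
```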


	Thanks
	------

	I'd like to thank Paul Dubois, Travis Oliphant and Sasha for the
	original masked array package: without you, I would never have started
	that (it might be argued that I shouldn't have anyway, but that's
	another story...). I also wish to extend these thanks to Reggie Dugard
	and Eric Firing for their suggestions and numerous improvements.


	Revision notes
	--------------

	* 08/25/2007 : Creation of this page
	* 01/23/2007 : The package has been moved to the SciPy sandbox, and is regularly updated: please check out your SVN version!