"""
=============================
Subclassing ndarray in python
=============================

Credits
-------

This page is based with thanks on the wiki page on subclassing by Pierre
Gerard-Marchant - http://www.scipy.org/Subclasses.

Introduction
------------

Subclassing ndarray is relatively simple, but it has some complications
compared to other Python objects. On this page we explain the machinery
that allows you to subclass ndarray, and the implications for
implementing a subclass.

ndarrays and object creation
============================

Subclassing ndarray is complicated by the fact that new instances of
ndarray classes can come about in three different ways. These are:

#. Explicit constructor call - as in ``MySubClass(params)``. This is
   the usual route to Python instance creation.
#. View casting - casting an existing ndarray as a given subclass
#. New from template - creating a new instance from a template
   instance. Examples include returning slices from a subclassed array,
   creating return types from ufuncs, and copying arrays. See
   :ref:`new-from-template` for more details

The last two are characteristics of ndarrays - in order to support
things like array slicing. The complications of subclassing ndarray are
due to the mechanisms numpy has to support these latter two routes of
instance creation.

.. _view-casting:

View casting
------------

*View casting* is the standard ndarray mechanism by which you take an
ndarray of any subclass, and return a view of the array as another
(specified) subclass:

>>> import numpy as np
>>> # create a completely useless ndarray subclass
>>> class C(np.ndarray): pass
>>> # create a standard ndarray
>>> arr = np.zeros((3,))
>>> # take a view of it, as our useless subclass
>>> c_arr = arr.view(C)
>>> type(c_arr)
<class 'C'>

.. _new-from-template:

Creating new from template
--------------------------

New instances of an ndarray subclass can also come about by a very
similar mechanism to :ref:`view-casting`, when numpy finds it needs to
create a new instance from a template instance. The most obvious place
this has to happen is when you are taking slices of subclassed arrays.
For example:

>>> v = c_arr[1:]
>>> type(v) # the view is of type 'C'
<class 'C'>
>>> v is c_arr # but it's a new instance
False

The slice is a *view* onto the original ``c_arr`` data. So, when we
take a view from the ndarray, we return a new ndarray, of the same
class, that points to the data in the original.

There are other points in the use of ndarrays where we need such views,
such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
(see also :ref:`array-wrap`), and reducing methods (like
``c_arr.mean()``).

Relationship of view casting and new-from-template
--------------------------------------------------

These paths both use the same machinery.
We make the distinction here,
because they result in different input to your methods. Specifically,
:ref:`view-casting` means you have created a new instance of your array
type from any potential subclass of ndarray. :ref:`new-from-template`
means you have created a new instance of your class from a pre-existing
instance, allowing you - for example - to copy across attributes that
are particular to your subclass.

Implications for subclassing
----------------------------

If we subclass ndarray, we need to deal not only with explicit
construction of our array type, but also :ref:`view-casting` or
:ref:`new-from-template`. Numpy has the machinery to do this, and it is
this machinery that makes subclassing slightly non-standard.

There are two aspects to the machinery that ndarray uses to support
views and new-from-template in subclasses.

The first is the use of the ``ndarray.__new__`` method for the main work
of object initialization, rather than the more usual ``__init__``
method. The second is the use of the ``__array_finalize__`` method to
allow subclasses to clean up after the creation of views and new
instances from templates.

A brief Python primer on ``__new__`` and ``__init__``
=====================================================

``__new__`` is a standard Python method, and, if present, is called
before ``__init__`` when we create a class instance. See the `python
__new__ documentation
`_ for more detail.

For example, consider the following Python code:

.. testcode::

  class C(object):
      def __new__(cls, *args):
          print('Cls in __new__:', cls)
          print('Args in __new__:', args)
          # object.__new__ itself accepts only the class to instantiate
          return object.__new__(cls)

      def __init__(self, *args):
          print('type(self) in __init__:', type(self))
          print('Args in __init__:', args)

meaning that we get:

>>> c = C('hello')
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
type(self) in __init__: <class 'C'>
Args in __init__: ('hello',)

When we call ``C('hello')``, the ``__new__`` method gets its own class
as first argument, and the passed argument, which is the string
``'hello'``. After Python calls ``__new__``, it usually (see below)
calls our ``__init__`` method, with the output of ``__new__`` as the
first argument (now a class instance), and the passed arguments
following.

As you can see, the object can be initialized in the ``__new__``
method or the ``__init__`` method, or both, and in fact ndarray does
not have an ``__init__`` method, because all the initialization is
done in the ``__new__`` method.

Why use ``__new__`` rather than just the usual ``__init__``? Because
in some cases, as for ndarray, we want to be able to return an object
of some other class. Consider the following:

.. testcode::

  class D(C):
      def __new__(cls, *args):
          print('D cls is:', cls)
          print('D args in __new__:', args)
          return C.__new__(C, *args)

      def __init__(self, *args):
          # we never get here
          print('In D __init__')

meaning that:

>>> obj = D('hello')
D cls is: <class 'D'>
D args in __new__: ('hello',)
Cls in __new__: <class 'C'>
Args in __new__: ('hello',)
>>> type(obj)
<class 'C'>

The definition of ``C`` is the same as before, but for ``D``, the
``__new__`` method returns an instance of class ``C`` rather than
``D``. Note that the ``__init__`` method of ``D`` does not get
called. In general, when the ``__new__`` method returns an object of
class other than the class in which it is defined, the ``__init__``
method of that class is not called.

This is how subclasses of the ndarray class are able to return views
that preserve the class type. When taking a view, the standard
ndarray machinery creates the new ndarray object with something
like::

  obj = ndarray.__new__(subtype, shape, ...

where ``subtype`` is the subclass. Thus the returned view is of the
same class as the subclass, rather than being of class ``ndarray``.

That solves the problem of returning views of the same type, but now
we have a new problem. The machinery of ndarray can set the class
this way, in its standard methods for taking views, but the ndarray
``__new__`` method knows nothing of what we have done in our own
``__new__`` method in order to set attributes, and so on. (Aside -
why not call ``obj = subtype.__new__(...`` then? Because we may not
have a ``__new__`` method with the same call signature).

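A minimal sketch of the problem just described, assuming a hypothetical
subclass ``Tagged`` (the class name, its ``tag`` attribute and the ``calls``
counter are illustrative and not part of numpy). Slicing produces a new
``Tagged`` instance without going through ``Tagged.__new__``, so the
attribute set there is missing on the view:

```python
import numpy as np

class Tagged(np.ndarray):
    # Hypothetical subclass: counts how often its own __new__ runs.
    calls = 0

    def __new__(cls, data, tag):
        Tagged.calls += 1
        # View-cast the input to our class, then attach the attribute.
        obj = np.asarray(data).view(cls)
        obj.tag = tag
        return obj

t = Tagged([1, 2, 3], tag='x')
v = t[1:]                    # new-from-template, created at the C level

print(Tagged.calls)          # 1: __new__ ran only for the explicit call
print(type(v) is Tagged)     # True: the class is preserved
print(hasattr(v, 'tag'))     # False: the attribute was not carried over
```

This gap is what the hook described in the next section is meant to close.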

The role of ``__array_finalize__``
==================================

``__array_finalize__`` is the mechanism that numpy provides to allow
subclasses to handle the various ways that new instances get created.

Remember that subclass instances can come about in these three ways:

#. explicit constructor call (``obj = MySubClass(params)``). This will
   call the usual sequence of ``MySubClass.__new__`` then (if it exists)
   ``MySubClass.__init__``.
#. :ref:`view-casting`
#. :ref:`new-from-template`

Our ``MySubClass.__new__`` method only gets called in the case of the
explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
``MySubClass.__init__`` to deal with the view casting and
new-from-template. It turns out that ``MySubClass.__array_finalize__``
*does* get called for all three methods of object creation, so this is
where our object creation housekeeping usually goes.

* For the explicit constructor call, our subclass will need to create a
  new ndarray instance of its own class. In practice this means that
  we, the authors of the code, will need to make a call to
  ``ndarray.__new__(MySubClass,...)``, or do view casting of an existing
  array (see below)
* For view casting and new-from-template, the equivalent of
  ``ndarray.__new__(MySubClass,...`` is called, at the C level.

The arguments that ``__array_finalize__`` receives differ for the three
methods of instance creation above.

The following code allows us to look at the call sequences and arguments:

.. testcode::

  import numpy as np

  class C(np.ndarray):
      def __new__(cls, *args, **kwargs):
          print('In __new__ with class %s' % cls)
          return np.ndarray.__new__(cls, *args, **kwargs)

      def __init__(self, *args, **kwargs):
          # in practice you probably will not need or want an __init__
          # method for your subclass
          print('In __init__ with class %s' % self.__class__)

      def __array_finalize__(self, obj):
          print('In array_finalize:')
          print('   self type is %s' % type(self))
          print('   obj type is %s' % type(obj))


Now:

>>> # Explicit constructor
>>> c = C((10,))
In __new__ with class <class 'C'>
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'NoneType'>
In __init__ with class <class 'C'>
>>> # View casting
>>> a = np.arange(10)
>>> cast_a = a.view(C)
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'numpy.ndarray'>
>>> # Slicing (example of new-from-template)
>>> cv = c[:1]
In array_finalize:
   self type is <class 'C'>
   obj type is <class 'C'>

The signature of ``__array_finalize__`` is::

    def __array_finalize__(self, obj):

``ndarray.__new__`` passes ``__array_finalize__`` the new object, of our
own class (``self``) as well as the object from which the view has been
taken (``obj``). As you can see from the output above, the ``self`` is
always a newly created instance of our subclass, and the type of ``obj``
differs for the three instance creation methods:

* When called from the explicit constructor, ``obj`` is ``None``
* When called from view casting, ``obj`` can be an instance of any
  subclass of ndarray, including our own.
* When called in new-from-template, ``obj`` is another instance of our
  own subclass, that we might use to update the new ``self`` instance.

Because ``__array_finalize__`` is the only method that always sees new
instances being created, it is the sensible place to fill in instance
defaults for new object attributes, among other tasks.

This may be clearer with an example.

Simple example - adding an extra attribute to ndarray
-----------------------------------------------------

.. testcode::

  import numpy as np

  class InfoArray(np.ndarray):

      def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                  strides=None, order=None, info=None):
          # Create the ndarray instance of our type, given the usual
          # ndarray input arguments. This will call the standard
          # ndarray constructor, but return an object of our type.
          # It also triggers a call to InfoArray.__array_finalize__
          obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset, strides,
                                   order)
          # set the new 'info' attribute to the value passed
          obj.info = info
          # Finally, we must return the newly created object:
          return obj

      def __array_finalize__(self, obj):
          # ``self`` is a new object resulting from
          # ndarray.__new__(InfoArray, ...), therefore it only has
          # attributes that the ndarray.__new__ constructor gave it -
          # i.e. those of a standard ndarray.
          #
          # We could have got to the ndarray.__new__ call in 3 ways:
          # From an explicit constructor - e.g.
          # InfoArray():
          #    obj is None
          #    (we're in the middle of the InfoArray.__new__
          #    constructor, and self.info will be set when we return to
          #    InfoArray.__new__)
          if obj is None: return
          # From view casting - e.g. arr.view(InfoArray):
          #    obj is arr
          #    (type(obj) can be InfoArray)
          # From new-from-template - e.g. infoarr[:3]
          #    type(obj) is InfoArray
          #
          # Note that it is here, rather than in the __new__ method,
          # that we set the default value for 'info', because this
          # method sees all creation of default objects - with the
          # InfoArray.__new__ constructor, but also with
          # arr.view(InfoArray).
          self.info = getattr(obj, 'info', None)
          # We do not need to return anything


Using the object looks like this:

>>> obj = InfoArray(shape=(3,)) # explicit constructor
>>> type(obj)
<class 'InfoArray'>
>>> obj.info is None
True
>>> obj = InfoArray(shape=(3,), info='information')
>>> obj.info
'information'
>>> v = obj[1:] # new-from-template - here - slicing
>>> type(v)
<class 'InfoArray'>
>>> v.info
'information'
>>> arr = np.arange(10)
>>> cast_arr = arr.view(InfoArray) # view casting
>>> type(cast_arr)
<class 'InfoArray'>
>>> cast_arr.info is None
True

This class isn't very useful, because it has the same constructor as the
bare ndarray object, including passing in buffers and shapes and so on.
We would probably prefer the constructor to be able to take an already
formed ndarray from the usual numpy calls to ``np.array`` and return an
object.

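To make the limitation concrete, here is a sketch of what wrapping existing
data looks like with this constructor. The class is repeated in condensed
form so that the block runs on its own; the variable names ``arr`` and
``wrapped`` are illustrative. The existing array has to be handed over as
``buffer``, together with a matching ``shape`` and ``dtype``:

```python
import numpy as np

class InfoArray(np.ndarray):
    # Condensed copy of the class above, so this block is self-contained.
    def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                strides=None, order=None, info=None):
        obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
                                 strides, order)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

arr = np.arange(5)
# The caller must spell out shape and dtype and pass the data as a buffer.
wrapped = InfoArray(shape=arr.shape, dtype=arr.dtype, buffer=arr,
                    info='information')

wrapped[0] = 99              # writes through to arr: the memory is shared
print(arr[0])                # 99
print(wrapped.copy().info)   # information (copy() is new-from-template)
```

It works, but every caller has to repeat the ``shape``, ``dtype`` and
``buffer`` bookkeeping by hand.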

Slightly more realistic example - attribute added to existing array
-------------------------------------------------------------------

Here is a class that takes a standard ndarray that already exists, casts
it as our type, and adds an extra attribute.

.. testcode::

  import numpy as np

  class RealisticInfoArray(np.ndarray):

      def __new__(cls, input_array, info=None):
          # Input array is an already formed ndarray instance
          # We first cast to be our class type
          obj = np.asarray(input_array).view(cls)
          # add the new attribute to the created instance
          obj.info = info
          # Finally, we must return the newly created object:
          return obj

      def __array_finalize__(self, obj):
          # see InfoArray.__array_finalize__ for comments
          if obj is None: return
          self.info = getattr(obj, 'info', None)


So:

>>> arr = np.arange(5)
>>> obj = RealisticInfoArray(arr, info='information')
>>> type(obj)
<class 'RealisticInfoArray'>
>>> obj.info
'information'
>>> v = obj[1:]
>>> type(v)
<class 'RealisticInfoArray'>
>>> v.info
'information'

.. _array-wrap:

``__array_wrap__`` for ufuncs
-------------------------------------------------------

``__array_wrap__`` gets called at the end of numpy ufuncs and other numpy
functions, to allow a subclass to set the type of the return value
and update attributes and metadata. Let's show how this works with an example.
First we make the same subclass as above, but with a different name and
some print statements:

.. testcode::

  import numpy as np

  class MySubClass(np.ndarray):

      def __new__(cls, input_array, info=None):
          obj = np.asarray(input_array).view(cls)
          obj.info = info
          return obj

      def __array_finalize__(self, obj):
          print('In __array_finalize__:')
          print('   self is %s' % repr(self))
          print('   obj is %s' % repr(obj))
          if obj is None: return
          self.info = getattr(obj, 'info', None)

      def __array_wrap__(self, out_arr, context=None):
          print('In __array_wrap__:')
          print('   self is %s' % repr(self))
          print('   arr is %s' % repr(out_arr))
          # then just call the parent
          return np.ndarray.__array_wrap__(self, out_arr, context)

We run a ufunc on an instance of our new array:

>>> obj = MySubClass(np.arange(5), info='spam')
In __array_finalize__:
   self is MySubClass([0, 1, 2, 3, 4])
   obj is array([0, 1, 2, 3, 4])
>>> arr2 = np.arange(5)+1
>>> ret = np.add(arr2, obj)
In __array_wrap__:
   self is MySubClass([0, 1, 2, 3, 4])
   arr is array([1, 3, 5, 7, 9])
In __array_finalize__:
   self is MySubClass([1, 3, 5, 7, 9])
   obj is MySubClass([0, 1, 2, 3, 4])
>>> ret
MySubClass([1, 3, 5, 7, 9])
>>> ret.info
'spam'

Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method of the
input with the highest ``__array_priority__`` value, in this case
``MySubClass.__array_wrap__``, with arguments ``self`` as ``obj``, and
``out_arr`` as the (ndarray) result of the addition.
In turn, the
default ``__array_wrap__`` (``ndarray.__array_wrap__``) has cast the
result to class ``MySubClass``, and called ``__array_finalize__`` -
hence the copying of the ``info`` attribute. This has all happened at the C level.

But, we could do anything we wanted:

.. testcode::

  class SillySubClass(np.ndarray):

      def __array_wrap__(self, arr, context=None):
          return 'I lost your data'

>>> arr1 = np.arange(5)
>>> obj = arr1.view(SillySubClass)
>>> arr2 = np.arange(5)
>>> ret = np.multiply(obj, arr2)
>>> ret
'I lost your data'

So, by defining a specific ``__array_wrap__`` method for our subclass,
we can tweak the output from ufuncs. The ``__array_wrap__`` method
requires ``self``, then an argument - which is the result of the ufunc -
and an optional parameter *context*. This parameter is passed by some
ufuncs as a 3-element tuple: (name of the ufunc, argument of the ufunc,
domain of the ufunc). ``__array_wrap__`` should return an instance of
its containing class. See the masked array subclass for an
implementation.

In addition to ``__array_wrap__``, which is called on the way out of the
ufunc, there is also an ``__array_prepare__`` method which is called on
the way into the ufunc, after the output arrays are created but before any
computation has been performed. The default implementation does nothing
but pass through the array. ``__array_prepare__`` should not attempt to
access the array data or resize the array; it is intended for setting the
output array type, updating attributes and metadata, and performing any
checks based on the input that may be desired before computation begins.
Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
subclass thereof or raise an error.

Extra gotchas - custom ``__del__`` methods and ndarray.base
-----------------------------------------------------------

One of the problems that ndarray solves is keeping track of memory
ownership of ndarrays and their views. Consider the case where we have
created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``.
The two objects are looking at the same memory. Numpy keeps track of
where the data came from for a particular array or view, with the
``base`` attribute:

>>> # A normal ndarray, that owns its own data
>>> arr = np.zeros((4,))
>>> # In this case, base is None
>>> arr.base is None
True
>>> # We take a view
>>> v1 = arr[1:]
>>> # base now points to the array that it derived from
>>> v1.base is arr
True
>>> # Take a view of a view
>>> v2 = v1[1:]
>>> # base points to the original array that it was derived from
>>> v2.base is arr
True

In general, if the array owns its own memory, as for ``arr`` in this
case, then ``arr.base`` will be None - there are some exceptions to this
- see the numpy book for more details.

The ``base`` attribute is useful in being able to tell whether we have
a view or the original array. This in turn can be useful if we need
to know whether or not to do some specific cleanup when the subclassed
array is deleted. For example, we may only want to do the cleanup if
the original array is deleted, but not the views. For an example of
how this can work, have a look at the ``memmap`` class in
``numpy.core``.

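As a small illustration of using ``base`` to tell owners from views,
consider the sketch below. The subclass ``OwnerAware`` and its
``needs_cleanup`` method are hypothetical names used only for this example;
a real class would put the check inside whatever cleanup hook it defines.

```python
import numpy as np

class OwnerAware(np.ndarray):
    # Hypothetical subclass used only to illustrate the ``base`` check.
    def needs_cleanup(self):
        # Only an array that owns its memory has ``base`` set to None.
        return self.base is None

# Create an instance that owns its data (contents are uninitialized).
owner = np.ndarray.__new__(OwnerAware, (4,))
view = owner[1:]

print(owner.needs_cleanup())   # True
print(view.needs_cleanup())    # False
print(view.base is owner)      # True

# An instance made by view casting is itself a view of the array it was
# cast from, so this check does not treat it as an owner.
cast = np.zeros(4).view(OwnerAware)
print(cast.needs_cleanup())    # False
```

The last case matters for subclasses whose ``__new__`` uses view casting:
their instances have a non-None ``base`` from the start, so a cleanup rule
based only on ``base is None`` would never fire for them.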

"""
from __future__ import division, absolute_import, print_function