comparison DEPENDENCIES/mingw32/Python27/Lib/site-packages/numpy/doc/subclassing.py @ 87:2a2c65a20a8b

Add Python libs and headers
author Chris Cannam
date Wed, 25 Feb 2015 14:05:22 +0000
86:413a9d26189e 87:2a2c65a20a8b
1 """
2 =============================
3 Subclassing ndarray in python
4 =============================
5
6 Credits
7 -------
8
9 This page is based with thanks on the wiki page on subclassing by Pierre
10 Gerard-Marchant - http://www.scipy.org/Subclasses.
11
12 Introduction
13 ------------
14
15 Subclassing ndarray is relatively simple, but it has some complications
16 compared to other Python objects. On this page we explain the machinery
17 that allows you to subclass ndarray, and the implications for
18 implementing a subclass.
19
20 ndarrays and object creation
21 ============================
22
23 Subclassing ndarray is complicated by the fact that new instances of
24 ndarray classes can come about in three different ways. These are:
25
26 #. Explicit constructor call - as in ``MySubClass(params)``. This is
27 the usual route to Python instance creation.
28 #. View casting - casting an existing ndarray as a given subclass
29 #. New from template - creating a new instance from a template
30 instance. Examples include returning slices from a subclassed array,
31 creating return types from ufuncs, and copying arrays. See
32 :ref:`new-from-template` for more details
33
34 The last two are characteristics of ndarrays - in order to support
35 things like array slicing. The complications of subclassing ndarray are
36 due to the mechanisms numpy has to support these latter two routes of
37 instance creation.
38
39 .. _view-casting:
40
41 View casting
42 ------------
43
44 *View casting* is the standard ndarray mechanism by which you take an
45 ndarray of any subclass, and return a view of the array as another
46 (specified) subclass:
47
48 >>> import numpy as np
49 >>> # create a completely useless ndarray subclass
50 >>> class C(np.ndarray): pass
51 >>> # create a standard ndarray
52 >>> arr = np.zeros((3,))
53 >>> # take a view of it, as our useless subclass
54 >>> c_arr = arr.view(C)
55 >>> type(c_arr)
56 <class 'C'>
57
58 .. _new-from-template:
59
60 Creating new from template
61 --------------------------
62
63 New instances of an ndarray subclass can also come about by a very
64 similar mechanism to :ref:`view-casting`, when numpy finds it needs to
65 create a new instance from a template instance. The most obvious place
66 this has to happen is when you are taking slices of subclassed arrays.
67 For example:
68
69 >>> v = c_arr[1:]
70 >>> type(v) # the view is of type 'C'
71 <class 'C'>
72 >>> v is c_arr # but it's a new instance
73 False
74
75 The slice is a *view* onto the original ``c_arr`` data. So, when we
76 take a view from the ndarray, we return a new ndarray, of the same
77 class, that points to the data in the original.
78
79 There are other points in the use of ndarrays where we need such views,
80 such as copying arrays (``c_arr.copy()``), creating ufunc output arrays
81 (see also :ref:`array-wrap`), and reducing methods (like
82 ``c_arr.mean()``).
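
The cases listed above can be checked with a small sketch. The class ``C`` is the same do-nothing subclass used earlier; the array contents and variable names are only illustrative. The sketch relies on copying and ufunc calls returning the subclass, and makes no claim about the exact type returned by the reduction beyond it being an array.

```python
import numpy as np

class C(np.ndarray):
    """A do-nothing ndarray subclass, as in the view casting example."""

# Start from a plain 2 x 3 array and view-cast it to the subclass.
c_arr = np.arange(6.0).reshape(2, 3).view(C)

copied = c_arr.copy()          # copying an array
summed = np.add(c_arr, 1.0)    # output array of a ufunc
reduced = c_arr.mean(axis=0)   # reducing method along an axis

# Copying and ufunc calls hand back new instances of the subclass.
kept_subclass = (type(copied) is C) and (type(summed) is C)
```

In each case numpy builds the result from ``c_arr`` as a template, which is why the subclass survives without ``C`` defining any methods of its own.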
83
84 Relationship of view casting and new-from-template
85 --------------------------------------------------
86
87 These paths both use the same machinery. We make the distinction here,
88 because they result in different input to your methods. Specifically,
89 :ref:`view-casting` means you have created a new instance of your array
90 type from any potential subclass of ndarray. :ref:`new-from-template`
91 means you have created a new instance of your class from a pre-existing
92 instance, allowing you - for example - to copy across attributes that
93 are particular to your subclass.
94
95 Implications for subclassing
96 ----------------------------
97
98 If we subclass ndarray, we need to deal not only with explicit
99 construction of our array type, but also :ref:`view-casting` or
100 :ref:`new-from-template`. Numpy has the machinery to do this, and it is
101 this machinery that makes subclassing slightly non-standard.
102
103 There are two aspects to the machinery that ndarray uses to support
104 views and new-from-template in subclasses.
105
106 The first is the use of the ``ndarray.__new__`` method for the main work
107 of object initialization, rather than the more usual ``__init__``
108 method. The second is the use of the ``__array_finalize__`` method to
109 allow subclasses to clean up after the creation of views and new
110 instances from templates.
111
112 A brief Python primer on ``__new__`` and ``__init__``
113 =====================================================
114
115 ``__new__`` is a standard Python method, and, if present, is called
116 before ``__init__`` when we create a class instance. See the `python
117 __new__ documentation
118 <http://docs.python.org/reference/datamodel.html#object.__new__>`_ for more detail.
119
120 For example, consider the following Python code:
121
122 .. testcode::
123
    class C(object):
        def __new__(cls, *args):
            print('Cls in __new__: %s' % cls)
            print('Args in __new__: %s' % (args,))
            # object.__new__ only needs the class; newer Python
            # versions reject any extra arguments
            return object.__new__(cls)

        def __init__(self, *args):
            print('type(self) in __init__: %s' % type(self))
            print('Args in __init__: %s' % (args,))
133
134 meaning that we get:
135
136 >>> c = C('hello')
137 Cls in __new__: <class 'C'>
138 Args in __new__: ('hello',)
139 type(self) in __init__: <class 'C'>
140 Args in __init__: ('hello',)
141
142 When we call ``C('hello')``, the ``__new__`` method gets its own class
143 as first argument, and the passed argument, which is the string
144 ``'hello'``. After python calls ``__new__``, it usually (see below)
145 calls our ``__init__`` method, with the output of ``__new__`` as the
146 first argument (now a class instance), and the passed arguments
147 following.
148
149 As you can see, the object can be initialized in the ``__new__``
150 method or the ``__init__`` method, or both, and in fact ndarray does
151 not have an ``__init__`` method, because all the initialization is
152 done in the ``__new__`` method.
153
154 Why use ``__new__`` rather than just the usual ``__init__``? Because
155 in some cases, as for ndarray, we want to be able to return an object
156 of some other class. Consider the following:
157
158 .. testcode::
159
    class D(C):
        def __new__(cls, *args):
            print('D cls is: %s' % cls)
            print('D args in __new__: %s' % (args,))
            # return an instance of C, not of D
            return C.__new__(C, *args)

        def __init__(self, *args):
            # we never get here
            print('In D __init__')
169
170 meaning that:
171
172 >>> obj = D('hello')
173 D cls is: <class 'D'>
174 D args in __new__: ('hello',)
175 Cls in __new__: <class 'C'>
176 Args in __new__: ('hello',)
177 >>> type(obj)
178 <class 'C'>
179
180 The definition of ``C`` is the same as before, but for ``D``, the
181 ``__new__`` method returns an instance of class ``C`` rather than
182 ``D``. Note that the ``__init__`` method of ``D`` does not get
183 called. In general, when the ``__new__`` method returns an object that
184 is not an instance of the class being constructed, Python does not call
185 the ``__init__`` method of that class.
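
The rule can be checked without any printing. The following is a minimal sketch in plain Python; the class names ``Base``, ``ReturnsOwnClass`` and ``ReturnsOtherClass`` and the ``init_calls`` counter are hypothetical and exist only for this illustration.

```python
class Base(object):
    def __init__(self, *args):
        # Count how many times __init__ has run on this instance.
        self.init_calls = getattr(self, 'init_calls', 0) + 1


class ReturnsOwnClass(Base):
    def __new__(cls, *args):
        # Returns an instance of cls, so Python goes on to call __init__.
        return object.__new__(cls)


class ReturnsOtherClass(Base):
    def __new__(cls, *args):
        # Deliberately returns a plain Base instance; it is not an
        # instance of ReturnsOtherClass, so __init__ is skipped.
        return object.__new__(Base)


same = ReturnsOwnClass('hello')
other = ReturnsOtherClass('hello')
```

Here ``same.init_calls`` is 1, while ``other`` is a ``Base`` instance that never received the attribute because ``__init__`` did not run.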
186
187 This is how subclasses of the ndarray class are able to return views
188 that preserve the class type. When taking a view, the standard
189 ndarray machinery creates the new ndarray object with something
190 like::
191
192 obj = ndarray.__new__(subtype, shape, ...
193
194 where ``subtype`` is the subclass. Thus the returned view is of the
195 same class as the subclass, rather than being of class ``ndarray``.
196
197 That solves the problem of returning views of the same type, but now
198 we have a new problem. The machinery of ndarray can set the class
199 this way, in its standard methods for taking views, but the ndarray
200 ``__new__`` method knows nothing of what we have done in our own
201 ``__new__`` method in order to set attributes, and so on. (Aside -
202 why not call ``obj = subtype.__new__(...`` then? Because we may not
203 have a ``__new__`` method with the same call signature).
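
A short sketch of the problem, assuming a hypothetical subclass ``Tagged`` whose ``__new__`` takes a single ``tag`` argument instead of the usual ndarray arguments. View casting cannot call such a ``__new__``, so the attribute it sets is missing from the view.

```python
import numpy as np

class Tagged(np.ndarray):
    # Hypothetical subclass: its __new__ does not accept a shape.
    def __new__(cls, tag):
        obj = np.ndarray.__new__(cls, (2,))
        obj.tag = tag
        return obj

# The explicit constructor runs Tagged.__new__ and sets the attribute.
made = Tagged('explicit')

# View casting goes through ndarray machinery, not Tagged.__new__.
cast = np.zeros((2,)).view(Tagged)

cast_has_tag = hasattr(cast, 'tag')   # False: nothing ever set it
```

The view is a genuine ``Tagged`` instance, but it lacks ``tag``. This is the gap that ``__array_finalize__`` is meant to fill.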
204
205 The role of ``__array_finalize__``
206 ==================================
207
208 ``__array_finalize__`` is the mechanism that numpy provides to allow
209 subclasses to handle the various ways that new instances get created.
210
211 Remember that subclass instances can come about in these three ways:
212
213 #. explicit constructor call (``obj = MySubClass(params)``). This will
214 call the usual sequence of ``MySubClass.__new__`` then (if it exists)
215 ``MySubClass.__init__``.
216 #. :ref:`view-casting`
217 #. :ref:`new-from-template`
218
219 Our ``MySubClass.__new__`` method only gets called in the case of the
220 explicit constructor call, so we can't rely on ``MySubClass.__new__`` or
221 ``MySubClass.__init__`` to deal with the view casting and
222 new-from-template. It turns out that ``MySubClass.__array_finalize__``
223 *does* get called for all three methods of object creation, so this is
224 where our object creation housekeeping usually goes.
225
226 * For the explicit constructor call, our subclass will need to create a
227 new ndarray instance of its own class. In practice this means that
228 we, the authors of the code, will need to make a call to
229 ``ndarray.__new__(MySubClass,...)``, or do view casting of an existing
230 array (see below).
231 * For view casting and new-from-template, the equivalent of
232 ``ndarray.__new__(MySubClass,...)`` is called, at the C level.
233
234 The arguments that ``__array_finalize__`` receives differ for the three
235 methods of instance creation above.
236
237 The following code allows us to look at the call sequences and arguments:
238
239 .. testcode::
240
    import numpy as np

    class C(np.ndarray):
        def __new__(cls, *args, **kwargs):
            print('In __new__ with class %s' % cls)
            return np.ndarray.__new__(cls, *args, **kwargs)

        def __init__(self, *args, **kwargs):
            # in practice you probably will not need or want an __init__
            # method for your subclass
            print('In __init__ with class %s' % self.__class__)

        def __array_finalize__(self, obj):
            print('In array_finalize:')
            print(' self type is %s' % type(self))
            print(' obj type is %s' % type(obj))
257
258
259 Now:
260
261 >>> # Explicit constructor
262 >>> c = C((10,))
263 In __new__ with class <class 'C'>
264 In array_finalize:
265 self type is <class 'C'>
266 obj type is <type 'NoneType'>
267 In __init__ with class <class 'C'>
268 >>> # View casting
269 >>> a = np.arange(10)
270 >>> cast_a = a.view(C)
271 In array_finalize:
272 self type is <class 'C'>
273 obj type is <type 'numpy.ndarray'>
274 >>> # Slicing (example of new-from-template)
275 >>> cv = c[:1]
276 In array_finalize:
277 self type is <class 'C'>
278 obj type is <class 'C'>
279
280 The signature of ``__array_finalize__`` is::
281
282 def __array_finalize__(self, obj):
283
284 ``ndarray.__new__`` passes ``__array_finalize__`` the new object, of our
285 own class (``self``) as well as the object from which the view has been
286 taken (``obj``). As you can see from the output above, the ``self`` is
287 always a newly created instance of our subclass, and the type of ``obj``
288 differs for the three instance creation methods:
289
290 * When called from the explicit constructor, ``obj`` is ``None``
291 * When called from view casting, ``obj`` can be an instance of any
292 subclass of ndarray, including our own.
293 * When called in new-from-template, ``obj`` is another instance of our
294 own subclass, that we might use to update the new ``self`` instance.
295
296 Because ``__array_finalize__`` is the only method that always sees new
297 instances being created, it is the sensible place to fill in instance
298 defaults for new object attributes, among other tasks.
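
As a rough illustration of the three cases, the sketch below uses a hypothetical subclass ``Traced`` that stores a label describing what ``__array_finalize__`` was given. The ``origin`` attribute and its label strings are inventions for this example. Because view casting may also receive an instance of our own class, that case shares a label with new-from-template.

```python
import numpy as np

class Traced(np.ndarray):
    # Hypothetical subclass that records how each instance came about.
    def __array_finalize__(self, obj):
        if obj is None:
            # Reached from ndarray.__new__ in an explicit constructor call.
            self.origin = 'explicit constructor'
        elif isinstance(obj, Traced):
            # New-from-template, or a view of an existing Traced array.
            self.origin = 'template or view of Traced'
        else:
            # View casting from some other ndarray type.
            self.origin = 'view cast from ' + type(obj).__name__

built = Traced((3,))               # explicit constructor
cast = np.arange(3).view(Traced)   # view casting
sliced = cast[1:]                  # new-from-template
```

After this runs, ``built.origin``, ``cast.origin`` and ``sliced.origin`` hold the three different labels.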
299
300 This may be clearer with an example.
301
302 Simple example - adding an extra attribute to ndarray
303 -----------------------------------------------------
304
305 .. testcode::
306
    import numpy as np

    class InfoArray(np.ndarray):

        def __new__(subtype, shape, dtype=float, buffer=None, offset=0,
                    strides=None, order=None, info=None):
            # Create the ndarray instance of our type, given the usual
            # ndarray input arguments. This will call the standard
            # ndarray constructor, but return an object of our type.
            # It also triggers a call to InfoArray.__array_finalize__
            obj = np.ndarray.__new__(subtype, shape, dtype, buffer, offset,
                                     strides, order)
            # set the new 'info' attribute to the value passed
            obj.info = info
            # Finally, we must return the newly created object:
            return obj

        def __array_finalize__(self, obj):
            # ``self`` is a new object resulting from
            # ndarray.__new__(InfoArray, ...), therefore it only has
            # attributes that the ndarray.__new__ constructor gave it -
            # i.e. those of a standard ndarray.
            #
            # We could have got to the ndarray.__new__ call in 3 ways:
            # From an explicit constructor - e.g. InfoArray():
            #    obj is None
            #    (we're in the middle of the InfoArray.__new__
            #    constructor, and self.info will be set when we return to
            #    InfoArray.__new__)
            if obj is None:
                return
            # From view casting - e.g. arr.view(InfoArray):
            #    obj is arr
            #    (type(obj) can be InfoArray)
            # From new-from-template - e.g. infoarr[:3]
            #    type(obj) is InfoArray
            #
            # Note that it is here, rather than in the __new__ method,
            # that we set the default value for 'info', because this
            # method sees all creation of default objects - with the
            # InfoArray.__new__ constructor, but also with
            # arr.view(InfoArray).
            self.info = getattr(obj, 'info', None)
            # We do not need to return anything
350
351
352 Using the object looks like this:
353
354 >>> obj = InfoArray(shape=(3,)) # explicit constructor
355 >>> type(obj)
356 <class 'InfoArray'>
357 >>> obj.info is None
358 True
359 >>> obj = InfoArray(shape=(3,), info='information')
360 >>> obj.info
361 'information'
362 >>> v = obj[1:] # new-from-template - here - slicing
363 >>> type(v)
364 <class 'InfoArray'>
365 >>> v.info
366 'information'
367 >>> arr = np.arange(10)
368 >>> cast_arr = arr.view(InfoArray) # view casting
369 >>> type(cast_arr)
370 <class 'InfoArray'>
371 >>> cast_arr.info is None
372 True
373
374 This class isn't very useful, because it has the same constructor as the
375 bare ndarray object, including passing in buffers and shapes and so on.
376 We would probably prefer the constructor to be able to take an already
377 formed ndarray from the usual numpy calls to ``np.array`` and return an
378 object.
379
380 Slightly more realistic example - attribute added to existing array
381 -------------------------------------------------------------------
382
383 Here is a class that takes a standard ndarray that already exists, casts
384 it to our type, and adds an extra attribute.
385
386 .. testcode::
387
    import numpy as np

    class RealisticInfoArray(np.ndarray):

        def __new__(cls, input_array, info=None):
            # Input array is an already formed ndarray instance
            # We first cast to be our class type
            obj = np.asarray(input_array).view(cls)
            # add the new attribute to the created instance
            obj.info = info
            # Finally, we must return the newly created object:
            return obj

        def __array_finalize__(self, obj):
            # see InfoArray.__array_finalize__ for comments
            if obj is None:
                return
            self.info = getattr(obj, 'info', None)
405
406
407 So:
408
409 >>> arr = np.arange(5)
410 >>> obj = RealisticInfoArray(arr, info='information')
411 >>> type(obj)
412 <class 'RealisticInfoArray'>
413 >>> obj.info
414 'information'
415 >>> v = obj[1:]
416 >>> type(v)
417 <class 'RealisticInfoArray'>
418 >>> v.info
419 'information'
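
Slicing is not the only route that carries ``info`` along. The sketch below repeats the class so that it runs on its own, then checks copying, arithmetic and view casting from a plain ndarray. The variable names are illustrative only.

```python
import numpy as np

class RealisticInfoArray(np.ndarray):

    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.info = getattr(obj, 'info', None)

obj = RealisticInfoArray(np.arange(5), info='information')

copied = obj.copy()    # new-from-template: info is taken from obj
doubled = obj * 2      # ufunc result: info is taken from obj
recast = np.arange(5).view(RealisticInfoArray)   # view cast: default info
```

``copied.info`` and ``doubled.info`` are both ``'information'``. ``recast.info`` is ``None``, because a plain ndarray has no ``info`` attribute to copy.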
420
421 .. _array-wrap:
422
423 ``__array_wrap__`` for ufuncs
424 -------------------------------------------------------
425
426 ``__array_wrap__`` gets called at the end of numpy ufuncs and other numpy
427 functions, to allow a subclass to set the type of the return value
428 and update attributes and metadata. Let's show how this works with an example.
429 First we make the same subclass as above, but with a different name and
430 some print statements:
431
432 .. testcode::
433
    import numpy as np

    class MySubClass(np.ndarray):

        def __new__(cls, input_array, info=None):
            obj = np.asarray(input_array).view(cls)
            obj.info = info
            return obj

        def __array_finalize__(self, obj):
            print('In __array_finalize__:')
            print(' self is %s' % repr(self))
            print(' obj is %s' % repr(obj))
            if obj is None:
                return
            self.info = getattr(obj, 'info', None)

        def __array_wrap__(self, out_arr, context=None):
            print('In __array_wrap__:')
            print(' self is %s' % repr(self))
            print(' arr is %s' % repr(out_arr))
            # then just call the parent
            return np.ndarray.__array_wrap__(self, out_arr, context)
456
457 We run a ufunc on an instance of our new array:
458
459 >>> obj = MySubClass(np.arange(5), info='spam')
460 In __array_finalize__:
461 self is MySubClass([0, 1, 2, 3, 4])
462 obj is array([0, 1, 2, 3, 4])
463 >>> arr2 = np.arange(5)+1
464 >>> ret = np.add(arr2, obj)
465 In __array_wrap__:
466 self is MySubClass([0, 1, 2, 3, 4])
467 arr is array([1, 3, 5, 7, 9])
468 In __array_finalize__:
469 self is MySubClass([1, 3, 5, 7, 9])
470 obj is MySubClass([0, 1, 2, 3, 4])
471 >>> ret
472 MySubClass([1, 3, 5, 7, 9])
473 >>> ret.info
474 'spam'
475
476 Note that the ufunc (``np.add``) has called the ``__array_wrap__`` method of the
477 input with the highest ``__array_priority__`` value, in this case
478 ``MySubClass.__array_wrap__``, with arguments ``self`` as ``obj``, and
479 ``out_arr`` as the (ndarray) result of the addition. In turn, the
480 default ``__array_wrap__`` (``ndarray.__array_wrap__``) has cast the
481 result to class ``MySubClass``, and called ``__array_finalize__`` -
482 hence the copying of the ``info`` attribute. This has all happened at the C level.
483
484 But, we could do anything we wanted:
485
486 .. testcode::
487
    class SillySubClass(np.ndarray):

        def __array_wrap__(self, arr, context=None):
            return 'I lost your data'
492
493 >>> arr1 = np.arange(5)
494 >>> obj = arr1.view(SillySubClass)
495 >>> arr2 = np.arange(5)
496 >>> ret = np.multiply(obj, arr2)
497 >>> ret
498 'I lost your data'
499
500 So, by defining a specific ``__array_wrap__`` method for our subclass,
501 we can tweak the output from ufuncs. The ``__array_wrap__`` method
502 requires ``self``, then an argument - which is the result of the ufunc -
503 and an optional parameter *context*. This parameter is passed by some
504 ufuncs as a 3-element tuple: (name of the ufunc, argument of the ufunc,
505 domain of the ufunc). ``__array_wrap__`` should return an instance of
506 its containing class. See the masked array subclass for an
507 implementation.
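
A less silly use of ``__array_wrap__`` is to update metadata on the way out of a ufunc. The sketch below is one possible way to do this and is not taken from numpy itself. The class ``CountingArray`` and its ``wrapped`` counter are hypothetical. The ``*extra`` parameter is an assumption made for portability, so that the method also tolerates releases that pass further positional arguments.

```python
import numpy as np

class CountingArray(np.ndarray):
    # Hypothetical subclass: counts how many ufunc results led to it.
    wrapped = 0   # class-level default for instances made by view casting

    def __array_wrap__(self, out_arr, context=None, *extra):
        # Cast the raw ufunc result to our own class ...
        result = np.asarray(out_arr).view(type(self))
        # ... and record one more wrapping step than the input had.
        result.wrapped = self.wrapped + 1
        return result

start = np.arange(3).view(CountingArray)   # start.wrapped is 0
once = np.add(start, 1)                    # wrapped by start
twice = np.multiply(once, 2)               # wrapped by once
```

Each ufunc call returns an instance of the containing class, as the text above requires, with the counter advanced by one.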
508
509 In addition to ``__array_wrap__``, which is called on the way out of the
510 ufunc, there is also an ``__array_prepare__`` method which is called on
511 the way into the ufunc, after the output arrays are created but before any
512 computation has been performed. The default implementation does nothing
513 but pass through the array. ``__array_prepare__`` should not attempt to
514 access the array data or resize the array, it is intended for setting the
515 output array type, updating attributes and metadata, and performing any
516 checks based on the input that may be desired before computation begins.
517 Like ``__array_wrap__``, ``__array_prepare__`` must return an ndarray or
518 subclass thereof or raise an error.
519
520 Extra gotchas - custom ``__del__`` methods and ndarray.base
521 -----------------------------------------------------------
522
523 One of the problems that ndarray solves is keeping track of memory
524 ownership of ndarrays and their views. Consider the case where we have
525 created an ndarray, ``arr`` and have taken a slice with ``v = arr[1:]``.
526 The two objects are looking at the same memory. Numpy keeps track of
527 where the data came from for a particular array or view, with the
528 ``base`` attribute:
529
530 >>> # A normal ndarray, that owns its own data
531 >>> arr = np.zeros((4,))
532 >>> # In this case, base is None
533 >>> arr.base is None
534 True
535 >>> # We take a view
536 >>> v1 = arr[1:]
537 >>> # base now points to the array that it derived from
538 >>> v1.base is arr
539 True
540 >>> # Take a view of a view
541 >>> v2 = v1[1:]
542 >>> # base points to the original array that it was derived from
543 >>> v2.base is arr
544 True
545
546 In general, if the array owns its own memory, as for ``arr`` in this
547 case, then ``arr.base`` will be None - there are some exceptions to this
548 - see the numpy book for more details.
549
550 The ``base`` attribute is useful in being able to tell whether we have
551 a view or the original array. This in turn can be useful if we need
552 to know whether or not to do some specific cleanup when the subclassed
553 array is deleted. For example, we may only want to do the cleanup if
554 the original array is deleted, but not the views. For an example of
555 how this can work, have a look at the ``memmap`` class in
556 ``numpy.core``.
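
A minimal sketch of that idea, assuming a hypothetical subclass ``Resource`` with a helper method ``needs_cleanup``. It only tests ``base``, and so shares the exceptions mentioned above. It is not how ``memmap`` is implemented.

```python
import numpy as np

class Resource(np.ndarray):
    # Hypothetical subclass that wants to clean up only for the
    # array that owns the data, not for views onto it.
    def needs_cleanup(self):
        # An array created directly has no base; a view has one.
        return self.base is None

original = Resource((4,))   # explicit constructor: owns its data
view = original[1:]         # slice: a view onto original's data

cleanup_original = original.needs_cleanup()   # True
cleanup_view = view.needs_cleanup()           # False
```

A custom ``__del__`` could consult a check like this before releasing anything, so that deleting a view leaves the underlying resource alone.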
557
558
559 """
560 from __future__ import division, absolute_import, print_function