"""
==============
Array indexing
==============

Array indexing refers to any use of the square brackets ([]) to index
array values. There are many options to indexing, which give numpy
indexing great power, but with power comes some complexity and the
potential for confusion. This section is just an overview of the
various options and issues related to indexing. Aside from single
element indexing, the details on most of these options are to be
found in related sections.

Assignment vs referencing
=========================

Most of the following examples show the use of indexing when
referencing data in an array. The examples work just as well
when assigning to an array. See the section at the end for
specific examples and explanations on how assignments work.

Single element indexing
=======================

Single element indexing for a 1-D array is what one expects. It works
exactly like that for other standard Python sequences. It is 0-based,
and accepts negative indices for indexing from the end of the array. ::

    >>> x = np.arange(10)
    >>> x[2]
    2
    >>> x[-2]
    8

Unlike lists and tuples, numpy arrays support multidimensional indexing
for multidimensional arrays. That means that it is not necessary to
separate each dimension's index into its own set of square brackets. ::

    >>> x.shape = (2,5) # now x is 2-dimensional
    >>> x[1,3]
    8
    >>> x[1,-1]
    9

Note that if one indexes a multidimensional array with fewer indices
than dimensions, one gets a subdimensional array. For example: ::

    >>> x[0]
    array([0, 1, 2, 3, 4])

That is, each index specified selects the subarray spanning the
remaining, unindexed dimensions. In the above example, choosing 0
means that the remaining dimension of length 5 is being left unspecified,
and that what is returned is an array of that dimensionality and size.
It must be noted that the returned array is not a copy of the original,
but points to the same values in memory as does the original array.
In this case, the 1-D array at the first position (0) is returned.
So using a single index on the returned array results in a single
element being returned. That is: ::

    >>> x[0][2]
    2

So note that ``x[0,2] == x[0][2]``, though the second case is less
efficient, as a new temporary array is created after the first index
that is subsequently indexed by 2.

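A minimal sketch of the view behaviour described above. It uses ``np.shares_memory``, which is not otherwise mentioned in this section, to confirm that no copy is made:

```python
import numpy as np

x = np.arange(10).reshape(2, 5)

row = x[0]                         # fewer indices than dimensions: a 1-D view
print(np.shares_memory(row, x))    # True: the data was not copied

row[2] = 99                        # writing through the view...
print(x[0, 2])                     # ...modifies the original array: 99

# Same element either way; x[0][2] just builds the temporary x[0] first.
print(x[0, 2] == x[0][2])          # True
```
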
A note for those used to IDL or Fortran memory order as it relates to
indexing: Numpy uses C-order indexing. That means that the last
index usually represents the most rapidly changing memory location,
unlike Fortran or IDL, where the first index represents the most
rapidly changing location in memory. This difference represents a
great potential for confusion.

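One way to see the difference is to flatten a small example array in both index orders and inspect its strides (the byte step per axis). ``ravel(order=...)`` and ``strides`` are standard ndarray features. The array is only illustrative:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)           # C order (row-major) by default

# C index order: the last index varies fastest, matching a's memory layout.
print(a.ravel(order='C'))                # [0 1 2 3 4 5]

# Fortran index order: the first index varies fastest, as in Fortran or IDL.
print(a.ravel(order='F'))                # [0 3 1 4 2 5]

# Strides in bytes: moving along the last axis steps one element,
# moving along the first axis steps a whole row of 3 elements.
print(a.strides == (3 * a.itemsize, a.itemsize))   # True
```
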
Other indexing options
======================

It is possible to slice and stride arrays to extract arrays of the
same number of dimensions, but of different sizes than the original.
The slicing and striding works exactly the same way it does for lists
and tuples except that they can be applied to multiple dimensions as
well. A few examples illustrate best: ::

 >>> x = np.arange(10)
 >>> x[2:5]
 array([2, 3, 4])
 >>> x[:-7]
 array([0, 1, 2])
 >>> x[1:7:2]
 array([1, 3, 5])
 >>> y = np.arange(35).reshape(5,7)
 >>> y[1:5:2,::3]
 array([[ 7, 10, 13],
        [21, 24, 27]])

Note that slices of arrays do not copy the internal array data but
only produce new views of the original data.

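A short sketch of what "view" means in practice. Writing into a slice changes the original array, while an explicit ``.copy()`` does not. The arrays here are illustrative only:

```python
import numpy as np

x = np.arange(10)
s = x[1:7:2]                      # strided slice: a view of elements 1, 3, 5
print(np.shares_memory(s, x))     # True

s[:] = -1                         # assigning through the view...
print(x)                          # ...sets x[1], x[3] and x[5] to -1

c = x[1:7:2].copy()               # an explicit copy owns its own data
c[:] = 100
print(x[1])                       # still -1: x is unaffected by changes to c
```
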
It is possible to index arrays with other arrays for the purposes of
selecting lists of values out of arrays into new arrays. There are
two different ways of accomplishing this. One uses one or more arrays
of index values. The other involves giving a boolean array of the proper
shape to indicate the values to be selected. Index arrays are a very
powerful tool that allow one to avoid looping over individual elements in
arrays and thus greatly improve performance.

It is possible to use special features to effectively increase the
number of dimensions in an array through indexing so the resulting
array acquires the shape needed for use in an expression or with a
specific function.

Index arrays
============

Numpy arrays may be indexed with other arrays (or any other sequence-like
object that can be converted to an array, such as lists, with the
exception of tuples; see the end of this document for why this is). The
use of index arrays ranges from simple, straightforward cases to
complex, hard-to-understand cases. For all cases of index arrays, what
is returned is a copy of the original data, not a view as one gets for
slices.

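To contrast the copy returned by an index array with the view returned by an equivalent slice, here is a minimal sketch:

```python
import numpy as np

x = np.arange(10)
picked = x[np.array([1, 2, 3])]        # index array: a new copy
print(np.shares_memory(picked, x))     # False
picked[0] = 100
print(x[1])                            # 1: the original is unchanged

sliced = x[1:4]                        # the equivalent slice: a view
print(np.shares_memory(sliced, x))     # True
```
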
Index arrays must be of integer type. Each value in the index array
indicates which element of the indexed array to use in place of the
index. To illustrate: ::

 >>> x = np.arange(10,1,-1)
 >>> x
 array([10,  9,  8,  7,  6,  5,  4,  3,  2])
 >>> x[np.array([3, 3, 1, 8])]
 array([7, 7, 9, 2])

The index array consisting of the values 3, 3, 1 and 8 creates a
resulting array of length 4 (the same as the index array) where each
index is replaced by the value it selects from the array being indexed.

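Conceptually, ``x[idx]`` behaves like the Python loop below, only vectorized. This sketch is meant for intuition and does not describe how NumPy implements the operation:

```python
import numpy as np

x = np.arange(10, 1, -1)
idx = np.array([3, 3, 1, 8])

fancy = x[idx]                                # vectorized selection
looped = np.array([x[i] for i in idx])        # the same, one element at a time
print(np.array_equal(fancy, looped))          # True
print(fancy)                                  # [7 7 9 2]
```
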
Negative values are permitted and work as they do with single indices
or slices: ::

 >>> x[np.array([3,3,-3,8])]
 array([7, 7, 4, 2])

It is an error to have index values out of bounds: ::

 >>> x[np.array([3, 3, 20, 8])]
 Traceback (most recent call last):
   ...
 IndexError: index 20 is out of bounds for axis 0 with size 9

Generally speaking, what is returned when index arrays are used is
an array with the same shape as the index array, but with the type
and values of the array being indexed. As an example, we can use a
multidimensional index array instead: ::

 >>> x[np.array([[1,1],[2,3]])]
 array([[9, 9],
        [8, 7]])

Indexing Multi-dimensional arrays
=================================

Things become more complex when multidimensional arrays are indexed,
particularly with multidimensional index arrays. These tend to be
more unusual uses, but they are permitted, and they are useful for some
problems. We'll start with the simplest multidimensional case (using
the array y from the previous examples): ::

 >>> y[np.array([0,2,4]), np.array([0,1,2])]
 array([ 0, 15, 30])

In this case, if the index arrays have a matching shape, and there is
an index array for each dimension of the array being indexed, the
resultant array has the same shape as the index arrays, and the values
correspond to the index set for each position in the index arrays. In
this example, the first index value is 0 for both index arrays, and
thus the first value of the resultant array is ``y[0,0]``. The next value
is ``y[2,1]``, and the last is ``y[4,2]``.

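The element-by-element pairing of index arrays can be written out explicitly. The list comprehension below only illustrates the rule and does not show how NumPy evaluates it:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])
cols = np.array([0, 1, 2])

paired = y[rows, cols]                    # element k is y[rows[k], cols[k]]
manual = np.array([y[r, c] for r, c in zip(rows, cols)])
print(paired)                             # [ 0 15 30]
print(np.array_equal(paired, manual))     # True
```
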
If the index arrays do not have the same shape, there is an attempt to
broadcast them to the same shape. If they cannot be broadcast to the
same shape, an exception is raised: ::

 >>> y[np.array([0,2,4]), np.array([0,1])]
 Traceback (most recent call last):
   ...
 IndexError: shape mismatch: indexing arrays could not be broadcast
 together with shapes (3,) (2,)

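Sometimes the goal is every combination of the chosen rows and columns rather than element-wise pairs. One common approach is to give the index arrays shapes that do broadcast. The sketch below uses ``np.newaxis`` (covered in a later section) and ``np.ix_``, which this document does not otherwise cover:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])
cols = np.array([0, 1])

# Shapes (3,) and (2,) cannot be broadcast together, so this raises.
try:
    y[rows, cols]
    mismatch_raised = False
except IndexError:
    mismatch_raised = True
print(mismatch_raised)                          # True

# rows as a (3, 1) column broadcasts against cols (2,) to (3, 2):
# every selected row paired with every selected column.
block = y[rows[:, np.newaxis], cols]
print(block)                                    # rows 0, 2, 4 by columns 0, 1

# np.ix_ builds the same open-mesh index arrays.
print(np.array_equal(block, y[np.ix_(rows, cols)]))   # True
```
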
The broadcasting mechanism permits index arrays to be combined with
scalars for other indices. The effect is that the scalar value is used
for all the corresponding values of the index arrays: ::

 >>> y[np.array([0,2,4]), 1]
 array([ 1, 15, 29])

Jumping to the next level of complexity, it is possible to only
partially index an array with index arrays. It takes a bit of thought
to understand what happens in such cases. For example if we just use
one index array with y: ::

 >>> y[np.array([0,2,4])]
 array([[ 0,  1,  2,  3,  4,  5,  6],
        [14, 15, 16, 17, 18, 19, 20],
        [28, 29, 30, 31, 32, 33, 34]])

What results is the construction of a new array where each value of
the index array selects one row from the array being indexed and the
resultant array has the resulting shape (number of index elements,
size of row).

An example of where this may be useful is for a color lookup table
where we want to map the values of an image into RGB triples for
display. The lookup table could have a shape (nlookup, 3). Indexing
such an array with an image with shape (ny, nx) with dtype=np.uint8
(or any integer type so long as values are within the bounds of the
lookup table) will result in an array of shape (ny, nx, 3) where a
triple of RGB values is associated with each pixel location.

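Here is a minimal sketch of the lookup-table idea. The four-colour palette and the tiny image are hypothetical stand-ins for real data:

```python
import numpy as np

# Hypothetical 4-entry palette: each row is an (R, G, B) triple.
lut = np.array([[0, 0, 0],
                [255, 0, 0],
                [0, 255, 0],
                [0, 0, 255]], dtype=np.uint8)    # shape (nlookup, 3) = (4, 3)

# Hypothetical 2x3 "image" of palette indices (every value < nlookup).
image = np.array([[0, 1, 2],
                  [3, 2, 1]], dtype=np.uint8)    # shape (ny, nx) = (2, 3)

rgb = lut[image]          # each pixel value selects one row of the table
print(rgb.shape)          # (2, 3, 3)
print(rgb[1, 0])          # pixel value 3 maps to the row (0, 0, 255)
```
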
In general, the shape of the resultant array will be the concatenation
of the shape of the index array (or the shape that all the index arrays
were broadcast to) with the shape of any unused dimensions (those not
indexed) in the array being indexed.

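The shape rule can be checked mechanically. This sketch uses an arbitrary, hypothetical array of shape (5, 7, 4):

```python
import numpy as np

a = np.zeros((5, 7, 4))                  # array being indexed
idx = np.array([[0, 1, 2],
                [2, 3, 4]])              # index array of shape (2, 3)

# Indexing only the first axis: shape = idx.shape + a.shape[1:]
print(a[idx].shape)                      # (2, 3, 7, 4)

# Indexing the first two axes with arrays that broadcast to (2, 3):
cols = np.array([0, 6, 1])               # shape (3,) broadcasts against (2, 3)
print(a[idx, cols].shape)                # (2, 3, 4): only the last axis is unused
```
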
Boolean or "mask" index arrays
==============================

Boolean arrays used as indices are treated entirely differently
from index arrays. Boolean arrays must be of the same shape
as the initial dimensions of the array being indexed. In the
most straightforward case, the boolean array has the same shape: ::

 >>> b = y>20
 >>> y[b]
 array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34])

The result is a 1-D array containing all the elements in the indexed
array corresponding to all the true elements in the boolean array. As
with index arrays, what is returned is a copy of the data, not a view
as one gets with slices.

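A boolean mask can be thought of as shorthand for the integer positions of its True elements, which ``np.nonzero`` returns as a tuple of index arrays. Here is a minimal sketch of that equivalence:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
b = y > 20

selected = y[b]
# The mask selects the same elements, in the same (row-major) order,
# as the integer positions of its True entries.
print(np.array_equal(selected, y[np.nonzero(b)]))   # True
print(selected.shape)                               # (14,): one entry per True

selected[:] = 0            # a copy, so y is untouched
print(y.max())             # 34
```
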
The result will be multidimensional if y has more dimensions than b.
For example: ::

 >>> b[:,5] # use a 1-D boolean whose first dim agrees with the first dim of y
 array([False, False, False,  True,  True])
 >>> y[b[:,5]]
 array([[21, 22, 23, 24, 25, 26, 27],
        [28, 29, 30, 31, 32, 33, 34]])

Here the 4th and 5th rows are selected from the indexed array and
combined to make a 2-D array.

In general, when the boolean array has fewer dimensions than the array
being indexed, this is equivalent to ``y[b, ...]``, which means
y is indexed by b followed by as many ``:`` as are needed to fill
out the rank of y.
Thus the shape of the result is one dimension containing the number
of True elements of the boolean array, followed by the remaining
dimensions of the array being indexed.

For example, using a 2-D boolean array of shape (2,3)
with four True elements to select rows from a 3-D array of shape
(2,3,5) results in a 2-D result of shape (4,5): ::

 >>> x = np.arange(30).reshape(2,3,5)
 >>> x
 array([[[ 0,  1,  2,  3,  4],
         [ 5,  6,  7,  8,  9],
         [10, 11, 12, 13, 14]],
        [[15, 16, 17, 18, 19],
         [20, 21, 22, 23, 24],
         [25, 26, 27, 28, 29]]])
 >>> b = np.array([[True, True, False], [False, True, True]])
 >>> x[b]
 array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [20, 21, 22, 23, 24],
        [25, 26, 27, 28, 29]])

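The equivalence with ``x[b, ...]`` and the resulting shape can be verified directly for the example above. The arrays are re-created so the sketch stands alone:

```python
import numpy as np

x = np.arange(30).reshape(2, 3, 5)
b = np.array([[True, True, False],
              [False, True, True]])        # covers the first two axes of x

# A mask with fewer dimensions than x behaves like x[b, ...]
print(np.array_equal(x[b], x[b, ...]))      # True

# Shape: (number of True entries,) followed by the axes b does not cover
expected = (np.count_nonzero(b),) + x.shape[b.ndim:]
print(x[b].shape, expected)                 # (4, 5) (4, 5)
```
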
For further details, consult the numpy reference documentation on array indexing.

Combining index arrays with slices
==================================

Index arrays may be combined with slices. For example: ::

 >>> y[np.array([0,2,4]),1:3]
 array([[ 1,  2],
        [15, 16],
        [29, 30]])

In effect, the slice is converted to an index array
``np.array([[1,2]])`` (shape (1,2)) that is broadcast with the index array,
treated as a column of shape (3,1), to produce a resultant array of shape (3,2).

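The equivalence can be sketched by spelling the slice out as an index array and reshaping the row indices into a column. Splitting the operation into a slice followed by an index array gives the same values:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
rows = np.array([0, 2, 4])

mixed = y[rows, 1:3]                              # index array + slice

# Two index arrays that broadcast: (3, 1) rows against (1, 2) columns.
explicit = y[rows[:, np.newaxis], np.array([[1, 2]])]

# Or two independent steps: slice the columns, then pick the rows.
stepwise = y[:, 1:3][rows]

print(mixed.shape)                                # (3, 2)
print(np.array_equal(mixed, explicit), np.array_equal(mixed, stepwise))
```
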
Likewise, slicing can be combined with broadcast boolean indices: ::

 >>> b = y > 20
 >>> y[b[:,5],1:3]
 array([[22, 23],
        [29, 30]])

Structural indexing tools
=========================

To facilitate easy matching of array shapes with expressions and in
assignments, the np.newaxis object can be used within array indices
to add new dimensions with a size of 1. For example: ::

 >>> y.shape
 (5, 7)
 >>> y[:,np.newaxis,:].shape
 (5, 1, 7)

Note that there are no new elements in the array, just that the
dimensionality is increased. This can be handy to combine two
arrays in a way that otherwise would require explicit reshaping
operations. For example: ::

 >>> x = np.arange(5)
 >>> x[:,np.newaxis] + x[np.newaxis,:]
 array([[0, 1, 2, 3, 4],
        [1, 2, 3, 4, 5],
        [2, 3, 4, 5, 6],
        [3, 4, 5, 6, 7],
        [4, 5, 6, 7, 8]])

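The ``np.newaxis`` approach above is equivalent to explicit reshaping. The sketch below compares the two, and also ``np.add.outer``, a ufunc method that computes the same outer sum:

```python
import numpy as np

x = np.arange(5)
col = x[:, np.newaxis]                  # shape (5, 1)
row = x[np.newaxis, :]                  # shape (1, 5)
table = col + row                       # broadcasts to (5, 5)

print(np.newaxis is None)               # True: np.newaxis is an alias for None
print(np.array_equal(table, x.reshape(5, 1) + x.reshape(1, 5)))   # True
print(np.array_equal(table, np.add.outer(x, x)))                  # True
```
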
The ellipsis syntax may be used to indicate selecting in full any
remaining unspecified dimensions. For example: ::

 >>> z = np.arange(81).reshape(3,3,3,3)
 >>> z[1,...,2]
 array([[29, 32, 35],
        [38, 41, 44],
        [47, 50, 53]])

This is equivalent to: ::

 >>> z[1,:,:,2]
 array([[29, 32, 35],
        [38, 41, 44],
        [47, 50, 53]])

Assigning values to indexed arrays
==================================

As mentioned, one can select a subset of an array to assign to using
a single index, slices, and index and mask arrays. The value being
assigned to the indexed array must be shape consistent (the same shape
or broadcastable to the shape the index produces). For example, it is
permitted to assign a constant to a slice: ::

 >>> x = np.arange(10)
 >>> x[2:7] = 1

or an array of the right size: ::

 >>> x[2:7] = np.arange(5)

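Here is a sketch of the shape-consistency rule for assignment, using a hypothetical 3x4 integer array named ``grid``. Scalars and broadcastable arrays are accepted, while a mismatched shape is rejected with ``ValueError`` before anything is written:

```python
import numpy as np

grid = np.zeros((3, 4), dtype=int)

grid[:, 1:3] = 7                          # scalar broadcast to the (3, 2) target
grid[0] = np.array([1, 2, 3, 4])          # exact shape match for one row
grid[1:, :] = np.array([10, 20, 30, 40])  # shape (4,) broadcast over two rows

rejected = False
try:
    grid[:, 0] = np.array([1, 2])         # shape (2,) cannot fill shape (3,)
except ValueError:
    rejected = True
print(rejected)                           # True; grid is left unchanged
print(grid)
```
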
Note that assignments may silently lose information when assigning
higher types to lower types (like floats to ints, where the fractional
part is truncated) or even raise exceptions (assigning complex to
floats or ints): ::

 >>> x[1] = 1.2
 >>> x[1]
 1
 >>> x[1] = 1.2j
 Traceback (most recent call last):
   ...
 TypeError: can't convert complex to int

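A small sketch of the float-to-int case. The fractional part is dropped (truncated toward zero), so converting the array to a floating-point dtype first is one way to keep it. The values are arbitrary:

```python
import numpy as np

x = np.arange(10)              # integer dtype
x[1] = 1.2                     # stored as 1
x[2] = -1.7                    # stored as -1: truncation is toward zero
print(x[1], x[2])              # 1 -1

xf = x.astype(np.float64)      # a floating-point copy keeps fractions
xf[1] = 1.2
print(xf[1])                   # 1.2
```
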
Unlike some of the references (such as array and mask indices)
assignments are always made to the original data in the array
(indeed, nothing else would make sense!). Note though, that some
actions may not work as one may naively expect. This particular
example is often surprising to people: ::

 >>> x = np.arange(0, 50, 10)
 >>> x
 array([ 0, 10, 20, 30, 40])
 >>> x[np.array([1, 1, 3, 1])] += 1
 >>> x
 array([ 0, 11, 20, 31, 40])

People often expect that the value at index 1 will be incremented by 3.
In fact, it will only be incremented by 1. The reason is that
a new array is extracted from the original (as a temporary) containing
the values at 1, 1, 3, 1, then the value 1 is added to the temporary,
and then the temporary is assigned back to the original array. Thus
the value ``x[1] + 1`` is assigned to ``x[1]`` three times,
rather than being incremented 3 times.

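If accumulation over repeated indices is what is wanted, NumPy's unbuffered ``ufunc.at`` method (here ``np.add.at``) applies the operation once per index entry. Here is a minimal sketch comparing the two behaviours:

```python
import numpy as np

x = np.arange(0, 50, 10)
idx = np.array([1, 1, 3, 1])

buffered = x.copy()
buffered[idx] += 1               # repeated index 1 is written back, not accumulated
print(buffered)                  # index 1 goes from 10 to 11

unbuffered = x.copy()
np.add.at(unbuffered, idx, 1)    # adds 1 once for every occurrence of an index
print(unbuffered)                # index 1 goes from 10 to 13
```
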
Dealing with variable numbers of indices within programs
========================================================

The index syntax is very powerful but limiting when dealing with
a variable number of indices. For example, if you want to write
a function that can handle arguments with various numbers of
dimensions without having to write special case code for each
number of possible dimensions, how can that be done? If one
supplies a tuple to the index, the tuple will be interpreted
as a list of indices. For example (using the previous definition
for the array z): ::

 >>> indices = (1,1,1,1)
 >>> z[indices]
 40

So one can use code to construct tuples of any number of indices
and then use these within an index.

Slices can be specified within programs by using the slice() function
in Python. For example: ::

 >>> indices = (1,1,1,slice(0,2)) # same as [1,1,1,0:2]
 >>> z[indices]
 array([39, 40])

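Combining tuples with ``slice()`` is one way to write code that works for any number of dimensions. The helper below, ``select_on_axis``, is hypothetical and not a NumPy function. It builds an index tuple that picks one position along a chosen axis and takes everything along the other axes:

```python
import numpy as np

def select_on_axis(a, axis, i):
    # Hypothetical helper: index i on `axis`, a full slice (:) on every other axis.
    index = [slice(None)] * a.ndim     # slice(None) is the programmatic ':'
    index[axis] = i
    return a[tuple(index)]             # must be a tuple, not a list

z = np.arange(81).reshape(3, 3, 3, 3)
print(np.array_equal(select_on_axis(z, 2, 1), z[:, :, 1, :]))   # True

m = np.arange(6).reshape(2, 3)
print(select_on_axis(m, 1, 0))         # first column of m: [0 3]
```
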
Likewise, ellipsis can be specified by code by using the Ellipsis
object: ::

 >>> indices = (1, Ellipsis, 1) # same as [1,...,1]
 >>> z[indices]
 array([[28, 31, 34],
        [37, 40, 43],
        [46, 49, 52]])

For this reason it is possible to use the output from the np.where()
function, when it is called with only a condition, directly as an index,
since in that form it returns a tuple of index arrays.

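Here is a minimal sketch of using the one-argument form of ``np.where`` as an index. The condition is arbitrary:

```python
import numpy as np

y = np.arange(35).reshape(5, 7)
positions = np.where(y % 10 == 0)        # one argument: a tuple of index arrays
print(type(positions) is tuple, len(positions))   # True 2 (one array per axis)
print(y[positions])                      # the multiples of 10: 0, 10, 20, 30
```
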
Because of the special treatment of tuples, they are not automatically
converted to an array as a list would be. As an example: ::

 >>> z[[1,1,1,1]] # produces a large array
 array([[[[27, 28, 29],
          [30, 31, 32], ...
 >>> z[(1,1,1,1)] # returns a single value
 40

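The tuple/list distinction matters most when indices are built at run time. This short sketch re-creates ``z`` so it stands alone and converts a computed list with ``tuple()`` before indexing:

```python
import numpy as np

z = np.arange(81).reshape(3, 3, 3, 3)
computed = [1, 1, 1, 1]                  # e.g. indices assembled by a program

as_array_index = z[computed]             # list: index array along axis 0 only
as_tuple_index = z[tuple(computed)]      # tuple: one index per axis

print(as_array_index.shape)              # (4, 3, 3, 3): z[1] repeated four times
print(as_tuple_index)                    # 40
```
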
"""
from __future__ import division, absolute_import, print_function