Chris@87: """
Chris@87: =====================================
Chris@87: Structured Arrays (and Record Arrays)
Chris@87: =====================================
Chris@87: 
Chris@87: Introduction
Chris@87: ============
Chris@87: 
Chris@87: NumPy provides powerful capabilities to create arrays of structs or records.
Chris@87: These arrays permit one to manipulate the data by the structs or by fields of
Chris@87: the struct. A simple example will show what is meant: ::
Chris@87: 
Chris@87:  >>> x = np.zeros((2,),dtype=('i4,f4,a10'))
Chris@87:  >>> x[:] = [(1,2.,'Hello'),(2,3.,"World")]
Chris@87:  >>> x
Chris@87:  array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
Chris@87:        dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
Chris@87: 
Chris@87: Here we have created a one-dimensional array of length 2. Each element of
Chris@87: this array is a record that contains three items: a 32-bit integer, a 32-bit
Chris@87: float, and a string of length 10 or less. If we index this array at the second
Chris@87: position, we get the second record: ::
Chris@87: 
Chris@87:  >>> x[1]
Chris@87:  (2, 3.0, 'World')
Chris@87: 
Chris@87: Conveniently, one can access any field of the array by indexing using the
Chris@87: string that names that field. In this case the fields have received the
Chris@87: default names 'f0', 'f1' and 'f2'. ::
Chris@87: 
Chris@87:  >>> y = x['f1']
Chris@87:  >>> y
Chris@87:  array([ 2.,  3.], dtype=float32)
Chris@87:  >>> y[:] = 2*y
Chris@87:  >>> y
Chris@87:  array([ 4.,  6.], dtype=float32)
Chris@87:  >>> x
Chris@87:  array([(1, 4.0, 'Hello'), (2, 6.0, 'World')],
Chris@87:        dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
Chris@87: 
Chris@87: In these examples, y is a simple float array consisting of the second field
Chris@87: of each record. Rather than being a copy of the data in the structured
Chris@87: array, it is a view, i.e., it shares exactly the same memory locations.
Chris@87: Thus, when we updated this array by doubling its values, the structured
Chris@87: array showed the corresponding values doubled as well. Likewise, if one
Chris@87: changes a record, the field view also changes: ::
Chris@87: 
Chris@87:  >>> x[1] = (-1,-1.,"Master")
Chris@87:  >>> x
Chris@87:  array([(1, 4.0, 'Hello'), (-1, -1.0, 'Master')],
Chris@87:        dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
Chris@87:  >>> y
Chris@87:  array([ 4., -1.], dtype=float32)
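
The view behaviour described above can be checked directly. The following is a minimal sketch, not part of the original examples; it assumes a recent NumPy, spells the string field as 'S10' (on the assumption that the older 'a10' alias may not be accepted by newer releases), and uses ``np.shares_memory`` to confirm that the field and the structured array overlap in memory.

```python
import numpy as np

# Three unnamed fields: 32-bit int, 32-bit float, 10-byte string.
x = np.zeros(2, dtype='i4,f4,S10')
x[:] = [(1, 2.0, b'Hello'), (2, 3.0, b'World')]

y = x['f1']       # field access on the structured array
y[:] = 2 * y      # in-place update through y

# If y is a view, it overlaps x in memory and x sees the doubled values.
shares = np.shares_memory(x, y)
print(shares)
print(x['f1'])
```

Because the update goes through ``y[:] = ...`` rather than rebinding ``y``, the write lands in the memory owned by ``x``.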
Chris@87: 
Chris@87: Defining Structured Arrays
Chris@87: ==========================
Chris@87: 
Chris@87: One defines a structured array through the dtype object. The record
Chris@87: structure is specified by an argument (as supplied to a dtype function
Chris@87: keyword or to the dtype object constructor itself) that must be one of
Chris@87: the following: 1) string, 2) tuple, 3) list, or 4) dictionary. Each of
Chris@87: these is briefly described below. Some variants exist only for backward
Chris@87: compatibility with Numeric, numarray, or another module, and should not
Chris@87: be used except for such purposes; these are noted where they occur.
Chris@87: 
Chris@87: 1) String argument (as used in the above examples).
Chris@87: In this case, the constructor expects a comma-separated list of type
Chris@87: specifiers, optionally with extra shape information.
Chris@87: The type specifiers can take 4 different forms: ::
Chris@87: 
Chris@87:   a) b1, i1, i2, i4, i8, u1, u2, u4, u8, f2, f4, f8, c8, c16, a<n>
Chris@87:      (representing booleans, ints, unsigned ints, floats, complex and
Chris@87:       fixed length strings of specified byte lengths)
Chris@87:   b) int8,...,uint8,...,float16, float32, float64, complex64, complex128
Chris@87:      (this time with bit sizes)
Chris@87:   c) older Numeric/numarray type specifications (e.g. Float32).
Chris@87:      Don't use these in new code!
Chris@87:   d) Single character type specifiers (e.g. H for unsigned short ints).
Chris@87:      Avoid using these unless you must. Details can be found in the
Chris@87:      NumPy book.
Chris@87: 
Chris@87: These different styles can be mixed within the same string (but why would you
Chris@87: want to do that?). Furthermore, each type specifier can be prefixed
Chris@87: with a repetition number, or a shape. In these cases an array
Chris@87: element is created, i.e., an array within a record. That array
Chris@87: is still referred to as a single field. An example: ::
Chris@87: 
Chris@87:  >>> x = np.zeros(3, dtype='3int8, float32, (2,3)float64')
Chris@87:  >>> x
Chris@87:  array([([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
Chris@87:         ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
Chris@87:         ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
Chris@87:        dtype=[('f0', '|i1', 3), ('f1', '>f4'), ('f2', '>f8', (2, 3))])
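
To make the "array within a record" idea concrete, the sketch below reuses the same dtype string and inspects the shape of each field. It is an illustration under the assumption of a current NumPy; the variable names ``shape_f0`` and ``shape_f2`` are introduced here only for the example.

```python
import numpy as np

# Same definition as above: a 3-element int8 sub-array, a float32 scalar,
# and a (2, 3) float64 sub-array in every record.
x = np.zeros(3, dtype='3int8, float32, (2,3)float64')

# Each sub-array is still a single field; indexing by field name
# appends the sub-array shape to the shape of the outer array.
shape_f0 = x['f0'].shape
shape_f1 = x['f1'].shape
shape_f2 = x['f2'].shape
print(shape_f0, shape_f1, shape_f2)  # (3, 3) (3,) (3, 2, 3)
```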
Chris@87: 
Chris@87: Defining the record structure with a string precludes naming the
Chris@87: fields in the original definition. The names can be changed as shown
Chris@87: later, however.
Chris@87: 
Chris@87: 2) Tuple argument: The only tuple case relevant to record structures is
Chris@87: mapping a structure onto an existing data type. This is done by pairing,
Chris@87: in a tuple, the existing data type with a matching dtype definition (using
Chris@87: any of the variants described here). The following example uses a list
Chris@87: definition; see 3) for further details: ::
Chris@87: 
Chris@87:  >>> x = np.zeros(3, dtype=('i4',[('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')]))
Chris@87:  >>> x
Chris@87:  array([0, 0, 0])
Chris@87:  >>> x['r']
Chris@87:  array([0, 0, 0], dtype=uint8)
Chris@87: 
Chris@87: In this case, an array is produced that looks and acts like a simple int32 array,
Chris@87: but also has definitions for fields that use only one byte of the int32 (a bit
Chris@87: like Fortran equivalencing).
Chris@87: 
Chris@87: 3) List argument: In this case the record structure is defined with a list of
Chris@87: tuples. Each tuple has 2 or 3 elements specifying: 1) The name of the field
Chris@87: ('' is permitted), 2) the type of the field, and 3) the shape (optional).
Chris@87: For example::
Chris@87: 
Chris@87:  >>> x = np.zeros(3, dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
Chris@87:  >>> x
Chris@87:  array([(0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
Chris@87:         (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
Chris@87:         (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]])],
Chris@87:        dtype=[('x', '>f4'), ('y', '>f4'), ('value', '>f4', (2, 2))])
Chris@87: 
Chris@87: 4) Dictionary argument: two different forms are permitted. The first consists
Chris@87: of a dictionary with two required keys ('names' and 'formats'), each having an
Chris@87: equal-sized list of values. The formats list contains any type/shape specifier
Chris@87: allowed in other contexts. The names must be strings. There are two optional
Chris@87: keys: 'offsets' and 'titles'. Each must be a list matching the two required
Chris@87: lists in length: offsets contains the integer byte offset of each field, and
Chris@87: titles contains objects holding metadata for each field (these do not have
Chris@87: to be strings, and the value None is permitted). As an example: ::
Chris@87: 
Chris@87:  >>> x = np.zeros(3, dtype={'names':['col1', 'col2'], 'formats':['i4','f4']})
Chris@87:  >>> x
Chris@87:  array([(0, 0.0), (0, 0.0), (0, 0.0)],
Chris@87:        dtype=[('col1', '>i4'), ('col2', '>f4')])
Chris@87: 
Chris@87: The other dictionary form permitted is a dictionary of name keys with tuple
Chris@87: values specifying type, offset, and an optional title. ::
Chris@87: 
Chris@87:  >>> x = np.zeros(3, dtype={'col1':('i1',0,'title 1'), 'col2':('f4',1,'title 2')})
Chris@87:  >>> x
Chris@87:  array([(0, 0.0), (0, 0.0), (0, 0.0)],
Chris@87:        dtype=[(('title 1', 'col1'), '|i1'), (('title 2', 'col2'), '>f4')])
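
The optional 'offsets' key is easiest to understand by inspecting the resulting layout. The sketch below is illustrative only: the offsets ``[0, 8]`` are arbitrary values chosen for the example, and the ``layout`` dictionary is a helper introduced here, not part of NumPy.

```python
import numpy as np

# First dictionary form, with explicit byte offsets for each field.
# The offsets are example values; col2 is deliberately placed at byte 8,
# leaving a 4-byte gap after the 4-byte col1.
dt = np.dtype({'names': ['col1', 'col2'],
               'formats': ['i4', 'f4'],
               'offsets': [0, 8]})

# dt.fields maps each name to a tuple whose second item is the offset.
layout = {name: dt.fields[name][1] for name in dt.names}
print(layout)       # {'col1': 0, 'col2': 8}
print(dt.itemsize)  # 12: the record extends to the end of col2
```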
Chris@87: 
Chris@87: Accessing and modifying field names
Chris@87: ===================================
Chris@87: 
Chris@87: The field names are an attribute of the dtype object defining the record structure.
Chris@87: For the last example: ::
Chris@87: 
Chris@87:  >>> x.dtype.names
Chris@87:  ('col1', 'col2')
Chris@87:  >>> x.dtype.names = ('x', 'y')
Chris@87:  >>> x
Chris@87:  array([(0, 0.0), (0, 0.0), (0, 0.0)],
Chris@87:       dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
Chris@87:  >>> x.dtype.names = ('x', 'y', 'z') # wrong number of names
Chris@87:  ValueError: must replace all names at once with a sequence of length 2
Chris@87: 
Chris@87: Accessing field titles
Chris@87: ======================
Chris@87: 
Chris@87: The field titles provide a standard place to put associated info for fields.
Chris@87: They do not have to be strings. ::
Chris@87: 
Chris@87:  >>> x.dtype.fields['x'][2]
Chris@87:  'title 1'
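
As a self-contained illustration of where titles live, the sketch below builds a dtype with titles using the list form, in which a ``(title, name)`` tuple takes the place of the plain name, and then unpacks the entry stored in ``dtype.fields``. The field names and titles mirror the example above; the unpacked variable names are chosen here for readability.

```python
import numpy as np

# List form with titles: each field is given as ((title, name), type).
dt = np.dtype([(('title 1', 'x'), 'i1'), (('title 2', 'y'), 'f4')])

# For a titled field, the fields entry is (dtype, byte offset, title).
field_dtype, offset, title = dt.fields['x']
print(field_dtype, offset, title)  # int8 0 title 1
```

Index ``[2]`` in the example above therefore selects the title, which is the third item of the tuple.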
Chris@87: 
Chris@87: Accessing multiple fields at once
Chris@87: =================================
Chris@87: 
Chris@87: You can access multiple fields at once using a list of field names: ::
Chris@87: 
Chris@87:  >>> x = np.array([(1.5,2.5,(1.0,2.0)),(3.,4.,(4.,5.)),(1.,3.,(2.,6.))],
Chris@87:  ...         dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
Chris@87: 
Chris@87: Notice that `x` is created with a list of tuples. ::
Chris@87: 
Chris@87:  >>> x[['x','y']]
Chris@87:  array([(1.5, 2.5), (3.0, 4.0), (1.0, 3.0)],
Chris@87:       dtype=[('x', '<f4'), ('y', '<f4')])
Chris@87:  >>> x[['x','value']]
Chris@87:  array([(1.5, [[1.0, 2.0], [1.0, 2.0]]), (3.0, [[4.0, 5.0], [4.0, 5.0]]),
Chris@87:        (1.0, [[2.0, 6.0], [2.0, 6.0]])],
Chris@87:       dtype=[('x', '<f4'), ('value', '<f4', (2, 2))])
Chris@87: 
Chris@87: The fields are returned in the order they are asked for: ::
Chris@87: 
Chris@87:  >>> x[['y','x']]
Chris@87:  array([(2.5, 1.5), (4.0, 3.0), (3.0, 1.0)],
Chris@87:       dtype=[('y', '<f4'), ('x', '<f4')])
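
A short runnable version of the field-ordering point, given as a sketch: the array here is a simplified stand-in for the one above (the integer field ``n`` is made up for the example), and the check relies only on the names and values of the result, not on whether NumPy returns a view or a copy.

```python
import numpy as np

# Simplified stand-in for the array above; 'n' is an example field.
x = np.array([(1.5, 2.5, 7), (3.0, 4.0, 8), (1.0, 3.0, 9)],
             dtype=[('x', 'f4'), ('y', 'f4'), ('n', 'i4')])

# Index with a list of names; the result keeps the requested order.
sub = x[['y', 'x']]
print(sub.dtype.names)   # ('y', 'x')
print(sub['y'])          # [2.5 4.  3. ]
```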
Chris@87: 
Chris@87: Filling structured arrays
Chris@87: =========================
Chris@87: 
Chris@87: Structured arrays can be filled by field or row by row. ::
Chris@87: 
Chris@87:  >>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
Chris@87:  >>> arr['var1'] = np.arange(5)
Chris@87: 
Chris@87: If you fill it in row by row, it takes a tuple
Chris@87: (but not a list or array!)::
Chris@87: 
Chris@87:  >>> arr[0] = (10,20)
Chris@87:  >>> arr
Chris@87:  array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
Chris@87:       dtype=[('var1', '<f8'), ('var2', '<f8')])
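
Both filling styles can be combined in one short script. This is a sketch under the assumption of a current NumPy; the slice assignment with a list of tuples follows the same pattern as ``x[:] = [...]`` in the introduction, with one tuple per record.

```python
import numpy as np

arr = np.zeros(5, dtype=[('var1', 'f8'), ('var2', 'f8')])

# Fill one field for all records at once.
arr['var1'] = np.arange(5)

# Fill a single record with a tuple, one value per field.
arr[0] = (10, 20)

# Fill several records with a list of tuples, one tuple per record.
arr[1:3] = [(11, 21), (12, 22)]

print(arr.tolist())
```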
Chris@87: 
Chris@87: More information
Chris@87: ================
Chris@87: You can find some more information on recarrays and structured arrays
Chris@87: (including the difference between the two) `here
Chris@87: <http://www.scipy.org/Cookbook/Recarray>`_.
Chris@87: 
Chris@87: """
Chris@87: from __future__ import division, absolute_import, print_function