"""
=====================================
Structured Arrays (and Record Arrays)
=====================================

Introduction
============

NumPy provides powerful capabilities to create arrays of structs or records.
These arrays permit one to manipulate the data by the structs or by fields of
the struct. A simple example will show what is meant: ::

    >>> x = np.zeros((2,),dtype=('i4,f4,a10'))
    >>> x[:] = [(1,2.,'Hello'),(2,3.,"World")]
    >>> x
    array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

Here we have created a one-dimensional array of length 2. Each element of
this array is a record that contains three items: a 32-bit integer, a 32-bit
float, and a string of length 10 or less. If we index this array at the second
position we get the second record: ::

    >>> x[1]
    (2, 3.0, 'World')

Conveniently, one can access any field of the array by indexing using the
string that names that field. In this case the fields have received the
default names 'f0', 'f1' and 'f2'. ::

    >>> y = x['f1']
    >>> y
    array([ 2.,  3.], dtype=float32)
    >>> y[:] = 2*y
    >>> y
    array([ 4.,  6.], dtype=float32)
    >>> x
    array([(1, 4.0, 'Hello'), (2, 6.0, 'World')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

In these examples, y is a simple float array consisting of the 2nd field
in the record. But, rather than being a copy of the data in the structured
array, it is a view, i.e., it shares exactly the same memory locations.
Thus, when we updated this array by doubling its values, the structured
array shows the corresponding values as doubled as well. Likewise, if one
changes the record, the field view also changes: ::

    >>> x[1] = (-1,-1.,"Master")
    >>> x
    array([(1, 4.0, 'Hello'), (-1, -1.0, 'Master')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
    >>> y
    array([ 4., -1.], dtype=float32)

Defining Structured Arrays
==========================

One defines a structured array through the dtype object. There are
**several** alternative ways to define the fields of a record. Some of
these variants provide backward compatibility with Numeric, numarray, or
another module, and should not be used except for such purposes. These
will be so noted. One specifies record structure in
one of four alternative ways, using an argument (as supplied to a dtype
function keyword or a dtype object constructor itself). This
argument must be one of the following: 1) string, 2) tuple, 3) list, or
4) dictionary. Each of these is briefly described below.

1) String argument (as used in the above examples).
In this case, the constructor expects a comma-separated list of type
specifiers, optionally with extra shape information.
The type specifiers can take 4 different forms: ::

    a) b1, i1, i2, i4, i8, u1, u2, u4, u8, f2, f4, f8, c8, c16, a<n>
       (representing bytes, ints, unsigned ints, floats, complex and
        fixed length strings of specified byte lengths)
    b) int8,...,uint8,...,float16, float32, float64, complex64, complex128
       (this time with bit sizes)
    c) older Numeric/numarray type specifications (e.g. Float32).
       Don't use these in new code!
    d) Single character type specifiers (e.g. H for unsigned short ints).
       Avoid using these unless you must. Details can be found in the
       NumPy book.

These different styles can be mixed within the same string (but why would you
want to do that?). Furthermore, each type specifier can be prefixed
with a repetition number, or a shape. In these cases an array
element is created, i.e., an array within a record. That array
is still referred to as a single field. An example: ::

    >>> x = np.zeros(3, dtype='3int8, float32, (2,3)float64')
    >>> x
    array([([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
           ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
           ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
          dtype=[('f0', '|i1', 3), ('f1', '>f4'), ('f2', '>f8', (2, 3))])

Using strings to define the record structure precludes naming the fields
in the original definition. The names can be changed as shown later, however.

2) Tuple argument: The only relevant tuple case that applies to record
structures is when a structure is mapped to an existing data type. This
is done by pairing, in a tuple, the existing data type with a matching
dtype definition (using any of the variants being described here).
As an example (using a definition using a list, so see 3) for further
details): ::

    >>> x = np.zeros(3, dtype=('i4',[('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')]))
    >>> x
    array([0, 0, 0])
    >>> x['r']
    array([0, 0, 0], dtype=uint8)

In this case, an array is produced that looks and acts like a simple int32 array,
but also has definitions for fields that use only one byte of the int32 (a bit
like Fortran equivalencing).

3) List argument: In this case the record structure is defined with a list of
tuples. Each tuple has 2 or 3 elements specifying: 1) The name of the field
('' is permitted), 2) the type of the field, and 3) the shape (optional).
For example::

    >>> x = np.zeros(3, dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
    >>> x
    array([(0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
           (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
           (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]])],
          dtype=[('x', '>f4'), ('y', '>f4'), ('value', '>f4', (2, 2))])

4) Dictionary argument: two different forms are permitted. The first consists
of a dictionary with two required keys ('names' and 'formats'), each having an
equal sized list of values. The format list contains any type/shape specifier
allowed in other contexts. The names must be strings. There are two optional
keys: 'offsets' and 'titles'. Each must be a correspondingly matching list to
the required two where offsets contain integer offsets for each field, and
titles are objects containing metadata for each field (these do not have
to be strings), where the value of None is permitted.
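
The optional 'offsets' and 'titles' keys described above can be sketched as
follows. This is a minimal illustration rather than a canonical recipe: the
title strings and the chosen offsets are assumptions made for the example,
and NumPy is assumed to be importable as ``np``.

```python
import numpy as np

# First dictionary form with both optional keys supplied.
# The offsets and title strings are illustrative choices only.
dt = np.dtype({'names': ['col1', 'col2'],
               'formats': ['i4', 'f4'],
               'offsets': [0, 4],                 # byte offset of each field
               'titles': ['first column', 'second column']})

x = np.zeros(3, dtype=dt)
x['col2'] = [1.5, 2.5, 3.5]

# dt.fields maps each name to a (dtype, offset, title) tuple.
print(dt.names)
print(dt.fields['col2'][1])    # byte offset of 'col2'
print(dt.fields['col1'][2])    # title attached to 'col1'
print(dt.itemsize)             # total bytes per record
```

Leaving out 'offsets' lets NumPy pack the fields one after another, so the
key is mainly useful when the layout has to match an existing binary format.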
As an example: ::

    >>> x = np.zeros(3, dtype={'names':['col1', 'col2'], 'formats':['i4','f4']})
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[('col1', '>i4'), ('col2', '>f4')])

The other dictionary form permitted is a dictionary of name keys with tuple
values specifying type, offset, and an optional title. ::

    >>> x = np.zeros(3, dtype={'col1':('i1',0,'title 1'), 'col2':('f4',1,'title 2')})
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[(('title 1', 'col1'), '|i1'), (('title 2', 'col2'), '>f4')])

Accessing and modifying field names
===================================

The field names are an attribute of the dtype object defining the record structure.
For the last example: ::

    >>> x.dtype.names
    ('col1', 'col2')
    >>> x.dtype.names = ('x', 'y')
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
    >>> x.dtype.names = ('x', 'y', 'z') # wrong number of names
    ValueError: must replace all names at once with a sequence of length 2

Accessing field titles
======================

The field titles provide a standard place to put associated info for fields.
They do not have to be strings. ::

    >>> x.dtype.fields['x'][2]
    'title 1'

Accessing multiple fields at once
=================================

You can access multiple fields at once using a list of field names: ::

    >>> x = np.array([(1.5,2.5,(1.0,2.0)),(3.,4.,(4.,5.)),(1.,3.,(2.,6.))],
    ...         dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])

Notice that `x` is created with a list of tuples.
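
Rows that arrive as lists (for example from a parser) can be turned into
tuples before the array is built. The sketch below assumes NumPy is importable
as ``np``; the helper name ``rows_to_structured`` is hypothetical and is not a
NumPy function.

```python
import numpy as np

def rows_to_structured(rows, dtype):
    # Hypothetical helper: each record must be a tuple, so convert every
    # row before handing the list to np.array.
    return np.array([tuple(row) for row in rows], dtype=dtype)

point_dtype = [('x', 'f4'), ('y', 'f4')]
rows = [[1.5, 2.5], [3.0, 4.0], [1.0, 3.0]]   # rows given as lists

points = rows_to_structured(rows, point_dtype)
print(points.shape)    # one record per input row
print(points['x'])     # the 'x' field of every record
```

The array ``x`` defined above is built the same way, directly from tuples.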
::

    >>> x[['x','y']]
    array([(1.5, 2.5), (3.0, 4.0), (1.0, 3.0)],
          dtype=[('x', '<f4'), ('y', '<f4')])
    >>> x[['x','value']]
    array([(1.5, [[1.0, 2.0], [1.0, 2.0]]), (3.0, [[4.0, 5.0], [4.0, 5.0]]),
           (1.0, [[2.0, 6.0], [2.0, 6.0]])],
          dtype=[('x', '<f4'), ('value', '<f4', (2, 2))])

The fields are returned in the order they are asked for. ::

    >>> x[['y','x']]
    array([(2.5, 1.5), (4.0, 3.0), (3.0, 1.0)],
          dtype=[('y', '<f4'), ('x', '<f4')])

Filling structured arrays
=========================

Structured arrays can be filled by field or row by row. ::

    >>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
    >>> arr['var1'] = np.arange(5)

If you fill it in row by row, it takes a tuple
(but not a list or array!)::

    >>> arr[0] = (10,20)
    >>> arr
    array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
          dtype=[('var1', '<f8'), ('var2', '<f8')])

"""
from __future__ import division, absolute_import, print_function