"""
=====================================
Structured Arrays (and Record Arrays)
=====================================

Introduction
============

Numpy provides powerful capabilities to create arrays of structs or records.
These arrays permit one to manipulate the data by the structs or by fields of
the struct. A simple example will show what is meant: ::

    >>> x = np.zeros((2,),dtype=('i4,f4,a10'))
    >>> x[:] = [(1,2.,'Hello'),(2,3.,"World")]
    >>> x
    array([(1, 2.0, 'Hello'), (2, 3.0, 'World')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

Here we have created a one-dimensional array of length 2. Each element of
this array is a record that contains three items: a 32-bit integer, a 32-bit
float, and a string of length 10 or less. If we index this array at the second
position we get the second record: ::

    >>> x[1]
    (2, 3.0, 'World')

Conveniently, one can access any field of the array by indexing using the
string that names that field. In this case the fields have received the
default names 'f0', 'f1' and 'f2'. ::

    >>> y = x['f1']
    >>> y
    array([ 2., 3.], dtype=float32)
    >>> y[:] = 2*y
    >>> y
    array([ 4., 6.], dtype=float32)
    >>> x
    array([(1, 4.0, 'Hello'), (2, 6.0, 'World')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])

In these examples, y is a simple float array consisting of the 2nd field
in the record. But, rather than being a copy of the data in the structured
array, it is a view, i.e., it shares exactly the same memory locations.
Thus, when we updated this array by doubling its values, the structured
array shows the corresponding values as doubled as well. Likewise, if one
changes the record, the field view also changes: ::

    >>> x[1] = (-1,-1.,"Master")
    >>> x
    array([(1, 4.0, 'Hello'), (-1, -1.0, 'Master')],
          dtype=[('f0', '>i4'), ('f1', '>f4'), ('f2', '|S10')])
    >>> y
    array([ 4., -1.], dtype=float32)

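The view behaviour described above can be checked directly. The following is
a minimal sketch, not part of the original examples: it spells the string
field as ``S10`` rather than ``a10`` on the assumption that a current NumPy
release is installed, and it uses ``np.shares_memory`` only to confirm that
the field and the structured array overlap in memory.

```python
import numpy as np

# Two records, each holding an int32, a float32 and a 10-byte string.
x = np.zeros(2, dtype='i4,f4,S10')
x[:] = [(1, 2.0, b'Hello'), (2, 3.0, b'World')]

# Indexing by field name gives a view onto the same buffer, not a copy.
y = x['f1']
y[:] = 2 * y

shares = np.shares_memory(x, y)
doubled = x['f1'].tolist()

# Writing a whole record is visible through the field view as well.
x[1] = (-1, -1.0, b'Master')
after_record_write = y.tolist()
```

If a copy is wanted instead, ``x['f1'].copy()`` detaches the field from the
structured array.
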
Defining Structured Arrays
==========================

One defines a structured array through the dtype object. There are
**several** alternative ways to define the fields of a record. Some of
these variants provide backward compatibility with Numeric, numarray, or
another module, and should not be used except for such purposes. These
will be so noted. The record structure is specified by an argument
(supplied either to the dtype keyword of a function or to the dtype object
constructor itself). This argument must be one of the following:
1) string, 2) tuple, 3) list, or 4) dictionary. Each of these is briefly
described below.

1) String argument (as used in the above examples).
In this case, the constructor expects a comma-separated list of type
specifiers, optionally with extra shape information.
The type specifiers can take 4 different forms: ::

    a) b1, i1, i2, i4, i8, u1, u2, u4, u8, f2, f4, f8, c8, c16, a<n>
       (representing bytes, ints, unsigned ints, floats, complex and
        fixed length strings of specified byte lengths)
    b) int8,...,uint8,...,float16, float32, float64, complex64, complex128
       (this time with bit sizes)
    c) older Numeric/numarray type specifications (e.g. Float32).
       Don't use these in new code!
    d) Single character type specifiers (e.g. H for unsigned short ints).
       Avoid using these unless you must. Details can be found in the
       Numpy book.

These different styles can be mixed within the same string (but why would you
want to do that?). Furthermore, each type specifier can be prefixed
with a repetition number, or a shape. In these cases an array
element is created, i.e., an array within a record. That array
is still referred to as a single field. An example: ::

    >>> x = np.zeros(3, dtype='3int8, float32, (2,3)float64')
    >>> x
    array([([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
           ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
           ([0, 0, 0], 0.0, [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]])],
          dtype=[('f0', '|i1', 3), ('f1', '>f4'), ('f2', '>f8', (2, 3))])

Defining the record structure with a string precludes naming the fields
in the original definition. The names can be changed as shown later,
however.

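To see how a repetition number or shape prefix turns a field into a
sub-array, the dtype built from the string can be inspected. This is an
illustrative sketch only; the variable names are chosen here and are not part
of the original text.

```python
import numpy as np

# A repeat count or a shape in front of a type code makes a sub-array field.
dt = np.dtype('3int8, float32, (2,3)float64')
x = np.zeros(3, dtype=dt)

names = dt.names                # default names assigned by NumPy
f0_shape = dt['f0'].shape       # per-record shape of the first field
f2_shape = dt['f2'].shape       # per-record shape of the third field
column_shape = x['f2'].shape    # array shape followed by the field shape
```

Selecting the field ``f2`` from the length-3 array yields an array whose
leading dimension is the array length and whose trailing dimensions are the
shape declared in the string.
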
2) Tuple argument: The only relevant tuple case that applies to record
structures is when a structure is mapped to an existing data type. This
is done by pairing, in a tuple, the existing data type with a matching
dtype definition (using any of the variants being described here). As
an example (the definition here uses a list, so see 3) for further
details): ::

    >>> x = np.zeros(3, dtype=('i4',[('r','u1'), ('g','u1'), ('b','u1'), ('a','u1')]))
    >>> x
    array([0, 0, 0])
    >>> x['r']
    array([0, 0, 0], dtype=uint8)

In this case, an array is produced that looks and acts like a simple int32 array,
but also has definitions for fields that use only one byte of the int32 (a bit
like Fortran equivalencing).

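The overlay of one-byte fields on a 32-bit integer can be sketched as
follows. This is a hedged illustration assuming a NumPy release that still
accepts the (base type, field layout) tuple form; it checks only the
individual bytes, because the integer value seen through the int32 base
depends on the byte order of the machine.

```python
import numpy as np

# Pair a base type with a field layout that covers the same four bytes.
dt = np.dtype(('i4', [('r', 'u1'), ('g', 'u1'), ('b', 'u1'), ('a', 'u1')]))
x = np.zeros(3, dtype=dt)

# Writing through a one-byte field changes part of each 32-bit element.
x['r'] = 255
x['a'] = 255

itemsize = dt.itemsize
red = x['r'].tolist()
green = x['g'].tolist()
```
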
3) List argument: In this case the record structure is defined with a list of
tuples. Each tuple has 2 or 3 elements specifying: 1) The name of the field
('' is permitted), 2) the type of the field, and 3) the shape (optional).
For example::

    >>> x = np.zeros(3, dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])
    >>> x
    array([(0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
           (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]]),
           (0.0, 0.0, [[0.0, 0.0], [0.0, 0.0]])],
          dtype=[('x', '>f4'), ('y', '>f4'), ('value', '>f4', (2, 2))])

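A short sketch of the list form, showing how the optional third tuple element
affects the record size and the shape of the selected field. The values
written into ``value`` are made up for illustration.

```python
import numpy as np

# (name, type) or (name, type, shape) for each field.
dt = np.dtype([('x', 'f4'), ('y', np.float32), ('value', 'f4', (2, 2))])
x = np.zeros(3, dtype=dt)

# Each record carries its own 2x2 block in the 'value' field.
x['value'][0] = [[1.0, 2.0], [3.0, 4.0]]

itemsize = dt.itemsize            # 4 + 4 + 4 * 4 bytes per record
value_shape = x['value'].shape
first_value = x[0]['value'].tolist()
```
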
4) Dictionary argument: two different forms are permitted. The first consists
of a dictionary with two required keys ('names' and 'formats'), each holding a
list of the same length. The format list contains any type/shape specifier
allowed in other contexts. The names must be strings. There are two optional
keys: 'offsets' and 'titles'. Each must be a list of the same length as the
required two: offsets contains an integer byte offset for each field, and
titles contains objects holding metadata for each field (these do not have
to be strings, and the value None is permitted). As an example: ::

    >>> x = np.zeros(3, dtype={'names':['col1', 'col2'], 'formats':['i4','f4']})
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[('col1', '>i4'), ('col2', '>f4')])

The other dictionary form permitted is a dictionary of name keys with tuple
values specifying type, offset, and an optional title. ::

    >>> x = np.zeros(3, dtype={'col1':('i1',0,'title 1'), 'col2':('f4',1,'title 2')})
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[(('title 1', 'col1'), '|i1'), (('title 2', 'col2'), '>f4')])

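The optional 'offsets' and 'titles' keys of the first dictionary form can be
sketched like this. The offsets and titles below are arbitrary values picked
for illustration; the point is only that each list lines up with 'names' and
that the result can be read back from ``dtype.fields``.

```python
import numpy as np

dt = np.dtype({
    'names': ['col1', 'col2'],
    'formats': ['i4', 'f4'],
    'offsets': [0, 8],                  # leaves bytes 4..7 of each record unused
    'titles': ['title 1', 'title 2'],   # one metadata object per field
})

# dtype.fields maps a name to (dtype, byte offset, title).
col2_offset = dt.fields['col2'][1]
col1_title = dt.fields['col1'][2]

x = np.zeros(3, dtype=dt)
x['col2'] = 1.5
col2 = x['col2'].tolist()
```
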
Accessing and modifying field names
===================================

The field names are an attribute of the dtype object defining the record structure.
For the last example: ::

    >>> x.dtype.names
    ('col1', 'col2')
    >>> x.dtype.names = ('x', 'y')
    >>> x
    array([(0, 0.0), (0, 0.0), (0, 0.0)],
          dtype=[(('title 1', 'x'), '|i1'), (('title 2', 'y'), '>f4')])
    >>> x.dtype.names = ('x', 'y', 'z') # wrong number of names
    ValueError: must replace all names at once with a sequence of length 2

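The renaming rules above can be exercised as follows. This is a sketch under
the assumption that the installed NumPy still allows assignment to
``dtype.names``, as the text describes; the flag ``rejected`` is a name
introduced here for the check.

```python
import numpy as np

x = np.zeros(3, dtype={'names': ['col1', 'col2'], 'formats': ['i4', 'f4']})

# All names are replaced at once; the data buffer is untouched.
x.dtype.names = ('x', 'y')
renamed = x.dtype.names

# A sequence of the wrong length is rejected with ValueError.
try:
    x.dtype.names = ('x', 'y', 'z')
    rejected = False
except ValueError:
    rejected = True
```
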
Accessing field titles
======================

The field titles provide a standard place to put associated info for fields.
They do not have to be strings. ::

    >>> x.dtype.fields['x'][2]
    'title 1'

Accessing multiple fields at once
=================================

You can access multiple fields at once using a list of field names: ::

    >>> x = np.array([(1.5,2.5,(1.0,2.0)),(3.,4.,(4.,5.)),(1.,3.,(2.,6.))],
    ...         dtype=[('x','f4'),('y',np.float32),('value','f4',(2,2))])

Notice that `x` is created with a list of tuples. ::

    >>> x[['x','y']]
    array([(1.5, 2.5), (3.0, 4.0), (1.0, 3.0)],
          dtype=[('x', '<f4'), ('y', '<f4')])
    >>> x[['x','value']]
    array([(1.5, [[1.0, 2.0], [1.0, 2.0]]), (3.0, [[4.0, 5.0], [4.0, 5.0]]),
           (1.0, [[2.0, 6.0], [2.0, 6.0]])],
          dtype=[('x', '<f4'), ('value', '<f4', (2, 2))])

The fields are returned in the order they are asked for. ::

    >>> x[['y','x']]
    array([(2.5, 1.5), (4.0, 3.0), (3.0, 1.0)],
          dtype=[('y', '<f4'), ('x', '<f4')])

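A self-contained sketch of multi-field indexing follows. The 2x2 ``value``
entries are written out in full here rather than relying on broadcasting.
Whether the result is a copy or a view has differed between NumPy releases,
so the sketch only reads from it.

```python
import numpy as np

x = np.array(
    [(1.5, 2.5, [[1.0, 2.0], [1.0, 2.0]]),
     (3.0, 4.0, [[4.0, 5.0], [4.0, 5.0]]),
     (1.0, 3.0, [[2.0, 6.0], [2.0, 6.0]])],
    dtype=[('x', 'f4'), ('y', np.float32), ('value', 'f4', (2, 2))],
)

# A list of names selects several fields; their order follows the list.
sub = x[['y', 'x']]
sub_names = sub.dtype.names
first_y = float(sub['y'][0])
first_x = float(sub['x'][0])
```
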
Filling structured arrays
=========================

Structured arrays can be filled by field or row by row. ::

    >>> arr = np.zeros((5,), dtype=[('var1','f8'),('var2','f8')])
    >>> arr['var1'] = np.arange(5)

If you fill it in row by row, it takes a tuple
(but not a list or array!)::

    >>> arr[0] = (10,20)
    >>> arr
    array([(10.0, 20.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)],
          dtype=[('var1', '<f8'), ('var2', '<f8')])

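Both ways of filling can be combined in one short sketch. The slice
assignment with a list of tuples is an addition made here for illustration;
each tuple supplies one value per field.

```python
import numpy as np

arr = np.zeros(5, dtype=[('var1', 'f8'), ('var2', 'f8')])

# Fill one field (a column) for every record at once.
arr['var1'] = np.arange(5)

# Fill one record (a row) with a tuple, one entry per field.
arr[0] = (10, 20)

# Fill several records with a list of tuples.
arr[1:3] = [(11, 21), (12, 22)]

var1 = arr['var1'].tolist()
var2 = arr['var2'].tolist()
```
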
More information
================

You can find some more information on recarrays and structured arrays
(including the difference between the two) `here
<http://www.scipy.org/Cookbook/Recarray>`_.

"""
from __future__ import division, absolute_import, print_function