Mercurial > hg > vamp-build-and-test
comparison DEPENDENCIES/mingw32/Python27/Lib/site-packages/numpy/lib/format.py @ 87:2a2c65a20a8b
Add Python libs and headers
author | Chris Cannam |
---|---|
date | Wed, 25 Feb 2015 14:05:22 +0000 |
86:413a9d26189e | 87:2a2c65a20a8b |
---|---|
1 """ | |
2 Define a simple format for saving numpy arrays to disk with the full | |
3 information about them. | |
4 | |
5 The ``.npy`` format is the standard binary file format in NumPy for | |
6 persisting a *single* arbitrary NumPy array on disk. The format stores all | |
7 of the shape and dtype information necessary to reconstruct the array | |
8 correctly even on another machine with a different architecture. | |
9 The format is designed to be as simple as possible while achieving | |
10 its limited goals. | |
11 | |
12 The ``.npz`` format is the standard format for persisting *multiple* NumPy | |
13 arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy`` | |
14 files, one for each array. | |
15 | |
16 Capabilities | |
17 ------------ | |
18 | |
19 - Can represent all NumPy arrays including nested record arrays and | |
20 object arrays. | |
21 | |
22 - Represents the data in its native binary form. | |
23 | |
24 - Supports Fortran-contiguous arrays directly. | |
25 | |
26 - Stores all of the necessary information to reconstruct the array | |
27 including shape and dtype on a machine of a different | |
28 architecture. Both little-endian and big-endian arrays are | |
29 supported, and a file with little-endian numbers will yield | |
30 a little-endian array on any machine reading the file. The | |
31 types are described in terms of their actual sizes. For example, | |
32 if a machine with a 64-bit C "long int" writes out an array with | |
33 "long ints", a reading machine with 32-bit C "long ints" will yield | |
34 an array with 64-bit integers. | |
35 | |
36 - Is straightforward to reverse engineer. Datasets often live longer than | |
37 the programs that created them. A competent developer should be | |
38 able to create a solution in their preferred programming language to | |
39 read most ``.npy`` files that they have been given without much | |
40 documentation. | |
41 | |
42 - Allows memory-mapping of the data. See `open_memmap`. | |
43 | |
44 - Can be read from a filelike stream object instead of an actual file. | |
45 | |
46 - Stores object arrays, i.e. arrays containing elements that are arbitrary | |
47 Python objects. Files with object arrays cannot be memory-mapped, but | |
48 can be read and written to disk. | |
49 | |
50 Limitations | |
51 ----------- | |
52 | |
53 - Arbitrary subclasses of numpy.ndarray are not completely preserved. | |
54 Subclasses will be accepted for writing, but only the array data will | |
55 be written out. A regular numpy.ndarray object will be created | |
56 upon reading the file. | |
57 | |
58 .. warning:: | |
59 | |
60 Due to limitations in the interpretation of structured dtypes, dtypes | |
61 with fields with empty names will have the names replaced by 'f0', 'f1', | |
62 etc. Such arrays will not round-trip through the format entirely | |
63 accurately. The data is intact; only the field names will differ. We are | |
64 working on a fix for this. This fix will not require a change in the | |
65 file format. The arrays with such structures can still be saved and | |
66 restored, and the correct dtype may be restored by using the | |
67 ``loadedarray.view(correct_dtype)`` method. | |
68 | |
69 File extensions | |
70 --------------- | |
71 | |
72 We recommend using the ``.npy`` and ``.npz`` extensions for files saved | |
73 in this format. This is by no means a requirement; applications may wish | |
74 to use these file formats but use an extension specific to the | |
75 application. In the absence of an obvious alternative, however, | |
76 we suggest using ``.npy`` and ``.npz``. | |
77 | |
78 Version numbering | |
79 ----------------- | |
80 | |
81 The version numbering of these formats is independent of NumPy version | |
82 numbering. If the format is upgraded, the code in `numpy.lib.format` will still | |
83 be able to read and write Version 1.0 files. | |
84 | |
85 Format Version 1.0 | |
86 ------------------ | |
87 | |
88 The first 6 bytes are a magic string: exactly ``\\x93NUMPY``. | |
89 | |
90 The next 1 byte is an unsigned byte: the major version number of the file | |
91 format, e.g. ``\\x01``. | |
92 | |
93 The next 1 byte is an unsigned byte: the minor version number of the file | |
94 format, e.g. ``\\x00``. Note: the version of the file format is not tied | |
95 to the version of the numpy package. | |
96 | |
97 The next 2 bytes form a little-endian unsigned short int: the length of | |
98 the header data HEADER_LEN. | |
99 | |
100 The next HEADER_LEN bytes form the header data describing the array's | |
101 format. It is an ASCII string which contains a Python literal expression | |
102 of a dictionary. It is terminated by a newline (``\\n``) and padded with | |
103 spaces (``\\x20``) to make the total length of | |
104 ``magic string + 4 + HEADER_LEN`` be evenly divisible by 16 for alignment | |
105 purposes. | |
106 | |
107 The dictionary contains three keys: | |
108 | |
109 "descr" : dtype.descr | |
110 An object that can be passed as an argument to the `numpy.dtype` | |
111 constructor to create the array's dtype. | |
112 "fortran_order" : bool | |
113 Whether the array data is Fortran-contiguous or not. Since | |
114 Fortran-contiguous arrays are a common form of non-C-contiguity, | |
115 we allow them to be written directly to disk for efficiency. | |
116 "shape" : tuple of int | |
117 The shape of the array. | |
118 | |
119 For repeatability and readability, the dictionary keys are sorted in | |
120 alphabetic order. This is for convenience only. A writer SHOULD implement | |
121 this if possible. A reader MUST NOT depend on this. | |
122 | |
123 Following the header comes the array data. If the dtype contains Python | |
124 objects (i.e. ``dtype.hasobject is True``), then the data is a Python | |
125 pickle of the array. Otherwise the data is the contiguous (either C- | |
126 or Fortran-, depending on ``fortran_order``) bytes of the array. | |
127 Consumers can figure out the number of bytes by multiplying the number | |
128 of elements given by the shape (noting that ``shape=()`` means there is | |
129 1 element) by ``dtype.itemsize``. | |
130 | |
131 Notes | |
132 ----- | |
133 The ``.npy`` format, including reasons for creating it and a comparison of | |
134 alternatives, is described fully in the "npy-format" NEP. | |
135 | |
136 """ | |
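The Format Version 1.0 layout described in the module docstring can be assembled with nothing but the standard library. The sketch below is a hypothetical helper (`npy_v1_bytes` is not part of NumPy). It assumes a C-ordered array of little-endian float64 values (`'<f8'`) and pads the header as the specification describes: the magic string, two version bytes, and a 2-byte little-endian HEADER_LEN, followed by the dictionary literal. That literal is padded with spaces and ends in a newline, so the whole preamble is a multiple of 16 bytes.

```python
import struct

MAGIC = b"\x93NUMPY"


def npy_v1_bytes(values, shape):
    """Serialize float64 values (C order) as a version 1.0 .npy byte string.

    Hypothetical helper for illustration only: it handles a single fixed
    dtype ('<f8') and trusts the caller that len(values) matches shape.
    """
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': %r, }" % (tuple(shape),)
    # Preamble = 6-byte magic + 2 version bytes + 2-byte length + header + '\n'.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    # -unpadded % 16 is the number of spaces needed to reach the next multiple of 16.
    header += " " * (-unpadded % 16) + "\n"
    preamble = MAGIC + bytes([1, 0]) + struct.pack("<H", len(header))
    data = struct.pack("<%dd" % len(values), *values)
    return preamble + header.encode("ascii") + data
```

A real writer would derive `descr`, `fortran_order`, and `shape` from the array itself, as `header_data_from_array_1_0` does further down in the listing.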
137 from __future__ import division, absolute_import, print_function | |
138 | |
139 import numpy | |
140 import sys | |
141 import io | |
142 import warnings | |
143 from numpy.lib.utils import safe_eval | |
144 from numpy.compat import asbytes, asstr, isfileobj, long, basestring | |
145 | |
146 if sys.version_info[0] >= 3: | |
147 import pickle | |
148 else: | |
149 import cPickle as pickle | |
150 | |
151 MAGIC_PREFIX = asbytes('\x93NUMPY') | |
152 MAGIC_LEN = len(MAGIC_PREFIX) + 2 | |
153 BUFFER_SIZE = 2**18 # size of buffer for reading npz files in bytes | |
154 | |
155 # difference between version 1.0 and 2.0 is a 4 byte (I) header length | |
156 # instead of 2 bytes (H) allowing storage of large structured arrays | |
157 | |
158 def _check_version(version): | |
159 if version not in [(1, 0), (2, 0), None]: | |
160 msg = "we only support format version (1, 0) and (2, 0), not %s" | |
161 raise ValueError(msg % (version,)) | |
162 | |
163 def magic(major, minor): | |
164 """ Return the magic string for the given file format version. | |
165 | |
166 Parameters | |
167 ---------- | |
168 major : int in [0, 255] | |
169 minor : int in [0, 255] | |
170 | |
171 Returns | |
172 ------- | |
173 magic : str | |
174 | |
175 Raises | |
176 ------ | |
177 ValueError if the version cannot be formatted. | |
178 """ | |
179 if major < 0 or major > 255: | |
180 raise ValueError("major version must be 0 <= major < 256") | |
181 if minor < 0 or minor > 255: | |
182 raise ValueError("minor version must be 0 <= minor < 256") | |
183 if sys.version_info[0] < 3: | |
184 return MAGIC_PREFIX + chr(major) + chr(minor) | |
185 else: | |
186 return MAGIC_PREFIX + bytes([major, minor]) | |
187 | |
188 def read_magic(fp): | |
189 """ Read the magic string to get the version of the file format. | |
190 | |
191 Parameters | |
192 ---------- | |
193 fp : filelike object | |
194 | |
195 Returns | |
196 ------- | |
197 major : int | |
198 minor : int | |
199 """ | |
200 magic_str = _read_bytes(fp, MAGIC_LEN, "magic string") | |
201 if magic_str[:-2] != MAGIC_PREFIX: | |
202 msg = "the magic string is not correct; expected %r, got %r" | |
203 raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2])) | |
204 if sys.version_info[0] < 3: | |
205 major, minor = map(ord, magic_str[-2:]) | |
206 else: | |
207 major, minor = magic_str[-2:] | |
208 return major, minor | |
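`read_magic` only recovers the version. The header length that follows is 2 bytes for version 1.0 and 4 bytes for version 2.0, as the comment near `MAGIC_PREFIX` notes. Below is a minimal stdlib sketch that parses the whole fixed-size preamble from an in-memory buffer. It assumes the complete buffer is available, and `parse_preamble` is a hypothetical name, not a NumPy function.

```python
import struct

MAGIC = b"\x93NUMPY"


def parse_preamble(buf):
    """Return (major, minor, header_len, data_offset) for an .npy byte string.

    Hypothetical helper: assumes buf holds at least the full preamble.
    """
    if buf[:6] != MAGIC:
        raise ValueError("the magic string is not correct: %r" % (buf[:6],))
    major, minor = buf[6], buf[7]  # indexing bytes yields ints on Python 3
    if (major, minor) == (1, 0):
        (header_len,) = struct.unpack("<H", buf[8:10])  # 2-byte length field
        start = 10
    elif (major, minor) == (2, 0):
        (header_len,) = struct.unpack("<I", buf[8:12])  # 4-byte length field
        start = 12
    else:
        raise ValueError("unsupported format version %r" % ((major, minor),))
    # The array data begins immediately after the header.
    return major, minor, header_len, start + header_len
```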
209 | |
210 def dtype_to_descr(dtype): | |
211 """ | |
212 Get a serializable descriptor from the dtype. | |
213 | |
214 The .descr attribute of a dtype object cannot be round-tripped through | |
215 the dtype() constructor. Simple types, like dtype('float32'), have | |
216 a descr which looks like a record array with one field with '' as | |
217 a name. The dtype() constructor interprets this as a request to give | |
218 a default name. Instead, we construct a descriptor that can be passed to | |
219 dtype(). | |
220 | |
221 Parameters | |
222 ---------- | |
223 dtype : dtype | |
224 The dtype of the array that will be written to disk. | |
225 | |
226 Returns | |
227 ------- | |
228 descr : object | |
229 An object that can be passed to `numpy.dtype()` in order to | |
230 replicate the input dtype. | |
231 | |
232 """ | |
233 if dtype.names is not None: | |
234 # This is a record array. The .descr is fine. XXX: parts of the | |
235 # record array with an empty name, like padding bytes, still get | |
236 # fiddled with. This needs to be fixed in the C implementation of | |
237 # dtype(). | |
238 return dtype.descr | |
239 else: | |
240 return dtype.str | |
241 | |
242 def header_data_from_array_1_0(array): | |
243 """ Get the dictionary of header metadata from a numpy.ndarray. | |
244 | |
245 Parameters | |
246 ---------- | |
247 array : numpy.ndarray | |
248 | |
249 Returns | |
250 ------- | |
251 d : dict | |
252 This has the appropriate entries for writing its string representation | |
253 to the header of the file. | |
254 """ | |
255 d = {} | |
256 d['shape'] = array.shape | |
257 if array.flags.c_contiguous: | |
258 d['fortran_order'] = False | |
259 elif array.flags.f_contiguous: | |
260 d['fortran_order'] = True | |
261 else: | |
262 # Totally non-contiguous data. We will have to make it C-contiguous | |
263 # before writing. Note that we need to test for C_CONTIGUOUS first | |
264 # because a 1-D array is both C_CONTIGUOUS and F_CONTIGUOUS. | |
265 d['fortran_order'] = False | |
266 | |
267 d['descr'] = dtype_to_descr(array.dtype) | |
268 return d | |
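The comment about testing `C_CONTIGUOUS` first can be checked with a little stride arithmetic. The pure-Python sketch below computes the byte strides of a contiguous array in C order and in Fortran order, without NumPy. The helper name `contiguous_strides` is hypothetical. For a 1-D array both orders give the same strides, which is why such an array carries both flags and is written with `fortran_order` set to False.

```python
def contiguous_strides(shape, itemsize, order="C"):
    """Byte strides of a contiguous array with the given shape and order.

    Hypothetical helper that mimics the stride layout of a freshly allocated array.
    """
    strides = [0] * len(shape)
    step = itemsize
    # C order: the last axis varies fastest; Fortran order: the first axis does.
    axes = range(len(shape) - 1, -1, -1) if order == "C" else range(len(shape))
    for axis in axes:
        strides[axis] = step
        step *= shape[axis]
    return tuple(strides)
```

For comparison, `numpy.zeros((2, 3)).strides` is `(24, 8)`, which matches `contiguous_strides((2, 3), 8, "C")`.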
269 | |
270 def _write_array_header(fp, d, version=None): | |
271 """ Write the header for an array and return the version used. | |
272 | |
273 Parameters | |
274 ---------- | |
275 fp : filelike object | |
276 d : dict | |
277 This has the appropriate entries for writing its string representation | |
278 to the header of the file. | |
279 version : tuple or None | |
280 None means use the oldest version that works; an | |
281 explicit version will raise a ValueError if the format does not | |
282 allow saving this data. Default: None | |
283 Returns | |
284 ------- | |
285 version : tuple of int | |
286 the file version which needs to be used to store the data | |
287 """ | |
288 import struct | |
289 header = ["{"] | |
290 for key, value in sorted(d.items()): | |
291 # Need to use repr here, since we eval these when reading | |
292 header.append("'%s': %s, " % (key, repr(value))) | |
293 header.append("}") | |
294 header = "".join(header) | |
295 # Pad the header with spaces and a final newline such that the magic | |
296 # string, the header-length short and the header are aligned on a | |
297 # 16-byte boundary. Hopefully, some system, possibly memory-mapping, | |
298 # can take advantage of our premature optimization. | |
299 current_header_len = MAGIC_LEN + 2 + len(header) + 1 # 1 for the newline | |
300 topad = 16 - (current_header_len % 16) | |
301 header = header + ' '*topad + '\n' | |
302 header = asbytes(_filter_header(header)) | |
303 | |
304 if len(header) >= (256*256) and version == (1, 0): | |
305 raise ValueError("header does not fit inside %s bytes required by the" | |
306 " 1.0 format" % (256*256)) | |
307 if len(header) < (256*256): | |
308 header_len_str = struct.pack('<H', len(header)) | |
309 version = (1, 0) | |
310 elif len(header) < (2**32): | |
311 header_len_str = struct.pack('<I', len(header)) | |
312 version = (2, 0) | |
313 else: | |
314 raise ValueError("header does not fit inside 4 GiB required by " | |
315 "the 2.0 format") | |
316 | |
317 fp.write(magic(*version)) | |
318 fp.write(header_len_str) | |
319 fp.write(header) | |
320 return version | |
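The padding and version-selection steps in `_write_array_header` can be restated as a small standalone function. This is a sketch under stated assumptions, not NumPy's code. `build_preamble` is hypothetical, and it accepts only dictionary values whose `repr` is a valid Python literal. The listing always budgets 2 bytes for the length field when padding. This sketch instead pads against the actual 2- or 4-byte field, so version 2.0 preambles are also 16-byte aligned.

```python
import struct

MAGIC = b"\x93NUMPY"


def build_preamble(d, version=None):
    """Return (version, preamble_bytes) for the header dictionary d.

    Hypothetical sketch: version=None picks the oldest format that fits.
    """
    # Sorted keys and repr() values, mirroring the listing's header layout.
    body = "{" + "".join("'%s': %r, " % (k, v) for k, v in sorted(d.items())) + "}"
    candidates = [version] if version is not None else [(1, 0), (2, 0)]
    for ver in candidates:
        fmt, limit = {(1, 0): ("<H", 2**16), (2, 0): ("<I", 2**32)}[ver]
        # Magic + 2 version bytes + length field + body + trailing newline.
        unpadded = len(MAGIC) + 2 + struct.calcsize(fmt) + len(body) + 1
        header = body + " " * (-unpadded % 16) + "\n"
        if len(header) < limit:
            return ver, (MAGIC + bytes(ver) + struct.pack(fmt, len(header))
                         + header.encode("ascii"))
    raise ValueError("header does not fit in format %r" % (candidates[-1],))
```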
321 | |
322 def write_array_header_1_0(fp, d): | |
323 """ Write the header for an array using the 1.0 format. | |
324 | |
325 Parameters | |
326 ---------- | |
327 fp : filelike object | |
328 d : dict | |
329 This has the appropriate entries for writing its string | |
330 representation to the header of the file. | |
331 """ | |
332 _write_array_header(fp, d, (1, 0)) | |
333 | |
334 | |
335 def write_array_header_2_0(fp, d): | |
336 """ Write the header for an array using the 2.0 format. | |
337 The 2.0 format allows storing very large structured arrays. | |
338 | |
339 .. versionadded:: 1.9.0 | |
340 | |
341 Parameters | |
342 ---------- | |
343 fp : filelike object | |
344 d : dict | |
345 This has the appropriate entries for writing its string | |
346 representation to the header of the file. | |
347 """ | |
348 _write_array_header(fp, d, (2, 0)) | |
349 | |
350 def read_array_header_1_0(fp): | |
351 """ | |
352 Read an array header from a filelike object using the 1.0 file format | |
353 version. | |
354 | |
355 This will leave the file object located just after the header. | |
356 | |
357 Parameters | |
358 ---------- | |
359 fp : filelike object | |
360 A file object or something with a `.read()` method like a file. | |
361 | |
362 Returns | |
363 ------- | |
364 shape : tuple of int | |
365 The shape of the array. | |
366 fortran_order : bool | |
367 Whether the array data is stored in Fortran (column-major) order | |
368 rather than C (row-major) order. Writers store non-contiguous | |
369 arrays in C order after making them contiguous. | |
370 dtype : dtype | |
371 The dtype of the file's data. | |
372 | |
373 Raises | |
374 ------ | |
375 ValueError | |
376 If the data is invalid. | |
377 | |
378 """ | |
379 return _read_array_header(fp, version=(1, 0)) | |
380 | |
381 def read_array_header_2_0(fp): | |
382 """ | |
383 Read an array header from a filelike object using the 2.0 file format | |
384 version. | |
385 | |
386 This will leave the file object located just after the header. | |
387 | |
388 .. versionadded:: 1.9.0 | |
389 | |
390 Parameters | |
391 ---------- | |
392 fp : filelike object | |
393 A file object or something with a `.read()` method like a file. | |
394 | |
395 Returns | |
396 ------- | |
397 shape : tuple of int | |
398 The shape of the array. | |
399 fortran_order : bool | |
400 Whether the array data is stored in Fortran (column-major) order | |
401 rather than C (row-major) order. Writers store non-contiguous | |
402 arrays in C order after making them contiguous. | |
403 dtype : dtype | |
404 The dtype of the file's data. | |
405 | |
406 Raises | |
407 ------ | |
408 ValueError | |
409 If the data is invalid. | |
410 | |
411 """ | |
412 return _read_array_header(fp, version=(2, 0)) | |
413 | |
414 | |
415 def _filter_header(s): | |
416 """Clean up 'L' in npz header ints. | |
417 | |
418 Cleans up the 'L' in strings representing integers. Needed to allow npz | |
419 headers produced in Python2 to be read in Python3. | |
420 | |
421 Parameters | |
422 ---------- | |
423 s : byte string | |
424 Npy file header. | |
425 | |
426 Returns | |
427 ------- | |
428 header : str | |
429 Cleaned up header. | |
430 | |
431 """ | |
432 import tokenize | |
433 if sys.version_info[0] >= 3: | |
434 from io import StringIO | |
435 else: | |
436 from StringIO import StringIO | |
437 | |
438 tokens = [] | |
439 last_token_was_number = False | |
440 for token in tokenize.generate_tokens(StringIO(asstr(s)).read): | |
441 token_type = token[0] | |
442 token_string = token[1] | |
443 if (last_token_was_number and | |
444 token_type == tokenize.NAME and | |
445 token_string == "L"): | |
446 continue | |
447 else: | |
448 tokens.append(token) | |
449 last_token_was_number = (token_type == tokenize.NUMBER) | |
450 return tokenize.untokenize(tokens) | |
451 | |
452 | |
453 def _read_array_header(fp, version): | |
454 """ | |
455 see read_array_header_1_0 | |
456 """ | |
457 # Read the little-endian header length: an unsigned short (2 bytes) | |
458 # for version 1.0 or an unsigned int (4 bytes) for version 2.0. | |
459 import struct | |
460 if version == (1, 0): | |
461 hlength_str = _read_bytes(fp, 2, "array header length") | |
462 header_length = struct.unpack('<H', hlength_str)[0] | |
463 header = _read_bytes(fp, header_length, "array header") | |
464 elif version == (2, 0): | |
465 hlength_str = _read_bytes(fp, 4, "array header length") | |
466 header_length = struct.unpack('<I', hlength_str)[0] | |
467 header = _read_bytes(fp, header_length, "array header") | |
468 else: | |
469 raise ValueError("Invalid version %r" % (version,)) | |
470 | |
471 # The header is a pretty-printed string representation of a literal | |
472 # Python dictionary, padded with spaces and terminated by a newline | |
473 # to reach a 16-byte boundary. The keys are strings. | |
474 # "shape" : tuple of int | |
475 # "fortran_order" : bool | |
476 # "descr" : dtype.descr | |
477 header = _filter_header(header) | |
478 try: | |
479 d = safe_eval(header) | |
480 except SyntaxError as e: | |
481 msg = "Cannot parse header: %r\nException: %r" | |
482 raise ValueError(msg % (header, e)) | |
483 if not isinstance(d, dict): | |
484 msg = "Header is not a dictionary: %r" | |
485 raise ValueError(msg % (d,)) | |
486 keys = sorted(d.keys()) | |
487 if keys != ['descr', 'fortran_order', 'shape']: | |
488 msg = "Header does not contain the correct keys: %r" | |
489 raise ValueError(msg % (keys,)) | |
490 | |
491 # Sanity-check the values. | |
492 if (not isinstance(d['shape'], tuple) or | |
493 not numpy.all([isinstance(x, (int, long)) for x in d['shape']])): | |
494 msg = "shape is not valid: %r" | |
495 raise ValueError(msg % (d['shape'],)) | |
496 if not isinstance(d['fortran_order'], bool): | |
497 msg = "fortran_order is not a valid bool: %r" | |
498 raise ValueError(msg % (d['fortran_order'],)) | |
499 try: | |
500 dtype = numpy.dtype(d['descr']) | |
501 except TypeError as e: | |
502 msg = "descr is not a valid dtype descriptor: %r" | |
503 raise ValueError(msg % (d['descr'],)) | |
504 | |
505 return d['shape'], d['fortran_order'], dtype | |
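The header checks above can be mirrored with `ast.literal_eval`, which, like NumPy's `safe_eval`, evaluates only Python literals. In this hypothetical `parse_header`, the dtype check is deliberately left out because it needs NumPy. A real reader would still pass `descr` to `numpy.dtype`.

```python
import ast


def parse_header(text):
    """Validate an .npy header string; return (shape, fortran_order, descr).

    Hypothetical sketch of the sanity checks, without the dtype conversion.
    """
    try:
        d = ast.literal_eval(text)  # literals only, no arbitrary code
    except (SyntaxError, ValueError) as e:
        raise ValueError("Cannot parse header: %r (%s)" % (text, e))
    if not isinstance(d, dict):
        raise ValueError("Header is not a dictionary: %r" % (d,))
    if sorted(d) != ["descr", "fortran_order", "shape"]:
        raise ValueError("Header does not contain the correct keys: %r" % (sorted(d),))
    shape = d["shape"]
    if not isinstance(shape, tuple) or not all(isinstance(x, int) for x in shape):
        raise ValueError("shape is not valid: %r" % (shape,))
    if not isinstance(d["fortran_order"], bool):
        raise ValueError("fortran_order is not a valid bool: %r" % (d["fortran_order"],))
    return shape, d["fortran_order"], d["descr"]
```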
506 | |
507 def write_array(fp, array, version=None): | |
508 """ | |
509 Write an array to an NPY file, including a header. | |
510 | |
511 If the array is neither C-contiguous nor Fortran-contiguous AND the | |
512 file_like object is not a real file object, this function will have to | |
513 copy data in memory. | |
514 | |
515 Parameters | |
516 ---------- | |
517 fp : file_like object | |
518 An open, writable file object, or similar object with a | |
519 ``.write()`` method. | |
520 array : ndarray | |
521 The array to write to disk. | |
522 version : (int, int) or None, optional | |
523 The version number of the format. None means use the oldest | |
524 supported version that is able to store the data. Default: None | |
525 | |
526 Raises | |
527 ------ | |
528 ValueError | |
529 If the array cannot be persisted. | |
530 Various other errors | |
531 If the array contains Python objects as part of its dtype, the | |
532 process of pickling them may raise various errors if the objects | |
533 are not picklable. | |
534 | |
535 """ | |
536 _check_version(version) | |
537 used_ver = _write_array_header(fp, header_data_from_array_1_0(array), | |
538 version) | |
539 # this warning can be removed when 1.9 has aged enough | |
540 if version != (2, 0) and used_ver == (2, 0): | |
541 warnings.warn("Stored array in format 2.0. It can only be" | |
542 " read by NumPy >= 1.9", UserWarning) | |
543 | |
544 # Set buffer size to 16 MiB to hide the Python loop overhead. | |
545 buffersize = max(16 * 1024 ** 2 // array.itemsize, 1) | |
546 | |
547 if array.dtype.hasobject: | |
548 # We contain Python objects so we cannot write out the data | |
549 # directly. Instead, we will pickle it out with version 2 of the | |
550 # pickle protocol. | |
551 pickle.dump(array, fp, protocol=2) | |
552 elif array.flags.f_contiguous and not array.flags.c_contiguous: | |
553 if isfileobj(fp): | |
554 array.T.tofile(fp) | |
555 else: | |
556 for chunk in numpy.nditer( | |
557 array, flags=['external_loop', 'buffered', 'zerosize_ok'], | |
558 buffersize=buffersize, order='F'): | |
559 fp.write(chunk.tobytes('C')) | |
560 else: | |
561 if isfileobj(fp): | |
562 array.tofile(fp) | |
563 else: | |
564 for chunk in numpy.nditer( | |
565 array, flags=['external_loop', 'buffered', 'zerosize_ok'], | |
566 buffersize=buffersize, order='C'): | |
567 fp.write(chunk.tobytes('C')) | |
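For streams that are not real files, `write_array` walks the array in buffered chunks of roughly 16 MiB rather than building one large bytes object. The sketch below applies the same idea to plain bytes using only the standard library. `write_in_chunks` and its `chunk_bytes` parameter are hypothetical. Chunk sizes are rounded down to whole items, as `max(16 * 1024 ** 2 // array.itemsize, 1)` does in the listing.

```python
import io


def write_in_chunks(fp, data, itemsize, chunk_bytes=16 * 1024 ** 2):
    """Write data to fp in pieces made of whole items; return the write count.

    Hypothetical helper illustrating the chunked path for non-file streams.
    """
    items_per_chunk = max(chunk_bytes // itemsize, 1)  # at least one item
    step = items_per_chunk * itemsize
    view = memoryview(data)  # slicing a memoryview avoids copying the payload
    writes = 0
    for start in range(0, len(data), step):
        fp.write(view[start:start + step])
        writes += 1
    return writes
```

A real file object skips this loop and takes the `tofile` fast path shown above.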
568 | |
569 | |
570 def read_array(fp): | |
571 """ | |
572 Read an array from an NPY file. | |
573 | |
574 Parameters | |
575 ---------- | |
576 fp : file_like object | |
577 If this is not a real file object, then this may take extra memory | |
578 and time. | |
579 | |
580 Returns | |
581 ------- | |
582 array : ndarray | |
583 The array from the data on disk. | |
584 | |
585 Raises | |
586 ------ | |
587 ValueError | |
588 If the data is invalid. | |
589 | |
590 """ | |
591 version = read_magic(fp) | |
592 _check_version(version) | |
593 shape, fortran_order, dtype = _read_array_header(fp, version) | |
594 if len(shape) == 0: | |
595 count = 1 | |
596 else: | |
597 count = numpy.multiply.reduce(shape) | |
598 | |
599 # Now read the actual data. | |
600 if dtype.hasobject: | |
601 # The array contained Python objects. We need to unpickle the data. | |
602 array = pickle.load(fp) | |
603 else: | |
604 if isfileobj(fp): | |
605 # We can use the fast fromfile() function. | |
606 array = numpy.fromfile(fp, dtype=dtype, count=count) | |
607 else: | |
608 # This is not a real file. We have to read it the | |
609 # memory-intensive way. | |
610 # crc32 module fails on reads greater than 2 ** 32 bytes, | |
611 # breaking large reads from gzip streams. Chunk reads to | |
612 # BUFFER_SIZE bytes to avoid issue and reduce memory overhead | |
613 # of the read. In non-chunked case count < max_read_count, so | |
614 # only one read is performed. | |
615 | |
616 max_read_count = BUFFER_SIZE // min(BUFFER_SIZE, dtype.itemsize) | |
617 | |
618 array = numpy.empty(count, dtype=dtype) | |
619 for i in range(0, count, max_read_count): | |
620 read_count = min(max_read_count, count - i) | |
621 read_size = int(read_count * dtype.itemsize) | |
622 data = _read_bytes(fp, read_size, "array data") | |
623 array[i:i+read_count] = numpy.frombuffer(data, dtype=dtype, | |
624 count=read_count) | |
625 | |
626 if fortran_order: | |
627 array.shape = shape[::-1] | |
628 array = array.transpose() | |
629 else: | |
630 array.shape = shape | |
631 | |
632 return array | |
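`read_array` computes the element count from the shape, where an empty shape means one element. It reads non-file streams in pieces of at most `BUFFER_SIZE` bytes. For Fortran-ordered data, it reshapes to the reversed shape before transposing. Below is a standard-library sketch of the count and the chunked read. It assumes fixed-size items and uses hypothetical helpers (`element_count`, `read_items`). The transpose step needs NumPy and is left out.

```python
import io
from functools import reduce
from operator import mul

BUFFER_SIZE = 2 ** 18  # same value as in the listing


def element_count(shape):
    """Number of elements for a shape; () denotes a scalar with one element."""
    return reduce(mul, shape, 1)


def read_items(fp, count, itemsize, buffer_size=BUFFER_SIZE):
    """Read count items of itemsize bytes, at most about buffer_size bytes per read.

    Hypothetical helper. It assumes fp.read returns everything requested (true
    for io.BytesIO); the listing uses _read_bytes to cope with short reads.
    """
    # Same formula as the listing, guarded so zero-size items cannot divide by 0.
    max_read_count = buffer_size // max(1, min(buffer_size, itemsize))
    chunks = []
    for i in range(0, count, max_read_count):
        size = min(max_read_count, count - i) * itemsize
        data = fp.read(size)
        if len(data) != size:
            raise ValueError("EOF: expected %d bytes, got %d" % (size, len(data)))
        chunks.append(data)
    return b"".join(chunks)
```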
633 | |
634 | |
635 def open_memmap(filename, mode='r+', dtype=None, shape=None, | |
636 fortran_order=False, version=None): | |
637 """ | |
638 Open a .npy file as a memory-mapped array. | |
639 | |
640 This may be used to read an existing file or create a new one. | |
641 | |
642 Parameters | |
643 ---------- | |
644 filename : str | |
645 The name of the file on disk. This may *not* be a file-like | |
646 object. | |
647 mode : str, optional | |
648 The mode in which to open the file; the default is 'r+'. In | |
649 addition to the standard file modes, 'c' is also accepted to mean | |
650 "copy on write." See `memmap` for the available mode strings. | |
651 dtype : data-type, optional | |
652 The data type of the array if we are creating a new file in "write" | |
653 mode; otherwise `dtype` is ignored. The default value is None, which | |
654 results in a data-type of `float64`. | |
655 shape : tuple of int | |
656 The shape of the array if we are creating a new file in "write" | |
657 mode, in which case this parameter is required. Otherwise, this | |
658 parameter is ignored and is thus optional. | |
659 fortran_order : bool, optional | |
660 Whether the array should be Fortran-contiguous (True) or | |
661 C-contiguous (False, the default) if we are creating a new file in | |
662 "write" mode. | |
663 version : tuple of int (major, minor) or None | |
664 If the mode is a "write" mode, then this is the version of the file | |
665 format used to create the file. None means use the oldest | |
666 supported version that is able to store the data. Default: None | |
667 | |
668 Returns | |
669 ------- | |
670 marray : memmap | |
671 The memory-mapped array. | |
672 | |
673 Raises | |
674 ------ | |
675 ValueError | |
676 If the data or the mode is invalid. | |
677 IOError | |
678 If the file is not found or cannot be opened correctly. | |
679 | |
680 See Also | |
681 -------- | |
682 memmap | |
683 | |
684 """ | |
685 if not isinstance(filename, basestring): | |
686 raise ValueError("Filename must be a string. Memmap cannot use" | |
687 " existing file handles.") | |
688 | |
689 if 'w' in mode: | |
690 # We are creating the file, not reading it. | |
691 # Check if we ought to create the file. | |
692 _check_version(version) | |
693 # Ensure that the given dtype is an authentic dtype object rather | |
694 # than just something that can be interpreted as a dtype object. | |
695 dtype = numpy.dtype(dtype) | |
696 if dtype.hasobject: | |
697 msg = "Array can't be memory-mapped: Python objects in dtype." | |
698 raise ValueError(msg) | |
699 d = dict( | |
700 descr=dtype_to_descr(dtype), | |
701 fortran_order=fortran_order, | |
702 shape=shape, | |
703 ) | |
704 # If we got here, then it should be safe to create the file. | |
705 fp = open(filename, mode+'b') | |
706 try: | |
707 used_ver = _write_array_header(fp, d, version) | |
708 # this warning can be removed when 1.9 has aged enough | |
709 if version != (2, 0) and used_ver == (2, 0): | |
710 warnings.warn("Stored array in format 2.0. It can only be" | |
711 " read by NumPy >= 1.9", UserWarning) | |
712 offset = fp.tell() | |
713 finally: | |
714 fp.close() | |
715 else: | |
716 # Read the header of the file first. | |
717 fp = open(filename, 'rb') | |
718 try: | |
719 version = read_magic(fp) | |
720 _check_version(version) | |
721 | |
722 shape, fortran_order, dtype = _read_array_header(fp, version) | |
723 if dtype.hasobject: | |
724 msg = "Array can't be memory-mapped: Python objects in dtype." | |
725 raise ValueError(msg) | |
726 offset = fp.tell() | |
727 finally: | |
728 fp.close() | |
729 | |
730 if fortran_order: | |
731 order = 'F' | |
732 else: | |
733 order = 'C' | |
734 | |
735 # We need to change a write-only mode to a read-write mode since we've | |
736 # already written data to the file. | |
737 if mode == 'w+': | |
738 mode = 'r+' | |
739 | |
740 marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order, | |
741 mode=mode, offset=offset) | |
742 | |
743 return marray | |
744 | |
745 | |
746 def _read_bytes(fp, size, error_template="ran out of data"): | |
747 """ | |
748 Read from file-like object until size bytes are read. | |
749 Raises ValueError if EOF is encountered before size bytes are read. | |
750 Non-blocking objects only supported if they derive from io objects. | |
751 | |
752 Required as e.g. ZipExtFile in python 2.6 can return less data than | |
753 requested. | |
754 """ | |
755 data = bytes() | |
756 while True: | |
757 # io files (the default in Python 3) return None or raise on | |
758 # would-block; Python 2 files truncate, and probably nothing can be | |
759 # done about that. Note that regular files can't be non-blocking. | |
760 try: | |
761 r = fp.read(size - len(data)) | |
762 data += r | |
763 if len(r) == 0 or len(data) == size: | |
764 break | |
765 except io.BlockingIOError: | |
766 pass | |
767 if len(data) != size: | |
768 msg = "EOF: reading %s, expected %d bytes got %d" | |
769 raise ValueError(msg % (error_template, size, len(data))) | |
770 else: | |
771 return data |
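`_read_bytes` exists because some file-like objects return fewer bytes than requested. The stdlib sketch below reproduces the loop against a deliberately stingy stream, using a hypothetical test double (`TrickleStream`) and a hypothetical helper (`read_exactly`). Like the listing, the loop stops on an empty read and reports a short result as a `ValueError`. Unlike the listing, it treats a would-block `None` as a stop instead of retrying.

```python
import io


class TrickleStream:
    """Hypothetical file-like object that returns at most 3 bytes per read()."""

    def __init__(self, payload):
        self._buf = io.BytesIO(payload)

    def read(self, size):
        return self._buf.read(min(size, 3))


def read_exactly(fp, size):
    """Keep calling fp.read until size bytes arrive or the stream is exhausted."""
    data = b""
    while len(data) < size:
        chunk = fp.read(size - len(data))
        if not chunk:  # b"" (EOF) or None (would block) ends the loop here
            break
        data += chunk
    if len(data) != size:
        raise ValueError("EOF: expected %d bytes, got %d" % (size, len(data)))
    return data
```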