vamp-build-and-test: DEPENDENCIES/mingw32/Python27/Lib/site-packages/numpy/lib/format.py annotate

annotate DEPENDENCIES/mingw32/Python27/Lib/site-packages/numpy/lib/format.py @ 133:4acb5d8d80b6 tip

Don't fail environmental check if README.md exists (but .txt and no-suffix don't)

author	Chris Cannam
date	Tue, 30 Jul 2019 12:25:44 +0100
parents	2a2c65a20a8b
children

"""
Define a simple format for saving numpy arrays to disk with the full
information about them.

The ``.npy`` format is the standard binary file format in NumPy for
persisting a single arbitrary NumPy array on disk. The format stores all
of the shape and dtype information necessary to reconstruct the array
correctly even on another machine with a different architecture.
The format is designed to be as simple as possible while achieving
its limited goals.

The ``.npz`` format is the standard format for persisting multiple NumPy
arrays on disk. A ``.npz`` file is a zip file containing multiple ``.npy``
files, one for each array.

Capabilities
------------

- Can represent all NumPy arrays including nested record arrays and
  object arrays.

- Represents the data in its native binary form.

- Supports Fortran-contiguous arrays directly.

- Stores all of the necessary information to reconstruct the array
  including shape and dtype on a machine of a different
  architecture. Both little-endian and big-endian arrays are
  supported, and a file with little-endian numbers will yield
  a little-endian array on any machine reading the file. The
  types are described in terms of their actual sizes. For example,
  if a machine with a 64-bit C "long int" writes out an array with
  "long ints", a reading machine with 32-bit C "long ints" will yield
  an array with 64-bit integers.

- Is straightforward to reverse engineer. Datasets often live longer than
  the programs that created them. A competent developer should be
  able to create a solution in their preferred programming language to
  read most ``.npy`` files that they have been given without much
  documentation.

- Allows memory-mapping of the data. See `open_memmap`.

- Can be read from a filelike stream object instead of an actual file.

- Stores object arrays, i.e. arrays containing elements that are arbitrary
  Python objects. Files with object arrays cannot be memory-mapped, but
  they can be read from and written to disk.

Limitations
-----------

- Arbitrary subclasses of numpy.ndarray are not completely preserved.
  Subclasses will be accepted for writing, but only the array data will
  be written out. A regular numpy.ndarray object will be created
  upon reading the file.

.. warning::

    Due to limitations in the interpretation of structured dtypes, dtypes
    with fields with empty names will have the names replaced by 'f0', 'f1',
    etc. Such arrays will not round-trip through the format entirely
    accurately. The data is intact; only the field names will differ. We are
    working on a fix for this. This fix will not require a change in the
    file format. The arrays with such structures can still be saved and
    restored, and the correct dtype may be restored by using the
    ``loadedarray.view(correct_dtype)`` method.

File extensions
---------------

We recommend using the ``.npy`` and ``.npz`` extensions for files saved
in this format. This is by no means a requirement; applications may wish
to use these file formats but use an extension specific to the
application. In the absence of an obvious alternative, however,
we suggest using ``.npy`` and ``.npz``.

Version numbering
-----------------

The version numbering of these formats is independent of NumPy version
numbering. If the format is upgraded, the code in `numpy.lib.format` will
still be able to read and write Version 1.0 files.

Format Version 1.0
------------------

The first 6 bytes are a magic string: exactly ``\\x93NUMPY``.

The next 1 byte is an unsigned byte: the major version number of the file
format, e.g. ``\\x01``.

The next 1 byte is an unsigned byte: the minor version number of the file
format, e.g. ``\\x00``. Note: the version of the file format is not tied
to the version of the numpy package.

The next 2 bytes form a little-endian unsigned short int: the length of
the header data HEADER_LEN.

The next HEADER_LEN bytes form the header data describing the array's
format. It is an ASCII string which contains a Python literal expression
of a dictionary. It is terminated by a newline (``\\n``) and padded with
spaces (``\\x20``) to make the total length of
``magic string + 4 + HEADER_LEN`` be evenly divisible by 16 for alignment
purposes.

The dictionary contains three keys:

    "descr" : dtype.descr
      An object that can be passed as an argument to the `numpy.dtype`
      constructor to create the array's dtype.
    "fortran_order" : bool
      Whether the array data is Fortran-contiguous or not. Since
      Fortran-contiguous arrays are a common form of non-C-contiguity,
      we allow them to be written directly to disk for efficiency.
    "shape" : tuple of int
      The shape of the array.

For repeatability and readability, the dictionary keys are sorted in
alphabetic order. This is for convenience only. A writer SHOULD implement
this if possible. A reader MUST NOT depend on this.

Following the header comes the array data. If the dtype contains Python
objects (i.e. ``dtype.hasobject is True``), then the data is a Python
pickle of the array. Otherwise the data is the contiguous (either C-
or Fortran-, depending on ``fortran_order``) bytes of the array.
Consumers can figure out the number of bytes by multiplying the number
of elements given by the shape (noting that ``shape=()`` means there is
1 element) by ``dtype.itemsize``.

Format Version 2.0
------------------

Version 2.0 is identical to version 1.0 except that the header length is
stored in 4 bytes as a little-endian unsigned int instead of 2 bytes as an
unsigned short. This raises the header limit from 65535 bytes to 4 GiB,
which is needed for structured arrays with very many fields. When no
version is requested, the writer uses 1.0 whenever the header fits and
falls back to 2.0 otherwise, warning that such files can only be read by
NumPy >= 1.9.

Notes
-----
The ``.npy`` format, including reasons for creating it and a comparison of
alternatives, is described fully in the "npy-format" NEP.

"""
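The byte-count rule at the end of the Format Version 1.0 description, together with the fixed 10-byte prefix, is enough to locate and size the data block of a version 1.0 file. The following is a minimal standard-library sketch (Python 3) under the assumption of a non-object dtype whose `itemsize` is already known. `npy_v1_data_nbytes` and `npy_v1_data_offset` are hypothetical helper names and are not part of this module.

```python
import functools
import operator


def npy_v1_data_nbytes(shape, itemsize):
    """Bytes of raw array data that follow the header (non-object dtypes only)."""
    # The product over an empty shape is 1, matching "shape=() means 1 element".
    count = functools.reduce(operator.mul, shape, 1)
    return count * itemsize


def npy_v1_data_offset(header_len):
    """Offset of the first data byte in a version 1.0 file with this HEADER_LEN."""
    # 6-byte magic string + 1 major byte + 1 minor byte + 2-byte HEADER_LEN.
    return 6 + 1 + 1 + 2 + header_len


# A float64 array of shape (3, 4) whose header is 70 bytes long:
print(npy_v1_data_offset(70), npy_v1_data_nbytes((3, 4), 8))  # 80 96
print(npy_v1_data_nbytes((), 8))  # 8
```

A writer pads the header so that `10 + HEADER_LEN` is a multiple of 16. For a well-formed 1.0 file, the offset this returns is therefore 16-byte aligned.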
from __future__ import division, absolute_import, print_function

import numpy
import sys
import io
import warnings
from numpy.lib.utils import safe_eval
from numpy.compat import asbytes, asstr, isfileobj, long, basestring

if sys.version_info[0] >= 3:
    import pickle
else:
    import cPickle as pickle

MAGIC_PREFIX = asbytes('\x93NUMPY')
MAGIC_LEN = len(MAGIC_PREFIX) + 2
BUFFER_SIZE = 2**18  # size of buffer for reading npz files in bytes

# difference between version 1.0 and 2.0 is a 4 byte (I) header length
# instead of 2 bytes (H) allowing storage of large structured arrays

def _check_version(version):
    if version not in [(1, 0), (2, 0), None]:
        msg = "we only support format version (1, 0) and (2, 0), not %s"
        raise ValueError(msg % (version,))

def magic(major, minor):
    """ Return the magic string for the given file format version.

    Parameters
    ----------
    major : int in [0, 255]
    minor : int in [0, 255]

    Returns
    -------
    magic : bytes

    Raises
    ------
    ValueError
        If the version cannot be formatted.
    """
    if major < 0 or major > 255:
        raise ValueError("major version must be 0 <= major < 256")
    if minor < 0 or minor > 255:
        raise ValueError("minor version must be 0 <= minor < 256")
    if sys.version_info[0] < 3:
        return MAGIC_PREFIX + chr(major) + chr(minor)
    else:
        return MAGIC_PREFIX + bytes([major, minor])

def read_magic(fp):
    """ Read the magic string to get the version of the file format.

    Parameters
    ----------
    fp : filelike object

    Returns
    -------
    major : int
    minor : int
    """
    magic_str = _read_bytes(fp, MAGIC_LEN, "magic string")
    if magic_str[:-2] != MAGIC_PREFIX:
        msg = "the magic string is not correct; expected %r, got %r"
        raise ValueError(msg % (MAGIC_PREFIX, magic_str[:-2]))
    if sys.version_info[0] < 3:
        major, minor = map(ord, magic_str[-2:])
    else:
        major, minor = magic_str[-2:]
    return major, minor

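The 8-byte layout produced by `magic` and checked by `read_magic` can be reproduced without NumPy. This sketch assumes Python 3, where indexing a `bytes` object yields an `int`, so the `ord()` branch above is not needed. `make_magic` and `parse_magic` are hypothetical names, not functions of this module.

```python
MAGIC_PREFIX = b"\x93NUMPY"


def make_magic(major, minor):
    """Return the 8-byte magic string for format version (major, minor)."""
    if not (0 <= major <= 255 and 0 <= minor <= 255):
        raise ValueError("version numbers must be in [0, 255]")
    return MAGIC_PREFIX + bytes([major, minor])


def parse_magic(buf):
    """Return (major, minor) from the first 8 bytes of an .npy file."""
    if len(buf) < 8 or buf[:6] != MAGIC_PREFIX:
        raise ValueError("bad magic string %r" % (buf[:6],))
    # Indexing bytes gives ints in Python 3.
    return buf[6], buf[7]


print(make_magic(1, 0))  # b'\x93NUMPY\x01\x00'
print(parse_magic(make_magic(2, 0)))  # (2, 0)
```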
def dtype_to_descr(dtype):
    """
    Get a serializable descriptor from the dtype.

    The .descr attribute of a dtype object cannot be round-tripped through
    the dtype() constructor. Simple types, like dtype('float32'), have
    a descr which looks like a record array with one field with '' as
    a name. The dtype() constructor interprets this as a request to give
    a default name. Instead, we construct a descriptor that can be passed
    to dtype().

    Parameters
    ----------
    dtype : dtype
        The dtype of the array that will be written to disk.

    Returns
    -------
    descr : object
        An object that can be passed to `numpy.dtype()` in order to
        replicate the input dtype.

    """
    if dtype.names is not None:
        # This is a record array. The .descr is fine. XXX: parts of the
        # record array with an empty name, like padding bytes, still get
        # fiddled with. This needs to be fixed in the C implementation of
        # dtype().
        return dtype.descr
    else:
        return dtype.str

def header_data_from_array_1_0(array):
    """ Get the dictionary of header metadata from a numpy.ndarray.

    Parameters
    ----------
    array : numpy.ndarray

    Returns
    -------
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    """
    d = {}
    d['shape'] = array.shape
    # Test for C_CONTIGUOUS first, because a 1-D array is both C_CONTIGUOUS
    # and F_CONTIGUOUS and should be recorded as C order.
    if array.flags.c_contiguous:
        d['fortran_order'] = False
    elif array.flags.f_contiguous:
        d['fortran_order'] = True
    else:
        # Totally non-contiguous data. We will have to make it C-contiguous
        # before writing.
        d['fortran_order'] = False

    d['descr'] = dtype_to_descr(array.dtype)
    return d

def _write_array_header(fp, d, version=None):
    """ Write the header for an array and return the version used.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string representation
        to the header of the file.
    version : tuple or None
        None means use the oldest version that can hold the header. An
        explicit version raises a ValueError if that format cannot hold
        the header. Default: None

    Returns
    -------
    version : tuple of int
        The file version that was used to store the data.
    """
    import struct
    header = ["{"]
    for key, value in sorted(d.items()):
        # Need to use repr here, since we eval these when reading
        header.append("'%s': %s, " % (key, repr(value)))
    header.append("}")
    header = "".join(header)
    # Pad the header with spaces and a final newline such that the magic
    # string, the header-length short and the header are aligned on a
    # 16-byte boundary. Hopefully, some system, possibly memory-mapping,
    # can take advantage of our premature optimization.
    current_header_len = MAGIC_LEN + 2 + len(header) + 1  # 1 for the newline
    topad = 16 - (current_header_len % 16)
    header = header + ' '*topad + '\n'
    header = asbytes(_filter_header(header))

    if len(header) >= (256*256) and version == (1, 0):
        raise ValueError("header does not fit inside %s bytes required by the"
                         " 1.0 format" % (256*256))
    if len(header) < (256*256) and version != (2, 0):
        header_len_str = struct.pack('<H', len(header))
        version = (1, 0)
    elif len(header) < (2**32):
        # Note: the padding above assumes the 2-byte length field of the
        # 1.0 format, so a 2.0 header is not 16-byte aligned.
        header_len_str = struct.pack('<I', len(header))
        version = (2, 0)
    else:
        raise ValueError("header does not fit inside 4 GiB required by "
                         "the 2.0 format")

    fp.write(magic(*version))
    fp.write(header_len_str)
    fp.write(header)
    return version

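The header rendering, padding and version fallback in `_write_array_header` can be sketched with `struct` alone (Python 3). This is a simplified stand-in, not the code above. `build_npy_header` is a hypothetical name. The sketch renders the dict with `repr` and sorted keys, as the writer above does. It differs in two ways:

- It computes the padding for the length field of the version actually chosen, so 2.0 headers are also 16-byte aligned.
- It adds zero spaces when the header is already aligned.

```python
import struct

MAGIC_PREFIX = b"\x93NUMPY"
# Length-field format per version: 2-byte (1.0) or 4-byte (2.0) unsigned.
_LENGTH_FMT = {(1, 0): "<H", (2, 0): "<I"}


def build_npy_header(d, version=None):
    """Return magic + version + length field + padded header for dict d."""
    body = "{" + "".join("'%s': %r, " % (k, v) for k, v in sorted(d.items())) + "}"
    # With no explicit version, try the oldest format first.
    candidates = [version] if version is not None else [(1, 0), (2, 0)]
    for ver in candidates:
        fmt = _LENGTH_FMT[ver]
        field_size = struct.calcsize(fmt)
        # Pad so magic (6) + version (2) + length field + header (ending in
        # a newline) is a multiple of 16.
        pad = -(len(MAGIC_PREFIX) + 2 + field_size + len(body) + 1) % 16
        raw = (body + " " * pad + "\n").encode("latin1")
        if len(raw) < 2 ** (8 * field_size):
            return MAGIC_PREFIX + bytes(ver) + struct.pack(fmt, len(raw)) + raw
    raise ValueError("header does not fit in the requested format version")


hdr = build_npy_header({"descr": "<f8", "fortran_order": False, "shape": (3,)})
print(len(hdr), hdr[:8])  # 80 b'\x93NUMPY\x01\x00'
```

A header with thousands of structured fields exceeds 65535 bytes. In that case the loop moves on to the 4-byte length field, which mirrors the 1.0-then-2.0 fallback described above.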
def write_array_header_1_0(fp, d):
    """ Write the header for an array using the 1.0 format.

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (1, 0))


def write_array_header_2_0(fp, d):
    """ Write the header for an array using the 2.0 format.
    The 2.0 format allows storing very large structured arrays.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
    d : dict
        This has the appropriate entries for writing its string
        representation to the header of the file.
    """
    _write_array_header(fp, d, (2, 0))

def read_array_header_1_0(fp):
    """
    Read an array header from a filelike object using the 1.0 file format
    version.

    This will leave the file object located just after the header.

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        True if the array data is stored in Fortran (column-major) order,
        False if it is stored in C (row-major) order. Writers store arrays
        that are neither C- nor Fortran-contiguous in C order.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(fp, version=(1, 0))

def read_array_header_2_0(fp):
    """
    Read an array header from a filelike object using the 2.0 file format
    version.

    This will leave the file object located just after the header.

    .. versionadded:: 1.9.0

    Parameters
    ----------
    fp : filelike object
        A file object or something with a `.read()` method like a file.

    Returns
    -------
    shape : tuple of int
        The shape of the array.
    fortran_order : bool
        True if the array data is stored in Fortran (column-major) order,
        False if it is stored in C (row-major) order. Writers store arrays
        that are neither C- nor Fortran-contiguous in C order.
    dtype : dtype
        The dtype of the file's data.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    return _read_array_header(fp, version=(2, 0))


def _filter_header(s):
    """Clean up 'L' in npz header ints.

    Cleans up the 'L' in strings representing integers. Needed to allow npz
    headers produced in Python2 to be read in Python3.

    Parameters
    ----------
    s : byte string
        Npy file header.

    Returns
    -------
    header : str
        Cleaned up header.

    """
    import tokenize
    if sys.version_info[0] >= 3:
        from io import StringIO
    else:
        from StringIO import StringIO

    tokens = []
    last_token_was_number = False
    # generate_tokens expects a readline-style callable.
    for token in tokenize.generate_tokens(StringIO(asstr(s)).readline):
        token_type = token[0]
        token_string = token[1]
        if (last_token_was_number and
                token_type == tokenize.NAME and
                token_string == "L"):
            continue
        else:
            tokens.append(token)
        last_token_was_number = (token_type == tokenize.NUMBER)
    return tokenize.untokenize(tokens)


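The effect of `_filter_header` is easiest to see on a Python 2 style header such as `(3L, 4L)`, which `ast.literal_eval` rejects in Python 3. This sketch assumes Python 3 and reuses the same token-level idea. `strip_long_suffix` is a hypothetical name. Because the filter works on tokens rather than raw text, a quoted string such as `'1L'` is a STRING token and is left untouched.

```python
import ast
import io
import tokenize


def strip_long_suffix(header):
    """Drop the Python 2 'L' suffix from integer literals in a header string."""
    tokens = []
    last_was_number = False
    for tok in tokenize.generate_tokens(io.StringIO(header).readline):
        # A NAME token "L" directly after a NUMBER is the old long suffix.
        if last_was_number and tok.type == tokenize.NAME and tok.string == "L":
            continue
        tokens.append(tok)
        last_was_number = tok.type == tokenize.NUMBER
    # untokenize fills the gap left by each dropped "L" with a space.
    return tokenize.untokenize(tokens)


py2_header = "{'descr': '<i8', 'fortran_order': False, 'shape': (3L, 4L), }\n"
print(ast.literal_eval(strip_long_suffix(py2_header)))
```

Full 5-tuple tokens are passed to `untokenize`, so each removed `L` becomes a space rather than shifting later columns. That leaves the header length, and therefore its padding, unchanged.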
def _read_array_header(fp, version):
    """
    see read_array_header_1_0
    """
    # Read an unsigned, little-endian integer holding the length of the
    # header: a 2-byte short for version 1.0, a 4-byte int for version 2.0.
    import struct
    if version == (1, 0):
        hlength_str = _read_bytes(fp, 2, "array header length")
        header_length = struct.unpack('<H', hlength_str)[0]
        header = _read_bytes(fp, header_length, "array header")
    elif version == (2, 0):
        hlength_str = _read_bytes(fp, 4, "array header length")
        header_length = struct.unpack('<I', hlength_str)[0]
        header = _read_bytes(fp, header_length, "array header")
    else:
        raise ValueError("Invalid version %r" % (version,))

    # The header is a pretty-printed string representation of a literal
    # Python dictionary, padded with spaces and a trailing newline to a
    # 16-byte boundary. The keys are strings.
    #   "shape" : tuple of int
    #   "fortran_order" : bool
    #   "descr" : dtype.descr
    header = _filter_header(header)
    try:
        d = safe_eval(header)
    except SyntaxError as e:
        msg = "Cannot parse header: %r\nException: %r"
        raise ValueError(msg % (header, e))
    if not isinstance(d, dict):
        msg = "Header is not a dictionary: %r"
        raise ValueError(msg % (d,))
    keys = sorted(d.keys())
    if keys != ['descr', 'fortran_order', 'shape']:
        msg = "Header does not contain the correct keys: %r"
        raise ValueError(msg % (keys,))

    # Sanity-check the values.
    if (not isinstance(d['shape'], tuple) or
            not numpy.all([isinstance(x, (int, long)) for x in d['shape']])):
        msg = "shape is not valid: %r"
        raise ValueError(msg % (d['shape'],))
    if not isinstance(d['fortran_order'], bool):
        msg = "fortran_order is not a valid bool: %r"
        raise ValueError(msg % (d['fortran_order'],))
    try:
        dtype = numpy.dtype(d['descr'])
    except TypeError as e:
        msg = "descr is not a valid dtype descriptor: %r"
        raise ValueError(msg % (d['descr'],))

    return d['shape'], d['fortran_order'], dtype

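The length-field read, literal parsing and key and value checks in `_read_array_header` can be sketched with the standard library (Python 3). One assumption to flag: the sketch substitutes `ast.literal_eval` for NumPy's `safe_eval`. Both accept only literal expressions, but they are different functions. `parse_npy_header` is a hypothetical name. It returns the raw `descr` instead of building a `numpy.dtype`.

```python
import ast
import io
import struct

_LENGTH_FORMATS = {(1, 0): "<H", (2, 0): "<I"}


def parse_npy_header(fp, version):
    """Read the length field and header dict that follow the magic string."""
    fmt = _LENGTH_FORMATS.get(version)
    if fmt is None:
        raise ValueError("unsupported format version %r" % (version,))
    size = struct.calcsize(fmt)
    raw_len = fp.read(size)
    if len(raw_len) != size:
        raise ValueError("EOF while reading the header length")
    (header_len,) = struct.unpack(fmt, raw_len)
    raw = fp.read(header_len)
    if len(raw) != header_len:
        raise ValueError("EOF while reading the header")
    d = ast.literal_eval(raw.decode("latin1"))
    if not isinstance(d, dict) or sorted(d) != ["descr", "fortran_order", "shape"]:
        raise ValueError("header is not a dict with the expected keys: %r" % (d,))
    shape = d["shape"]
    if not isinstance(shape, tuple) or not all(isinstance(n, int) for n in shape):
        raise ValueError("shape is not valid: %r" % (shape,))
    if not isinstance(d["fortran_order"], bool):
        raise ValueError("fortran_order is not a valid bool: %r" % (d["fortran_order"],))
    # Like _read_array_header, leaves fp positioned at the first data byte.
    return shape, d["fortran_order"], d["descr"]
```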
def write_array(fp, array, version=None):
    """
    Write an array to an NPY file, including a header.

    If the array is neither C-contiguous nor Fortran-contiguous AND the
    file_like object is not a real file object, this function will have to
    copy data in memory.

    Parameters
    ----------
    fp : file_like object
        An open, writable file object, or similar object with a
        ``.write()`` method.
    array : ndarray
        The array to write to disk.
    version : (int, int) or None, optional
        The version number of the format. None means use the oldest
        supported version that is able to store the data. Default: None

    Raises
    ------
    ValueError
        If the array cannot be persisted.
    Various other errors
        If the array contains Python objects as part of its dtype, the
        process of pickling them may raise various errors if the objects
        are not picklable.

    """
    _check_version(version)
    used_ver = _write_array_header(fp, header_data_from_array_1_0(array),
                                   version)
    # this warning can be removed when 1.9 has aged enough
    if version != (2, 0) and used_ver == (2, 0):
        warnings.warn("Stored array in format 2.0. It can only be "
                      "read by NumPy >= 1.9", UserWarning)

    # Set buffer size to 16 MiB to hide the Python loop overhead.
    buffersize = max(16 * 1024 ** 2 // array.itemsize, 1)

    if array.dtype.hasobject:
        # We contain Python objects so we cannot write out the data
        # directly. Instead, we will pickle it out with version 2 of the
        # pickle protocol.
        pickle.dump(array, fp, protocol=2)
    elif array.flags.f_contiguous and not array.flags.c_contiguous:
        if isfileobj(fp):
            array.T.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='F'):
                fp.write(chunk.tobytes('C'))
    else:
        if isfileobj(fp):
            array.tofile(fp)
        else:
            for chunk in numpy.nditer(
                    array, flags=['external_loop', 'buffered', 'zerosize_ok'],
                    buffersize=buffersize, order='C'):
                fp.write(chunk.tobytes('C'))


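For float64 data, the output of `write_array` for a plain, non-object 2-D array can be approximated without NumPy. This is a hedged sketch. `write_f8_grid` is a hypothetical helper that handles only a rectangular list of lists of floats. It always emits the `<f8` descriptor and a version 1.0 header. With `fortran_order=True` the values are written column by column, the byte order that the `array.T.tofile(fp)` branch above produces for a Fortran-contiguous array.

```python
import io
import struct

MAGIC_V1 = b"\x93NUMPY\x01\x00"


def write_f8_grid(fp, rows, fortran_order=False):
    """Write a rectangular list of float rows as a version 1.0 .npy stream."""
    nrows, ncols = len(rows), (len(rows[0]) if rows else 0)
    if fortran_order:
        # Column-major: walk down each column in turn.
        values = [rows[i][j] for j in range(ncols) for i in range(nrows)]
    else:
        # Row-major (C order): concatenate the rows.
        values = [x for row in rows for x in row]
    d = {"descr": "<f8", "fortran_order": fortran_order, "shape": (nrows, ncols)}
    body = "{" + "".join("'%s': %r, " % (k, v) for k, v in sorted(d.items())) + "}"
    # Pad so that magic (8) + length field (2) + header is a multiple of 16.
    body += " " * (-(len(MAGIC_V1) + 2 + len(body) + 1) % 16) + "\n"
    header = body.encode("latin1")
    fp.write(MAGIC_V1 + struct.pack("<H", len(header)) + header)
    fp.write(struct.pack("<%dd" % len(values), *values))  # little-endian doubles
```

If NumPy is available, `numpy.load` on the resulting bytes should give back the original 2-D array in either order. The sketch's own checks rely only on `struct`.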
def read_array(fp):
    """
    Read an array from an NPY file.

    Parameters
    ----------
    fp : file_like object
        If this is not a real file object, then this may take extra memory
        and time.

    Returns
    -------
    array : ndarray
        The array from the data on disk.

    Raises
    ------
    ValueError
        If the data is invalid.

    """
    version = read_magic(fp)
    _check_version(version)
    shape, fortran_order, dtype = _read_array_header(fp, version)
    if len(shape) == 0:
        count = 1
    else:
        count = numpy.multiply.reduce(shape)

    # Now read the actual data.
    if dtype.hasobject:
        # The array contained Python objects. We need to unpickle the data.
        array = pickle.load(fp)
    else:
        if isfileobj(fp):
            # We can use the fast fromfile() function.
            array = numpy.fromfile(fp, dtype=dtype, count=count)
        else:
            # This is not a real file. We have to read it the
            # memory-intensive way.
            # crc32 module fails on reads greater than 2 ** 32 bytes,
            # breaking large reads from gzip streams. Chunk reads to
            # BUFFER_SIZE bytes to avoid issue and reduce memory overhead
            # of the read. In non-chunked case count < max_read_count, so
            # only one read is performed.

            max_read_count = BUFFER_SIZE // min(BUFFER_SIZE, dtype.itemsize)

            array = numpy.empty(count, dtype=dtype)
            for i in range(0, count, max_read_count):
                read_count = min(max_read_count, count - i)
                read_size = int(read_count * dtype.itemsize)
                data = _read_bytes(fp, read_size, "array data")
                array[i:i+read_count] = numpy.frombuffer(data, dtype=dtype,
                                                         count=read_count)

        if fortran_order:
            array.shape = shape[::-1]
            array = array.transpose()
        else:
            array.shape = shape

    return array


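The bounded-size read loop and the Fortran-order reshape in `read_array` can be mirrored for float64 data with `struct` (Python 3). `read_f8_values` and `to_nested_rows` are hypothetical helpers. The reshape handles only 2-D shapes. `buffer_size` defaults to the same 2**18 bytes as `BUFFER_SIZE` above, and the usage check passes a small value to force several reads. Indexing element `(i, j)` at `j * nrows + i` is equivalent to the code's "reshape to `shape[::-1]`, then transpose".

```python
import io
import struct


def read_f8_values(fp, count, buffer_size=2**18):
    """Read `count` little-endian float64 values in bounded-size chunks."""
    itemsize = 8
    # Items per read: at least one, at most buffer_size bytes' worth.
    max_read_count = buffer_size // min(buffer_size, itemsize)
    values = []
    for start in range(0, count, max_read_count):
        n = min(max_read_count, count - start)
        chunk = fp.read(n * itemsize)
        if len(chunk) != n * itemsize:
            raise ValueError("EOF: expected %d bytes, got %d" % (n * itemsize, len(chunk)))
        values.extend(struct.unpack("<%dd" % n, chunk))
    return values


def to_nested_rows(values, shape, fortran_order):
    """Arrange a flat value list into rows for a 2-D shape (nrows, ncols)."""
    nrows, ncols = shape
    if fortran_order:
        # Element (i, j) sits at j * nrows + i in column-major storage.
        return [[values[j * nrows + i] for j in range(ncols)] for i in range(nrows)]
    return [values[i * ncols:(i + 1) * ncols] for i in range(nrows)]
```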
def open_memmap(filename, mode='r+', dtype=None, shape=None,
                fortran_order=False, version=None):
    """
    Open a .npy file as a memory-mapped array.

    This may be used to read an existing file or create a new one.

    Parameters
    ----------
    filename : str
        The name of the file on disk. This cannot be a file-like
        object.
    mode : str, optional
        The mode in which to open the file; the default is 'r+'. In
        addition to the standard file modes, 'c' is also accepted to mean
        "copy on write." See `memmap` for the available mode strings.
    dtype : data-type, optional
        The data type of the array if we are creating a new file in "write"
        mode, if not, `dtype` is ignored. The default value is None, which
        results in a data-type of `float64`.
    shape : tuple of int
        The shape of the array if we are creating a new file in "write"
        mode, in which case this parameter is required. Otherwise, this
        parameter is ignored and is thus optional.
    fortran_order : bool, optional
        Whether the array should be Fortran-contiguous (True) or
        C-contiguous (False, the default) if we are creating a new file in
        "write" mode.
    version : tuple of int (major, minor) or None
        If the mode is a "write" mode, then this is the version of the file
        format used to create the file. None means use the oldest
        supported version that is able to store the data. Default: None

    Returns
    -------
    marray : memmap
        The memory-mapped array.

    Raises
    ------
    ValueError
        If the data or the mode is invalid.
    IOError
        If the file is not found or cannot be opened correctly.

    See Also
    --------
    memmap

    """
    if not isinstance(filename, basestring):
        raise ValueError("Filename must be a string. Memmap cannot use"
                         " existing file handles.")

    if 'w' in mode:
        # We are creating the file, not reading it.
        # Check if we ought to create the file.
        _check_version(version)
        # Ensure that the given dtype is an authentic dtype object rather
        # than just something that can be interpreted as a dtype object.
        dtype = numpy.dtype(dtype)
        if dtype.hasobject:
            msg = "Array can't be memory-mapped: Python objects in dtype."
            raise ValueError(msg)
        d = dict(
            descr=dtype_to_descr(dtype),
            fortran_order=fortran_order,
            shape=shape,
        )
        # If we got here, then it should be safe to create the file.
        fp = open(filename, mode+'b')
        try:
            used_ver = _write_array_header(fp, d, version)
            # this warning can be removed when 1.9 has aged enough
            if version != (2, 0) and used_ver == (2, 0):
                warnings.warn("Stored array in format 2.0. It can only be "
                              "read by NumPy >= 1.9", UserWarning)
            offset = fp.tell()
        finally:
            fp.close()
    else:
        # Read the header of the file first.
        fp = open(filename, 'rb')
        try:
            version = read_magic(fp)
            _check_version(version)

            shape, fortran_order, dtype = _read_array_header(fp, version)
            if dtype.hasobject:
                msg = "Array can't be memory-mapped: Python objects in dtype."
                raise ValueError(msg)
            offset = fp.tell()
        finally:
            fp.close()

    if fortran_order:
        order = 'F'
    else:
        order = 'C'

    # We need to change a write-only mode to a read-write mode since we've
    # already written data to the file.
    if mode == 'w+':
        mode = 'r+'

    marray = numpy.memmap(filename, dtype=dtype, shape=shape, order=order,
                          mode=mode, offset=offset)

    return marray


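The key step in `open_memmap` is to parse the header once to find `offset`, then map the data region directly. The sketch below shows this with the standard `mmap` module instead of `numpy.memmap`. It maps the whole file and reads from `offset`, because `mmap.mmap` only accepts offsets that are multiples of `mmap.ALLOCATIONGRANULARITY`. `map_npy_v1` is a hypothetical helper. It handles version 1.0 files only and assumes Python 3.

```python
import ast
import mmap
import struct


def map_npy_v1(path):
    """Return (header dict, data offset, read-only mmap) for a v1.0 .npy file."""
    with open(path, "rb") as f:
        if f.read(8) != b"\x93NUMPY\x01\x00":
            raise ValueError("not a version 1.0 .npy file")
        (header_len,) = struct.unpack("<H", f.read(2))
        header = ast.literal_eval(f.read(header_len).decode("latin1"))
        offset = f.tell()  # first byte of array data, as in open_memmap
        # Map the whole file; the mapping stays valid after the file is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return header, offset, mm
```

Values can then be read in place, for example with `struct.unpack_from("<3d", mm, offset)`, without copying the data region first. Call `mm.close()` when done.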
def _read_bytes(fp, size, error_template="ran out of data"):
    """
    Read from file-like object until size bytes are read.
    Raises ValueError if EOF is encountered before size bytes are read.
    Non-blocking objects only supported if they derive from io objects.

    Required as e.g. ZipExtFile in python 2.6 can return less data than
    requested.
    """
    data = bytes()
    while True:
        # io files (default in python3) return None or raise on
        # would-block, python2 file will truncate, probably nothing can be
        # done about that. note that regular files can't be non-blocking
        try:
            r = fp.read(size - len(data))
            data += r
            if len(r) == 0 or len(data) == size:
                break
        except io.BlockingIOError:
            pass
    if len(data) != size:
        msg = "EOF: reading %s, expected %d bytes got %d"
        raise ValueError(msg % (error_template, size, len(data)))
    else:
        return data
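Why `_read_bytes` loops becomes clear with a reader that returns fewer bytes than requested, as the docstring says `ZipExtFile` in Python 2.6 could. `TrickleReader` is a hypothetical test double. `read_exactly` is a simplified restatement of the loop above for Python 3. Like the original, it assumes `read()` returns `bytes` and does not handle a `None` result.

```python
import io


class TrickleReader:
    """Hypothetical file-like object that returns at most `step` bytes per read."""

    def __init__(self, data, step):
        self._buf = io.BytesIO(data)
        self._step = step

    def read(self, n):
        return self._buf.read(min(n, self._step))


def read_exactly(fp, size, what="ran out of data"):
    """Keep calling fp.read() until `size` bytes arrive or the stream ends."""
    data = b""
    while len(data) < size:
        try:
            chunk = fp.read(size - len(data))
        except io.BlockingIOError:
            continue  # nothing available yet on a non-blocking stream; retry
        if not chunk:
            break  # EOF
        data += chunk
    if len(data) != size:
        raise ValueError("EOF: reading %s, expected %d bytes got %d" % (what, size, len(data)))
    return data


print(TrickleReader(b"abcdefgh", 3).read(8))  # b'abc'
print(read_exactly(TrickleReader(b"abcdefgh", 3), 8))  # b'abcdefgh'
```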
