Chris@87: """
Chris@87: 
Chris@87: =============================
Chris@87:  Byteswapping and byte order
Chris@87: =============================
Chris@87: 
Chris@87: Introduction to byte ordering and ndarrays
Chris@87: ==========================================
Chris@87: 
Chris@87: The ``ndarray`` is an object that provides a Python array interface to data
Chris@87: in memory.
Chris@87: 
Chris@87: It often happens that the memory that you want to view with an array is
Chris@87: not of the same byte ordering as the computer on which you are running
Chris@87: Python.
Chris@87: 
Chris@87: For example, I might be working on a computer with a little-endian CPU,
Chris@87: such as an Intel Pentium, but I have loaded some data from a file
Chris@87: written by a computer that is big-endian.  Let's say I have loaded 4
Chris@87: bytes from a file written by a Sun (big-endian) computer.  I know that
Chris@87: these 4 bytes represent two 16-bit integers.  On a big-endian machine, a
Chris@87: two-byte integer is stored with the Most Significant Byte (MSB) first,
Chris@87: and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:
Chris@87: 
Chris@87: #. MSB integer 1
Chris@87: #. LSB integer 1
Chris@87: #. MSB integer 2
Chris@87: #. LSB integer 2
Chris@87: 
Chris@87: Let's say the two integers were in fact 1 and 770.  Because 770 = 256 *
Chris@87: 3 + 2, the 4 bytes in memory would contain respectively: 0, 1, 3, 2.
Chris@87: The bytes I have loaded from the file would have these contents:
Chris@87: 
Chris@87: >>> big_end_str = bytearray([0, 1, 3, 2])
Chris@87: >>> big_end_str
Chris@87: bytearray(b'\\x00\\x01\\x03\\x02')
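
As a cross-check of the arithmetic above, the same four bytes can be produced with the standard-library ``struct`` module, which this document does not otherwise use. This is only a sketch; the names ``big_bytes`` and ``little_bytes`` are illustrative.

```python
import struct

# '>' requests big-endian packing; 'h' is a signed 16-bit integer.
big_bytes = struct.pack('>hh', 1, 770)
# '<' requests little-endian packing of the same two values.
little_bytes = struct.pack('<hh', 1, 770)

print(list(bytearray(big_bytes)))     # [0, 1, 3, 2]
print(list(bytearray(little_bytes)))  # [1, 0, 2, 3]
```

The little-endian result is the big-endian one with the two bytes of each integer exchanged, which is the swap the rest of this document is about.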
Chris@87: 
Chris@87: We might want to use an ``ndarray`` to access these integers.  In that
Chris@87: case, we can create an array around this memory, and tell numpy that
Chris@87: there are two integers, and that they are 16 bit and big-endian:
Chris@87: 
Chris@87: >>> import numpy as np
Chris@87: >>> big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_str)
Chris@87: >>> big_end_arr[0]
Chris@87: 1
Chris@87: >>> big_end_arr[1]
Chris@87: 770
Chris@87: 
Chris@87: Note the array ``dtype`` above of ``>i2``.  The ``>`` means 'big-endian'
Chris@87: (``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'.  For
Chris@87: example, if our data represented a single unsigned 4-byte little-endian
Chris@87: integer, the dtype string would be ``<u4``.
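
The parts of a dtype string can be inspected on the resulting ``np.dtype`` object. The sketch below uses the existing ``byteorder``, ``kind`` and ``itemsize`` attributes; ``native_code`` and the loop are illustrative. One assumption to be aware of: NumPy reports the host's own byte order as ``'='``, so the code translates that back to ``'<'`` or ``'>'`` before printing.

```python
import sys
import numpy as np

dt = np.dtype('>i2')
print(dt.kind)      # 'i': signed integer
print(dt.itemsize)  # 2 bytes per element

# Which of '<' or '>' counts as native depends on the machine.
native_code = '<' if sys.byteorder == 'little' else '>'

orders = {}
for code in ('>i2', '<u4'):
    d = np.dtype(code)
    # Normalise '=' (native) to an explicit '<' or '>'.
    order = native_code if d.byteorder == '=' else d.byteorder
    orders[code] = order
    print(code, '->', order, d.kind, d.itemsize)
```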
Chris@87: 
Chris@87: In fact, why don't we try that?
Chris@87: 
Chris@87: >>> little_end_u4 = np.ndarray(shape=(1,), dtype='<u4', buffer=big_end_str)
Chris@87: >>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
Chris@87: True
Chris@87: 
Chris@87: Returning to our ``big_end_arr`` - in this case our underlying data is
Chris@87: big-endian (data endianness) and we've set the dtype to match (the dtype
Chris@87: is also big-endian).  However, sometimes you need to flip these around.
Chris@87: 
Chris@87: Changing byte ordering
Chris@87: ======================
Chris@87: 
Chris@87: As you can imagine from the introduction, there are two ways you can
Chris@87: affect the relationship between the byte ordering of the array and the
Chris@87: underlying memory it is looking at:
Chris@87: 
Chris@87: * Change the byte-ordering information in the array dtype so that it
Chris@87:   interprets the underlying data as being in a different byte order.
Chris@87:   This is the role of ``arr.newbyteorder()``.
Chris@87: * Change the byte-ordering of the underlying data, leaving the dtype
Chris@87:   interpretation as it was.  This is what ``arr.byteswap()`` does.
Chris@87: 
Chris@87: The common situations in which you need to change byte ordering are:
Chris@87: 
Chris@87: #. Your data and dtype endianness don't match, and you want to change
Chris@87:    the dtype so that it matches the data.
Chris@87: #. Your data and dtype endianness don't match, and you want to swap the
Chris@87:    data so that they match the dtype.
Chris@87: #. Your data and dtype endianness match, but you want the data swapped
Chris@87:    and the dtype to reflect this.
Chris@87: 
Chris@87: Data and dtype endianness don't match, change dtype to match data
Chris@87: -----------------------------------------------------------------
Chris@87: 
Chris@87: First we make an array where they don't match:
Chris@87: 
Chris@87: >>> wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype='<i2', buffer=big_end_str)
Chris@87: >>> wrong_end_dtype_arr[0]
Chris@87: 256
Chris@87: 
Chris@87: The obvious fix for this situation is to change the dtype so it gives
Chris@87: the correct endianness:
Chris@87: 
Chris@87: >>> fixed_end_dtype_arr = wrong_end_dtype_arr.newbyteorder()
Chris@87: >>> fixed_end_dtype_arr[0]
Chris@87: 1
Chris@87: 
Chris@87: Note that the array has not changed in memory:
Chris@87: 
Chris@87: >>> fixed_end_dtype_arr.tobytes() == big_end_str
Chris@87: True
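
The same reinterpretation can be sketched without the array method, by viewing the array through a dtype whose byte order has been swapped. ``dtype.newbyteorder()`` and ``ndarray.view()`` are existing NumPy calls; the variable names and the ``bytearray`` buffer are illustrative. This spelling may be useful on NumPy versions where ``arr.newbyteorder()`` is not available (I believe the array method was dropped in NumPy 2.0).

```python
import numpy as np

buf = bytearray([0, 1, 3, 2])          # big-endian 1 and 770
wrong = np.ndarray(shape=(2,), dtype='<i2', buffer=buf)

# Swap only the byte-order flag of the dtype, then view the same memory.
fixed = wrong.view(wrong.dtype.newbyteorder())

print(wrong.tolist())                   # [256, 515]
print(fixed.tolist())                   # [1, 770]
print(np.shares_memory(wrong, fixed))   # True: no bytes were copied
print(fixed.tobytes() == bytes(buf))    # True
```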
Chris@87: 
Chris@87: Data and dtype endianness don't match, change data to match dtype
Chris@87: -----------------------------------------------------------------
Chris@87: 
Chris@87: You might want to do this if you need the data in memory to have a
Chris@87: certain byte ordering.  For example, you might be writing the memory out
Chris@87: to a file that needs a certain byte ordering.
Chris@87: 
Chris@87: >>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
Chris@87: >>> fixed_end_mem_arr[0]
Chris@87: 1
Chris@87: 
Chris@87: Now the array *has* changed in memory:
Chris@87: 
Chris@87: >>> fixed_end_mem_arr.tobytes() == big_end_str
Chris@87: False
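
To see exactly what changed, the sketch below prints the bytes before and after the swap. It also shows the ``inplace`` argument of ``byteswap()``, which rewrites the existing buffer instead of returning a copy. The names and the writable ``bytearray`` buffer are illustrative assumptions.

```python
import numpy as np

buf = bytearray([0, 1, 3, 2])
wrong = np.ndarray(shape=(2,), dtype='<i2', buffer=buf)

# byteswap() returns a new array by default; the dtype is unchanged.
swapped = wrong.byteswap()
print(swapped.dtype == wrong.dtype)        # True
print(list(bytearray(swapped.tobytes())))  # [1, 0, 2, 3]
print(list(buf))                           # [0, 1, 3, 2], original untouched

# With inplace=True the original buffer itself is rewritten.
wrong.byteswap(inplace=True)
print(list(buf))                           # [1, 0, 2, 3]
print(wrong.tolist())                      # [1, 770]
```

The in-place form avoids a copy, but it changes the memory seen by every other array or object that shares the buffer.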
Chris@87: 
Chris@87: Data and dtype endianness match, swap data and dtype
Chris@87: ----------------------------------------------------
Chris@87: 
Chris@87: You may have a correctly specified array dtype, but you need the array
Chris@87: to have the opposite byte order in memory, and you want the dtype to
Chris@87: match so the array values make sense.  In this case you just do both of
Chris@87: the previous operations:
Chris@87: 
Chris@87: >>> swapped_end_arr = big_end_arr.byteswap().newbyteorder()
Chris@87: >>> swapped_end_arr[0]
Chris@87: 1
Chris@87: >>> swapped_end_arr.tobytes() == big_end_str
Chris@87: False
Chris@87: 
Chris@87: An easier way to cast the data to a specific dtype and byte ordering is
Chris@87: the ndarray ``astype`` method:
Chris@87: 
Chris@87: >>> swapped_end_arr = big_end_arr.astype('<i2')
Chris@87: >>> swapped_end_arr[0]
Chris@87: 1
Chris@87: >>> swapped_end_arr.tobytes() == big_end_str
Chris@87: False
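
One use of ``astype`` is to convert data that arrived in some other byte order to the byte order of the running machine. The sketch below assumes that is the goal. ``'='`` is NumPy's code for native byte order, ``isnative`` is an existing dtype attribute, and the variable names are illustrative.

```python
import numpy as np

buf = bytearray([0, 1, 3, 2])
big = np.ndarray(shape=(2,), dtype='>i2', buffer=buf)

# Ask for the same kind and size, but in the host's native byte order.
native = big.astype(big.dtype.newbyteorder('='))

print(native.tolist())              # [1, 770]: values are preserved
print(native.dtype.isnative)        # True
print(np.array_equal(big, native))  # True
```

Unlike the ``byteswap().newbyteorder()`` chain, this does not require knowing in advance whether a swap is needed, since ``astype`` performs whatever conversion the target dtype calls for.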
Chris@87: 
Chris@87: """
Chris@87: from __future__ import division, absolute_import, print_function