"""

=============================
Byteswapping and byte order
=============================

Introduction to byte ordering and ndarrays
==========================================

The ``ndarray`` is an object that provides a Python array interface to data
in memory.

It often happens that the memory that you want to view with an array is
not of the same byte ordering as the computer on which you are running
Python.

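To check which byte order the machine running Python uses, you can ask the standard library's ``sys.byteorder``, which is either ``'little'`` or ``'big'``. NumPy's ``dtype.isnative`` attribute then tells you whether a given dtype matches that native order. This short check is only illustrative and relies on nothing beyond those two attributes:

>>> import sys
>>> import numpy as np
>>> sys.byteorder in ('little', 'big')
True
>>> np.dtype('<i2').isnative == (sys.byteorder == 'little')
True
>>> np.dtype('>i2').isnative == (sys.byteorder == 'big')
True
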
For example, I might be working on a computer with a little-endian CPU
(such as an Intel Pentium), but I have loaded some data from a file
written by a computer that is big-endian. Let's say I have loaded 4
bytes from a file written by a Sun (big-endian) computer. I know that
these 4 bytes represent two 16-bit integers. On a big-endian machine, a
two-byte integer is stored with the Most Significant Byte (MSB) first,
and then the Least Significant Byte (LSB). Thus the bytes are, in memory order:

#. MSB integer 1
#. LSB integer 1
#. MSB integer 2
#. LSB integer 2

Let's say the two integers were in fact 1 and 770. Because
770 = 256 * 3 + 2, the 4 bytes in memory would contain respectively:
0, 1, 3, 2. The bytes I have loaded from the file would have these contents:

>>> big_end_str = bytes([0, 1, 3, 2])
>>> big_end_str
b'\\x00\\x01\\x03\\x02'

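As a cross-check of the arithmetic above, the standard library's ``struct`` module produces the same four bytes when asked to pack 1 and 770 as big-endian (``>``) signed 16-bit integers (``h``). Packing them little-endian (``<``) gives the mirror-image layout. This assumes Python 3, where ``bytes([...])`` builds a byte string from a list of integers:

>>> import struct
>>> divmod(770, 256)
(3, 2)
>>> struct.pack('>hh', 1, 770) == bytes([0, 1, 3, 2])
True
>>> struct.pack('<hh', 1, 770) == bytes([1, 0, 2, 3])
True
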
We might want to use an ``ndarray`` to access these integers. In that
case, we can create an array around this memory, and tell NumPy that
there are two integers, and that they are 16-bit and big-endian:

>>> import numpy as np
>>> big_end_arr = np.ndarray(shape=(2,), dtype='>i2', buffer=big_end_str)
>>> big_end_arr[0]
1
>>> big_end_arr[1]
770

Note the array ``dtype`` above, ``>i2``. The ``>`` means 'big-endian'
(``<`` is little-endian) and ``i2`` means 'signed 2-byte integer'. For
example, if our data represented a single unsigned 4-byte little-endian
integer, the dtype string would be ``<u4``.

In fact, why don't we try that?

>>> little_end_u4 = np.ndarray(shape=(1,), dtype='<u4', buffer=big_end_str)
>>> little_end_u4[0] == 1 * 256**1 + 3 * 256**2 + 2 * 256**3
True

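The parts of a dtype string can also be read back from the ``dtype`` object itself, using its ``str``, ``kind`` and ``itemsize`` attributes. The example below is only an illustration. Prefer ``str`` to ``byteorder`` for this, because ``str`` always spells the byte order out explicitly, whereas ``dtype.byteorder`` reports ``'='`` for whichever order happens to be native:

>>> dt = np.dtype('>i2')
>>> dt.str, dt.kind, dt.itemsize
('>i2', 'i', 2)
>>> u4 = np.dtype('<u4')
>>> u4.str, u4.kind, u4.itemsize
('<u4', 'u', 4)
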
Returning to our ``big_end_arr``: in this case our underlying data is
big-endian (data endianness) and we've set the dtype to match (the dtype
is also big-endian). However, sometimes you need to flip these around.

Changing byte ordering
======================

As you can imagine from the introduction, there are two ways you can
affect the relationship between the byte ordering of the array and the
underlying memory it is looking at:

* Change the byte-ordering information in the array dtype so that it
  interprets the underlying data as being in a different byte order.
  This is the role of ``arr.view(arr.dtype.newbyteorder())``; older
  NumPy versions also offered the shortcut ``arr.newbyteorder()``, which
  was removed in NumPy 2.0.
* Change the byte-ordering of the underlying data, leaving the dtype
  interpretation as it was. This is what ``arr.byteswap()`` does.

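The difference between the two approaches is easiest to see side by side. In this illustrative sketch, the names ``arr``, ``reinterpreted`` and ``swapped`` are used only for this example. Both operations change the values read from a correctly stored array, but only ``byteswap()`` changes the bytes in memory:

>>> arr = np.array([1, 770], dtype='>i2')
>>> reinterpreted = arr.view(arr.dtype.newbyteorder())
>>> swapped = arr.byteswap()
>>> reinterpreted.tolist(), swapped.tolist()
([256, 515], [256, 515])
>>> reinterpreted.tobytes() == arr.tobytes()
True
>>> swapped.tobytes() == arr.tobytes()
False

Applying either operation on its own to data whose dtype already matches therefore garbles the values. Which one you want depends on whether the dtype or the memory layout is the thing that is wrong.
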
The common situations in which you need to change byte ordering are:

#. Your data and dtype endianness don't match, and you want to change
   the dtype so that it matches the data.
#. Your data and dtype endianness don't match, and you want to swap the
   data so that they match the dtype.
#. Your data and dtype endianness match, but you want the data swapped
   and the dtype to reflect this.

Data and dtype endianness don't match, change dtype to match data
-----------------------------------------------------------------

We start by making an array whose dtype does not match the data:

>>> wrong_end_dtype_arr = np.ndarray(shape=(2,), dtype='<i2', buffer=big_end_str)
>>> wrong_end_dtype_arr[0]
256

The obvious fix for this situation is to change the dtype so it gives
the correct endianness:

>>> fixed_end_dtype_arr = wrong_end_dtype_arr.view(wrong_end_dtype_arr.dtype.newbyteorder())
>>> fixed_end_dtype_arr[0]
1

Note that the array has not changed in memory:

>>> fixed_end_dtype_arr.tobytes() == big_end_str
True

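Because this fix only changes the dtype, the result is a view that shares its memory with ``wrong_end_dtype_arr``. As a small illustrative check, ``np.shares_memory`` confirms this, and the dtype string shows the new byte order:

>>> fixed_end_dtype_arr.dtype.str
'>i2'
>>> np.shares_memory(fixed_end_dtype_arr, wrong_end_dtype_arr)
True
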
Data and dtype endianness don't match, change data to match dtype
-----------------------------------------------------------------

You might want to do this if you need the data in memory to be a certain
ordering. For example, you might be writing the memory out to a file
that needs a certain byte ordering.

>>> fixed_end_mem_arr = wrong_end_dtype_arr.byteswap()
>>> fixed_end_mem_arr[0]
1

Now the array *has* changed in memory:

>>> fixed_end_mem_arr.tobytes() == big_end_str
False

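The swapped bytes are now laid out little-endian, matching the ``<i2`` dtype that ``byteswap()`` left untouched. The check below compares them against the expected layout, again assuming Python 3's ``bytes([...])``:

>>> fixed_end_mem_arr.dtype.str
'<i2'
>>> fixed_end_mem_arr.tobytes() == bytes([1, 0, 2, 3])
True
>>> fixed_end_mem_arr.tolist()
[1, 770]
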
Data and dtype endianness match, swap data and dtype
----------------------------------------------------

You may have a correctly specified array dtype, but you need the array
to have the opposite byte order in memory, and you want the dtype to
match so the array values make sense. In this case you just do both of
the previous operations:

>>> swapped_end_arr = big_end_arr.byteswap().view(big_end_arr.dtype.newbyteorder())
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_str
False

An easier way to cast the data to a specific dtype and byte ordering is
the ndarray ``astype`` method:

>>> swapped_end_arr = big_end_arr.astype('<i2')
>>> swapped_end_arr[0]
1
>>> swapped_end_arr.tobytes() == big_end_str
False

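The two approaches in this section should produce the same values and the same bytes. Here is a quick illustrative check, in which ``via_view`` and ``via_astype`` are names used only for this example:

>>> via_view = big_end_arr.byteswap().view(big_end_arr.dtype.newbyteorder())
>>> via_astype = big_end_arr.astype('<i2')
>>> via_view.tolist() == via_astype.tolist() == [1, 770]
True
>>> via_view.tobytes() == via_astype.tobytes() == bytes([1, 0, 2, 3])
True
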
"""
from __future__ import division, absolute_import, print_function