.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0.6
.br
bzcat \- decompresses files to stdout
.br
bzip2recover \- recovers data from damaged bzip2 files

.SH SYNOPSIS
.ll +8
.B bzip2
.RB [ " \-cdfkqstvzVL123456789 " ]
[
.I "filenames \&..."
]
.ll -8
.br
.B bunzip2
.RB [ " \-fkvsVL " ]
[
.I "filenames \&..."
]
.br
.B bzcat
.RB [ " \-s " ]
[
.I "filenames \&..."
]
.br
.B bzip2recover
.I "filename"
.SH DESCRIPTION
.I bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding. Compression is
generally considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of the PPM
family of statistical compressors.

The command-line options are deliberately very similar to
those of
.I GNU gzip,
but they are not identical.

.I bzip2
expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed version of
itself, with the name "original_name.bz2".
Each compressed file
has the same modification date, permissions, and, when possible,
ownership as the corresponding original, so that these properties can
be correctly restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserving original
file names, permissions, ownerships or dates in filesystems which lack
these concepts, or have serious file name length restrictions, such as
MS-DOS.

.I bzip2
and
.I bunzip2
will by default not overwrite existing
files. If you want this to happen, specify the \-f flag.

If no file names are specified,
.I bzip2
compresses from standard
input to standard output. In this case,
.I bzip2
will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.

.I bunzip2
(or
.I bzip2 \-d)
decompresses all
specified files. Files which were not created by
.I bzip2
will be detected and ignored, and a warning issued.
.I bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:

.nf
       filename.bz2    becomes   filename
       filename.bz     becomes   filename
       filename.tbz2   becomes   filename.tar
       filename.tbz    becomes   filename.tar
       anyothername    becomes   anyothername.out
.fi

If the file does not end in one of the recognised endings,
.I .bz2,
.I .bz,
.I .tbz2
or
.I .tbz,
.I bzip2
complains that it cannot
guess the name of the original file, and uses the original name
with
.I .out
appended.
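.PP
The renaming rules above can be sketched in a few lines of Python. This
only illustrates the documented table and is not the code bzip2 itself
uses; the names SUFFIX_RULES and guess_output_name are made up for the
example.
.PP
.nf
```python
# Suffix rules copied from the table above; the first match wins.
SUFFIX_RULES = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(compressed_name):
    """Return (output_name, recognised) following the documented rules."""
    for suffix, replacement in SUFFIX_RULES:
        if compressed_name.endswith(suffix):
            stem = compressed_name[: len(compressed_name) - len(suffix)]
            return stem + replacement, True
    # Unrecognised ending: keep the name and append ".out".
    return compressed_name + ".out", False


for name in ("notes.bz2", "notes.bz", "backup.tbz2", "backup.tbz", "notes.txt"):
    print(name, "->", guess_output_name(name))
```
.fi
.PP
The second element of the result marks the case in which bzip2 would
complain that it cannot guess the original name.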
.PP
As with compression, supplying no
filenames causes decompression from
standard input to standard output.

.I bunzip2
will correctly decompress a file which is the
concatenation of two or more compressed files. The result is the
concatenation of the corresponding uncompressed files. Integrity
testing (\-t)
of concatenated
compressed files is also supported.
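.PP
The concatenation property can be checked without the command-line
tools. The sketch below uses the bz2 module from the Python standard
library rather than bunzip2, on the assumption that both read the same
stream format; the sample inputs are arbitrary.
.PP
.nf
```python
import bz2

first = b"alpha " * 100
second = b"beta " * 100

# Compress each input on its own, then join the two compressed streams.
joined = bz2.compress(first) + bz2.compress(second)

# bz2.decompress accepts multi-stream input and returns the joined plaintext.
restored = bz2.decompress(joined)
print(restored == first + second)
```
.fi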
.PP
You can also compress or decompress files to the standard output by
giving the \-c flag. Multiple files may be compressed and
decompressed like this. The resulting outputs are fed sequentially to
stdout. Compression of multiple files
in this manner generates a stream
containing multiple compressed file representations. Such a stream
can be decompressed correctly only by
.I bzip2
version 0.9.0 or
later. Earlier versions of
.I bzip2
will stop after decompressing
the first file in the stream.

.I bzcat
(or
.I bzip2 \-dc)
decompresses all specified files to
the standard output.

.I bzip2
will read arguments from the environment variables
.I BZIP2
and
.I BZIP,
in that order, and will process them
before any arguments read from the command line. This gives a
convenient way to supply default arguments.

Compression is always performed, even if the compressed
file is slightly
larger than the original. Files of less than about one hundred bytes
tend to get larger, since the compression mechanism has a constant
overhead in the region of 50 bytes. Random data (including the output
of most file compressors) is coded at about 8.05 bits per byte, giving
an expansion of around 0.5%.
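.PP
The effect of the fixed overhead is easy to observe. The following
sketch, which uses the Python standard library bz2 module as a stand-in
for the bzip2 program, compares a very short input, pseudo-random bytes
and repetitive text. The sample data and the helper name ratio are
assumptions made for the illustration, and the exact figures printed
will vary with the input.
.PP
.nf
```python
import bz2
import random


def ratio(data):
    """Compressed size divided by original size."""
    return len(bz2.compress(data)) / len(data)


tiny = b"hello, world"
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(100000))
text = b"the quick brown fox jumps over the lazy dog. " * 2000

# A ratio above 1 means the "compressed" form is larger than the input.
for label, data in (("tiny", tiny), ("noise", noise), ("text", text)):
    print(label, len(data), round(ratio(data), 3))
```
.fi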
.PP
As a self-check for your protection,
.I bzip2
uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data, and
against undetected bugs in
.I bzip2
(hopefully very unlikely). The
chance of data corruption going undetected is microscopic, about one
chance in four billion for each file processed. Be aware, though, that
the check occurs upon decompression, so it can only tell you that
something is wrong. It can't help you
recover the original uncompressed
data. You can use
.I bzip2recover
to try to recover data from
damaged files.
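.PP
A small experiment shows the check firing at decompression time. The
sketch uses the Python standard library bz2 module instead of the bzip2
program. It assumes a stream layout that this page does not spell out:
a 4-byte stream header, a 6-byte block marker, then the 4-byte block CRC
at byte offsets 10 to 13.
.PP
.nf
```python
import bz2

original = b"integrity matters " * 500
packed = bytearray(bz2.compress(original))

# Assumed layout: bytes 10..13 hold the CRC stored for the first block.
packed[10] ^= 0x01  # flip a single bit of that stored CRC

try:
    bz2.decompress(bytes(packed))
    detected = False
except (OSError, ValueError):
    # The decompressor reports the damage but cannot repair it.
    detected = True

print("corruption detected:", detected)
```
.fi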
.PP
Return values: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
compressed file, 3 for an internal consistency error (e.g. a bug) which
caused
.I bzip2
to panic.
.SH OPTIONS
.TP
.B \-c --stdout
Compress or decompress to standard output.
.TP
.B \-d --decompress
Force decompression.
.I bzip2,
.I bunzip2
and
.I bzcat
are
really the same program, and the decision about what actions to take is
made on the basis of which name is used. This flag overrides that
mechanism, and forces
.I bzip2
to decompress.
.TP
.B \-z --compress
The complement to \-d: forces compression, regardless of the
invocation name.
.TP
.B \-t --test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.TP
.B \-f --force
Force overwrite of output files. Normally,
.I bzip2
will not overwrite
existing output files. Also forces
.I bzip2
to break hard links
to files, which it otherwise wouldn't do.

bzip2 normally declines to decompress files which don't have the
correct magic header bytes. If forced (\-f), however, it will pass
such files through unmodified. This is how GNU gzip behaves.
.TP
.B \-k --keep
Keep (don't delete) input files during compression
or decompression.
.TP
.B \-s --small
Reduce memory usage, for compression, decompression and testing. Files
are decompressed and tested using a modified algorithm which only
requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, \-s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your compression
ratio. In short, if your machine is low on memory (8 megabytes or
less), use \-s for everything. See MEMORY MANAGEMENT below.
.TP
.B \-q --quiet
Suppress non-essential warning messages. Messages pertaining to
I/O errors and other critical events will not be suppressed.
.TP
.B \-v --verbose
Verbose mode -- show the compression ratio for each file processed.
Further \-v's increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
.TP
.B \-L --license -V --version
Display the software version, license terms and conditions.
.TP
.B \-1 (or \-\-fast) to \-9 (or \-\-best)
Set the block size to 100 k, 200 k, ..., 900 k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below.
The \-\-fast and \-\-best aliases are primarily for GNU gzip
compatibility. In particular, \-\-fast doesn't make things
significantly faster.
And \-\-best merely selects the default behaviour.
.TP
.B \--
Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning
with a dash, for example: bzip2 \-- \-myfilename.
.TP
.B \--repetitive-fast --repetitive-best
These flags are redundant in versions 0.9.5 and above. They provided
some coarse control over the behaviour of the sorting algorithm in
earlier versions, which was sometimes useful. 0.9.5 and above have an
improved algorithm which renders these flags irrelevant.
.SH MEMORY MANAGEMENT
.I bzip2
compresses large files in blocks. The block size affects
both the compression ratio achieved, and the amount of memory needed for
compression and decompression. The flags \-1 through \-9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
.I bunzip2
then allocates itself just enough memory to decompress
the file. Since block sizes are stored in compressed files, it follows
that the flags \-1 to \-9 are irrelevant to and so ignored
during decompression.
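.PP
The stored block size can be read back from a compressed stream. The
sketch below assumes the header layout of the bzip2 format, which this
page does not describe: the three bytes "BZh" followed by one ASCII
digit from 1 to 9. It uses the Python standard library bz2 module to
produce test streams, and block_size_digit is a name invented for the
example.
.PP
.nf
```python
import bz2


def block_size_digit(stream):
    """Return the block-size digit (1..9) recorded in a stream header."""
    # Assumed header layout: the bytes "BZh" followed by one ASCII digit.
    if len(stream) < 4 or stream[:3] != b"BZh":
        raise ValueError("not a bzip2 stream")
    digit = stream[3] - ord("0")
    if not 1 <= digit <= 9:
        raise ValueError("unexpected block-size digit")
    return digit


payload = b"example data " * 1000
for level in (1, 5, 9):
    packed = bz2.compress(payload, compresslevel=level)
    print(level, block_size_digit(packed) * 100000, "byte blocks")
```
.fi
.PP
A decompressor needs no block size flag because this digit travels with
the data.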
.PP
Compression and decompression requirements,
in bytes, can be estimated as:

.nf
       Compression:   400k + ( 8 x block size )

       Decompression: 100k + ( 4 x block size ), or
                      100k + ( 2.5 x block size )
.fi

Larger block sizes give rapidly diminishing marginal returns. Most of
the compression comes from the first two or three hundred k of block
size, a fact worth bearing in mind when using
.I bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size,
.I bunzip2
will require about 3700 kbytes to decompress. To support decompression
of any file on a 4 megabyte machine,
.I bunzip2
has an option to
decompress using approximately half this amount of memory, about 2300
kbytes. Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is \-s.

In general, try to use the largest block size memory constraints allow,
since that maximises the compression achieved. Compression and
decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single block
-- that means most files you'd encounter using a large block size. The
amount of real memory touched is proportional to the size of the file,
since the file is smaller than a block. For example, compressing a file
20,000 bytes long with the flag \-9 will cause the compressor to
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
kbytes of it. Similarly, the decompressor will allocate 3700k but only
touch 100k + 20000 * 4 = 180 kbytes.
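.PP
The estimates above are simple enough to tabulate yourself. The sketch
below restates the two formulas in Python and reproduces the figures of
the 20,000-byte example. It reads "k" as 1000 bytes, an assumption that
matches the arithmetic on this page, and the function names are
invented for the illustration.
.PP
.nf
```python
K = 1000  # the page's "k" read as 1000 bytes, matching its arithmetic


def compress_bytes(block_bytes):
    """Estimated compressor memory: 400k + 8 x block size."""
    return 400 * K + 8 * block_bytes


def decompress_bytes(block_bytes, small=False):
    """Estimated decompressor memory; small=True models the -s variant."""
    factor = 2.5 if small else 4
    return int(100 * K + factor * block_bytes)


full_block = 900 * K  # block size selected by -9
tiny_file = 20000     # the 20,000-byte file from the example above

print("allocated:", compress_bytes(full_block), decompress_bytes(full_block))
print("touched:  ", compress_bytes(tiny_file), decompress_bytes(tiny_file))
```
.fi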
.PP
Here is a table which summarises the maximum memory usage for different
block sizes. Also recorded is the total compressed size for 14 files of
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes for
larger files, since the Corpus is dominated by smaller files.

.nf
            Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size

     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6100k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642
.fi
.SH RECOVERING DATA FROM DAMAGED FILES
.I bzip2
compresses files in blocks, usually 900 kbytes long. Each
block is handled independently. If a media or transmission error causes
a multi-block .bz2
file to become damaged, it may be possible to
recover data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty. Each block also carries its own 32-bit CRC, so
damaged blocks can be distinguished from undamaged ones.
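.PP
Searching for block boundaries amounts to a bit-level scan for the
48-bit pattern. The Python sketch below illustrates that idea only and
is not the bzip2recover source. The marker value 0x314159265359 is an
assumption taken from the bzip2 format rather than from this page, and
find_block_starts is a name invented for the example. Test streams come
from the standard library bz2 module.
.PP
.nf
```python
import bz2
import random

# Assumed constant: the 48-bit block-start marker of the bzip2 format.
BLOCK_MAGIC = 0x314159265359
MASK_48 = (1 << 48) - 1


def find_block_starts(stream):
    """Return the bit offsets at which the block marker begins."""
    window = 0
    starts = []
    for index, byte in enumerate(stream):
        for shift in range(7, -1, -1):
            # Slide a 48-bit window over the stream one bit at a time,
            # because blocks need not begin on a byte boundary.
            window = ((window << 1) | ((byte >> shift) & 1)) & MASK_48
            if window == BLOCK_MAGIC:
                bit_pos = index * 8 + (7 - shift)
                starts.append(bit_pos - 47)
    return starts


# 150,000 hard-to-compress bytes at level 1 (100,000-byte blocks)
# should span more than one block.
rng = random.Random(1)
data = bytes(rng.getrandbits(8) for _ in range(150000))
starts = find_block_starts(bz2.compress(data, compresslevel=1))
print(len(starts), "block markers, first at bit", starts[0])
```
.fi
.PP
Each offset found this way marks where a block could be cut out and
tested on its own.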
.PP
.I bzip2recover
is a simple program whose purpose is to search for
blocks in .bz2 files, and write each block out into its own .bz2
file. You can then use
.I bzip2
\-t
to test the
integrity of the resulting files, and decompress those which are
undamaged.

.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec00001file.bz2",
"rec00002file.bz2", etc., containing the extracted blocks.
The output filenames are designed so that the use of
wildcards in subsequent processing -- for example,
"bzip2 \-dc rec*file.bz2 > recovered_data" -- processes the files in
the correct order.

.I bzip2recover
should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to minimise
any potential data loss through media or transmission errors,
you might consider compressing with a smaller
block size.
.SH PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in the
file. Because of this, files containing very long runs of repeated
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
compress more slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio between
worst-case and average-case compression time is in the region of 10:1.
For previous versions, this figure was more like 100:1. You can use the
\-vvvv option to monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

.I bzip2
usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion. This means
that performance, both for compressing and decompressing, is largely
determined by the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss rate have
been observed to give disproportionately large performance improvements.
I imagine
.I bzip2
will perform best on machines with very large caches.
.SH CAVEATS
I/O error messages are not as helpful as they could be.
.I bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.

This manual page pertains to version 1.0.6 of
.I bzip2.
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above, but with the following
exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files. 0.1pl2 cannot do this; it will stop
after decompressing just the first file in the stream.

.I bzip2recover
versions prior to 1.0.2 used 32-bit integers to represent
bit positions in compressed files, so they could not handle compressed
files more than 512 megabytes long. Versions 1.0.2 and above use
64-bit ints on some platforms which support them (GNU supported
targets, and Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments. In any event
you can build yourself an unlimited version if you can recompile it
with MaybeUInt64 set to be an unsigned 64-bit integer.
.SH AUTHOR
Julian Seward, jseward@bzip.org.

http://www.bzip.org

The ideas embodied in
.I bzip2
are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter
Fenwick (for the structured coding model in the original
.I bzip,
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
.I bzip).
I am much
indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance.
Donna Robinson XMLised the documentation.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.