sv-dependency-builds: src/bzip2-1.0.6/bzip2.1 annotate

annotate src/bzip2-1.0.6/bzip2.1 @ 169:223a55898ab9 tip default

Add null config files

author	Chris Cannam <cannam@all-day-breakfast.com>
date	Mon, 02 Mar 2020 14:03:47 +0000
parents	8a15ff55d9af
children

.PU
.TH bzip2 1
.SH NAME
bzip2, bunzip2 \- a block-sorting file compressor, v1.0.6
.br
bzcat \- decompresses files to stdout
.br
bzip2recover \- recovers data from damaged bzip2 files

.SH SYNOPSIS
.ll +8
.B bzip2
.RB [ " \-cdfkqstvzVL123456789 " ]
[
.I "filenames \&..."
]
.ll -8
.br
.B bunzip2
.RB [ " \-fkvsVL " ]
[
.I "filenames \&..."
]
.br
.B bzcat
.RB [ " \-s " ]
[
.I "filenames \&..."
]
.br
.B bzip2recover
.I "filename"

.SH DESCRIPTION
.I bzip2
compresses files using the Burrows-Wheeler block sorting
text compression algorithm, and Huffman coding. Compression is
generally considerably better than that achieved by more conventional
LZ77/LZ78-based compressors, and approaches the performance of the PPM
family of statistical compressors.

The command-line options are deliberately very similar to
those of
.I GNU gzip,
but they are not identical.

.I bzip2
expects a list of file names to accompany the
command-line flags. Each file is replaced by a compressed version of
itself, with the name "original_name.bz2".
Each compressed file
has the same modification date, permissions, and, when possible,
ownership as the corresponding original, so that these properties can
be correctly restored at decompression time. File name handling is
naive in the sense that there is no mechanism for preserving original
file names, permissions, ownerships or dates in filesystems which lack
these concepts, or have serious file name length restrictions, such as
MS-DOS.

.I bzip2
and
.I bunzip2
will by default not overwrite existing
files. If you want this to happen, specify the \-f flag.

If no file names are specified,
.I bzip2
compresses from standard
input to standard output. In this case,
.I bzip2
will decline to
write compressed output to a terminal, as this would be entirely
incomprehensible and therefore pointless.

.I bunzip2
(or
.I bzip2 \-d)
decompresses all
specified files. Files which were not created by
.I bzip2
will be detected and ignored, and a warning issued.
.I bzip2
attempts to guess the filename for the decompressed file
from that of the compressed file as follows:

.nf
       filename.bz2    becomes   filename
       filename.bz     becomes   filename
       filename.tbz2   becomes   filename.tar
       filename.tbz    becomes   filename.tar
       anyothername    becomes   anyothername.out
.fi

If the file does not end in one of the recognised endings,
.I .bz2,
.I .bz,
.I .tbz2
or
.I .tbz,
.I bzip2
complains that it cannot
guess the name of the original file, and uses the original name
with
.I .out
appended.
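The suffix rules above can be sketched in C as follows. This is a minimal illustration that assumes only the four suffixes listed. The function guess_output_name and its signature are hypothetical and are not taken from the bzip2 sources, which may treat corner cases differently.

```c
#include <stdio.h>
#include <string.h>

/* One suffix rewrite rule: compressed suffix -> replacement suffix. */
struct suffix_rule {
    const char *from;
    const char *to;
};

/* The mapping described in this manual page. */
static const struct suffix_rule rules[] = {
    { ".bz2",  ""     },
    { ".bz",   ""     },
    { ".tbz2", ".tar" },
    { ".tbz",  ".tar" },
};

/* Hypothetical helper: writes the guessed output name for `in` into `out`
 * (capacity n). Returns 1 if a known suffix matched, or 0 if no suffix
 * matched and ".out" was appended instead. */
int guess_output_name(const char *in, char *out, size_t n)
{
    size_t len = strlen(in);
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        size_t slen = strlen(rules[i].from);
        /* require a non-empty stem in front of the suffix */
        if (len > slen && strcmp(in + len - slen, rules[i].from) == 0) {
            snprintf(out, n, "%.*s%s", (int)(len - slen), in, rules[i].to);
            return 1;
        }
    }
    snprintf(out, n, "%s.out", in);
    return 0;
}
```

The rules live in a table, so supporting another suffix means adding a row rather than changing the matching loop.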

As with compression, supplying no
filenames causes decompression from
standard input to standard output.

.I bunzip2
will correctly decompress a file which is the
concatenation of two or more compressed files. The result is the
concatenation of the corresponding uncompressed files. Integrity
testing (\-t)
of concatenated
compressed files is also supported.

You can also compress or decompress files to the standard output by
giving the \-c flag. Multiple files may be compressed and
decompressed like this. The resulting outputs are fed sequentially to
stdout. Compression of multiple files
in this manner generates a stream
containing multiple compressed file representations. Such a stream
can be decompressed correctly only by
.I bzip2
version 0.9.0 or
later. Earlier versions of
.I bzip2
will stop after decompressing
the first file in the stream.

.I bzcat
(or
.I bzip2 \-dc)
decompresses all specified files to
the standard output.

.I bzip2
will read arguments from the environment variables
.I BZIP2
and
.I BZIP,
in that order, and will process them
before any arguments read from the command line. This gives a
convenient way to supply default arguments.
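A minimal sketch of the ordering described above: words from BZIP2 come first, then words from BZIP, then the real command-line arguments. It assumes that each variable is split on whitespace. The helper names and the fixed MAX_ARGS limit are hypothetical, and the caller is expected to pass in the results of getenv("BZIP2") and getenv("BZIP").

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64  /* hypothetical fixed capacity for this sketch */

/* Appends the whitespace-separated words of value (may be NULL) to args.
 * Returns the new argument count. The copies are heap-allocated and, in
 * this sketch, are simply kept until the program exits. */
static int append_words(const char *value, char *args[], int count)
{
    if (value == NULL)
        return count;
    const char *p = value;
    while (*p != '\0' && count < MAX_ARGS) {
        while (isspace((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        if (p > start) {
            size_t len = (size_t)(p - start);
            char *word = malloc(len + 1);
            if (word == NULL)
                return count;
            memcpy(word, start, len);
            word[len] = '\0';
            args[count++] = word;
        }
    }
    return count;
}

/* Builds the effective argument list: BZIP2 words, then BZIP words,
 * then argv[1] .. argv[argc-1]. Returns the number of entries in args. */
int build_arg_list(const char *bzip2_env, const char *bzip_env,
                   int argc, char *argv[], char *args[])
{
    int count = 0;
    count = append_words(bzip2_env, args, count);
    count = append_words(bzip_env, args, count);
    for (int i = 1; i < argc && count < MAX_ARGS; i++)
        args[count++] = argv[i];
    return count;
}
```

In a real main() the call would be build_arg_list(getenv("BZIP2"), getenv("BZIP"), argc, argv, args). Because the environment words come first, an option given on the command line is processed later. Assuming later options win, it can therefore override a default supplied through the environment.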

Compression is always performed, even if the compressed
file is slightly
larger than the original. Files of less than about one hundred bytes
tend to get larger, since the compression mechanism has a constant
overhead in the region of 50 bytes. Random data (including the output
of most file compressors) is coded at about 8.05 bits per byte, giving
an expansion of around 0.6%.

As a self-check for your protection,
.I
bzip2
uses 32-bit CRCs to
make sure that the decompressed version of a file is identical to the
original. This guards against corruption of the compressed data, and
against undetected bugs in
.I bzip2
(hopefully very unlikely). The
chance of data corruption going undetected is microscopic, about one
in four billion for each file processed. Be aware, though, that
the check occurs upon decompression, so it can only tell you that
something is wrong. It can't help you
recover the original uncompressed
data. You can use
.I bzip2recover
to try to recover data from
damaged files.
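To make the check concrete, here is a sketch of a 32-bit CRC using the parameters found in the bzip2 sources: polynomial 0x04C11DB7 processed most-significant bit first, initial value 0xFFFFFFFF and a final complement. These parameters come from the bzip2 format, not from this page. The second function shows how a per-block CRC is folded into a whole-stream value by rotating left one bit and then XOR-ing. The function names are hypothetical. A production version would use a 256-entry lookup table instead of the bit-by-bit loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32, MSB-first, polynomial 0x04C11DB7, init 0xFFFFFFFF,
 * result complemented. */
uint32_t bz_crc32(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;          /* feed next byte, top bits first */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return ~crc;
}

/* Folds one block CRC into the running stream CRC: rotate left by one
 * bit, then XOR with the block's CRC. */
uint32_t bz_combine_crc(uint32_t combined, uint32_t block_crc)
{
    combined = (combined << 1) | (combined >> 31);
    return combined ^ block_crc;
}
```

With these parameters the CRC of the nine bytes "123456789" is 0xFC891918, the standard check value for this CRC variant.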

Return values: 0 for a normal exit, 1 for environmental problems (file
not found, invalid flags, I/O errors, etc.), 2 to indicate a corrupt
compressed file, 3 for an internal consistency error (e.g., a bug) which
caused
.I bzip2
to panic.
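Scripts and wrapper programs usually branch on these values. The following is a minimal C sketch that assumes the caller already has bzip2's numeric exit status, for example from WEXITSTATUS after waitpid. The enum and function names are hypothetical and are not part of bzip2 or libbz2.

```c
/* Exit statuses listed in this manual page. The names are hypothetical;
 * bzip2 itself reports only the numeric values. */
enum bz_exit_status {
    BZ_EXIT_OK       = 0,  /* normal exit */
    BZ_EXIT_ENV      = 1,  /* environmental problem: missing file, bad flag, I/O error */
    BZ_EXIT_CORRUPT  = 2,  /* corrupt compressed file */
    BZ_EXIT_INTERNAL = 3   /* internal consistency error; bzip2 panicked */
};

/* Maps an exit status to a short human-readable explanation. */
const char *bz_exit_description(int status)
{
    switch (status) {
    case BZ_EXIT_OK:       return "normal exit";
    case BZ_EXIT_ENV:      return "environmental problem (file not found, invalid flags, I/O error)";
    case BZ_EXIT_CORRUPT:  return "corrupt compressed file";
    case BZ_EXIT_INTERNAL: return "internal consistency error";
    default:               return "unrecognised exit status";
    }
}
```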

.SH OPTIONS
.TP
.B \-c \-\-stdout
Compress or decompress to standard output.
.TP
.B \-d \-\-decompress
Force decompression.
.I bzip2,
.I bunzip2
and
.I bzcat
are
really the same program, and the decision about what actions to take is
made on the basis of which name is used. This flag overrides that
mechanism, and forces
.I bzip2
to decompress.
.TP
.B \-z \-\-compress
The complement to \-d: forces compression, regardless of the
invocation name.
.TP
.B \-t \-\-test
Check integrity of the specified file(s), but don't decompress them.
This really performs a trial decompression and throws away the result.
.TP
.B \-f \-\-force
Force overwrite of output files. Normally,
.I bzip2
will not overwrite
existing output files. Also forces
.I bzip2
to break hard links
to files, which it otherwise wouldn't do.

bzip2 normally declines to decompress files which don't have the
correct magic header bytes. If forced (\-f), however, it will pass
such files through unmodified. This is how GNU gzip behaves.
.TP
.B \-k \-\-keep
Keep (don't delete) input files during compression
or decompression.
.TP
.B \-s \-\-small
Reduce memory usage for compression, decompression and testing. Files
are decompressed and tested using a modified algorithm which only
requires 2.5 bytes per block byte. This means any file can be
decompressed in 2300k of memory, albeit at about half the normal speed.

During compression, \-s selects a block size of 200k, which limits
memory use to around the same figure, at the expense of your compression
ratio. In short, if your machine is low on memory (8 megabytes or
less), use \-s for everything. See MEMORY MANAGEMENT below.
.TP
.B \-q \-\-quiet
Suppress non-essential warning messages. Messages pertaining to
I/O errors and other critical events will not be suppressed.
.TP
.B \-v \-\-verbose
Verbose mode: show the compression ratio for each file processed.
Further \-v's increase the verbosity level, spewing out lots of
information which is primarily of interest for diagnostic purposes.
.TP
.B \-L \-\-license \-V \-\-version
Display the software version, license terms and conditions.
.TP
.B \-1 (or \-\-fast) to \-9 (or \-\-best)
Set the block size to 100k, 200k ... 900k when compressing. Has no
effect when decompressing. See MEMORY MANAGEMENT below.
The \-\-fast and \-\-best aliases are primarily for GNU gzip
compatibility. In particular, \-\-fast doesn't make things
significantly faster,
and \-\-best merely selects the default behaviour.
.TP
.B \-\-
Treats all subsequent arguments as file names, even if they start
with a dash. This is so you can handle files with names beginning
with a dash, for example: bzip2 \-\- \-myfilename.
.TP
.B \-\-repetitive\-fast \-\-repetitive\-best
These flags are redundant in versions 0.9.5 and above. They provided
some coarse control over the behaviour of the sorting algorithm in
earlier versions, which was sometimes useful. 0.9.5 and above have an
improved algorithm which renders these flags irrelevant.

.SH MEMORY MANAGEMENT
.I bzip2
compresses large files in blocks. The block size affects
both the compression ratio achieved, and the amount of memory needed for
compression and decompression. The flags \-1 through \-9
specify the block size to be 100,000 bytes through 900,000 bytes (the
default) respectively. At decompression time, the block size used for
compression is read from the header of the compressed file, and
.I bunzip2
then allocates itself just enough memory to decompress
the file. Since block sizes are stored in compressed files, it follows
that the flags \-1 to \-9 are irrelevant to and so ignored
during decompression.
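Reading the block size back from the header can be sketched as follows. The sketch assumes the standard .bz2 stream layout: the ASCII bytes "BZh" followed by a single digit from '1' to '9' that records the block size in units of 100,000 bytes. That layout comes from the bzip2 file format rather than from this page, and the function name is hypothetical.

```c
#include <stddef.h>

/* Returns the block size in bytes recorded in a .bz2 stream header, or -1
 * if buf does not begin with a valid "BZh1".."BZh9" header. */
long bz_header_block_size(const unsigned char *buf, size_t len)
{
    if (len < 4 || buf[0] != 'B' || buf[1] != 'Z' || buf[2] != 'h')
        return -1;                      /* wrong magic bytes */
    if (buf[3] < '1' || buf[3] > '9')
        return -1;                      /* block-size digit out of range */
    return (long)(buf[3] - '0') * 100000L;
}
```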

Compression and decompression requirements,
in bytes, can be estimated as:

.nf
       Compression:   400k + ( 8 x block size )

       Decompression: 100k + ( 4 x block size ), or
                      100k + ( 2.5 x block size ) with \-s
.fi

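The two formulas can be evaluated directly. The sketch below assumes that the block size for flag \-N is N x 100k and works in k throughout. The struct and function names are hypothetical, and the results can be compared with the usage columns of the table later in this section.

```c
/* Memory estimates in k for one compression level (1..9). */
struct bz_mem_estimate {
    unsigned compress_k;
    unsigned decompress_k;
    unsigned decompress_small_k;  /* decompression with -s */
};

/* Caller must pass level in 1..9. */
struct bz_mem_estimate bz_estimate_memory(int level)
{
    unsigned block_k = 100u * (unsigned)level;
    struct bz_mem_estimate e;
    e.compress_k = 400u + 8u * block_k;
    e.decompress_k = 100u + 4u * block_k;
    /* 2.5 x block size, written as (5 x block) / 2 to stay in integers */
    e.decompress_small_k = 100u + (5u * block_k) / 2u;
    return e;
}
```

For example, bz_estimate_memory(9) gives 7600k to compress, 3700k to decompress and 2350k to decompress with \-s.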
Larger block sizes give rapidly diminishing marginal returns. Most of
the compression comes from the first two or three hundred k of block
size, a fact worth bearing in mind when using
.I bzip2
on small machines.
It is also important to appreciate that the decompression memory
requirement is set at compression time by the choice of block size.

For files compressed with the default 900k block size,
.I bunzip2
will require about 3700 kbytes to decompress. To support decompression
of any file on a 4 megabyte machine,
.I bunzip2
has an option to
decompress using approximately half this amount of memory, about 2300
kbytes. Decompression speed is also halved, so you should use this
option only where necessary. The relevant flag is \-s.

In general, try to use the largest block size memory constraints allow,
since that maximises the compression achieved. Compression and
decompression speed are virtually unaffected by block size.

Another significant point applies to files which fit in a single block,
which means most files you'd encounter using a large block size. The
amount of real memory touched is proportional to the size of the file,
since the file is smaller than a block. For example, compressing a file
20,000 bytes long with the flag \-9 will cause the compressor to
allocate around 7600k of memory, but only touch 400k + 20000 * 8 = 560
kbytes of it. Similarly, the decompressor will allocate 3700k but only
touch 100k + 20000 * 4 = 180 kbytes.
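The single-block case can be expressed in the same way: the per-byte terms scale with the smaller of the file size and the block size. This sketch uses 1k = 1000 bytes, which is the convention that makes the worked example above come out at 560 and 180 kbytes. The function names are hypothetical.

```c
/* Approximate memory touched (bytes) while compressing: the 8-bytes-per-
 * byte term covers only min(file size, block size). */
unsigned long touched_compress_bytes(unsigned long file_bytes,
                                     unsigned long block_bytes)
{
    unsigned long n = file_bytes < block_bytes ? file_bytes : block_bytes;
    return 400000UL + 8UL * n;
}

/* Approximate memory touched (bytes) while decompressing (without -s). */
unsigned long touched_decompress_bytes(unsigned long file_bytes,
                                       unsigned long block_bytes)
{
    unsigned long n = file_bytes < block_bytes ? file_bytes : block_bytes;
    return 100000UL + 4UL * n;
}
```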

Here is a table which summarises the maximum memory usage for different
block sizes. Also recorded is the total compressed size for 14 files of
the Calgary Text Compression Corpus totalling 3,141,622 bytes. This
column gives some feel for how compression varies with block size.
These figures tend to understate the advantage of larger block sizes for
larger files, since the Corpus is dominated by smaller files.

.nf
           Compress   Decompress   Decompress   Corpus
    Flag     usage      usage       -s usage     Size

     -1      1200k       500k         350k      914704
     -2      2000k       900k         600k      877703
     -3      2800k      1300k         850k      860338
     -4      3600k      1700k        1100k      846899
     -5      4400k      2100k        1350k      845160
     -6      5200k      2500k        1600k      838626
     -7      6000k      2900k        1850k      834096
     -8      6800k      3300k        2100k      828642
     -9      7600k      3700k        2350k      828642
.fi

.SH RECOVERING DATA FROM DAMAGED FILES
.I bzip2
compresses files in blocks, usually 900 kbytes long. Each
block is handled independently. If a media or transmission error causes
a multi-block .bz2
file to become damaged, it may be possible to
recover data from the undamaged blocks in the file.

The compressed representation of each block is delimited by a 48-bit
pattern, which makes it possible to find the block boundaries with
reasonable certainty. Each block also carries its own 32-bit CRC, so
damaged blocks can be distinguished from undamaged ones.
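Because blocks are packed bit by bit rather than byte by byte, a marker can start at any bit position. A recovery tool therefore has to slide a 48-bit window across the file one bit at a time. The sketch below does that, assuming the block-start pattern 0x314159265359 (the BCD digits of pi) from the .bz2 format. That value is not stated on this page, and the function name is hypothetical. A real tool would also look for the end-of-stream marker and check each block's CRC.

```c
#include <stddef.h>
#include <stdint.h>

#define BZ_BLOCK_MAGIC 0x314159265359ULL  /* 48-bit block-start pattern */
#define MASK48         0xFFFFFFFFFFFFULL

/* Returns the bit offset (counted from the most significant bit of buf[0])
 * of the first block marker beginning at or after bit `start` (start >= 0),
 * or -1 if there is none. */
long long find_block_marker(const unsigned char *buf, size_t len,
                            long long start)
{
    uint64_t window = 0;
    long long total_bits = (long long)len * 8;
    for (long long bit = 0; bit < total_bits; bit++) {
        int b = (buf[bit / 8] >> (7 - bit % 8)) & 1;      /* MSB first */
        window = ((window << 1) | (uint64_t)b) & MASK48;  /* last 48 bits */
        long long first = bit - 47;                       /* window start */
        if (first >= start && window == BZ_BLOCK_MAGIC)
            return first;
    }
    return -1;
}
```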

.I bzip2recover
is a simple program whose purpose is to search for
blocks in .bz2 files, and write each block out into its own .bz2
file. You can then use
.I bzip2
\-t
to test the
integrity of the resulting files, and decompress those which are
undamaged.

.I bzip2recover
takes a single argument, the name of the damaged file,
and writes a number of files "rec00001file.bz2",
"rec00002file.bz2", etc., containing the extracted blocks,
where "file.bz2" stands for the name of the damaged file.
The output filenames are designed so that the use of
wildcards in subsequent processing, for example
"bzip2 \-dc rec*file.bz2 > recovered_data", processes the files in
the correct order.
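The naming scheme can be sketched with snprintf. The sketch assumes the pattern described above: "rec", then a five-digit zero-padded block number, then the damaged file's base name. The function names are hypothetical, and the real program's handling of directories and missing suffixes is not reproduced.

```c
#include <stdio.h>
#include <string.h>

/* Builds the name of the n-th recovered block file (n starts at 1).
 * Returns 0 on success, -1 if the name did not fit in out. */
int recovered_name(char *out, size_t size, int n, const char *base)
{
    int w = snprintf(out, size, "rec%05d%s", n, base);
    return (w >= 0 && (size_t)w < size) ? 0 : -1;
}

/* Returns 1 if the file for block i sorts lexically before the file for
 * block j, which is the order a shell glob would typically produce. */
int recovered_sorts_before(int i, int j, const char *base)
{
    char a[256], b[256];
    if (recovered_name(a, sizeof a, i, base) != 0 ||
        recovered_name(b, sizeof b, j, base) != 0)
        return 0;
    return strcmp(a, b) < 0;
}
```

Zero padding is what makes the wildcard order correct. A fixed-width number sorts lexically in the same order as numerically, at least up to 99999 blocks, whereas an unpadded "rec10" would sort before "rec9".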

.I bzip2recover
should be of most use dealing with large .bz2
files, as these will contain many blocks. It is clearly
futile to use it on damaged single-block files, since a
damaged block cannot be recovered. If you wish to minimise
any potential data loss through media or transmission errors,
you might consider compressing with a smaller
block size.

.SH PERFORMANCE NOTES
The sorting phase of compression gathers together similar strings in the
file. Because of this, files containing very long runs of repeated
symbols, like "aabaabaabaab ..." (repeated several hundred times) may
compress more slowly than normal. Versions 0.9.5 and above fare much
better than previous versions in this respect. The ratio between
worst-case and average-case compression time is in the region of 10:1.
For previous versions, this figure was more like 100:1. You can use the
\-vvvv option to monitor progress in great detail, if you want.

Decompression speed is unaffected by these phenomena.

.I bzip2
usually allocates several megabytes of memory to operate
in, and then charges all over it in a fairly random fashion. This means
that performance, both for compressing and decompressing, is largely
determined by the speed at which your machine can service cache misses.
Because of this, small changes to the code to reduce the miss rate have
been observed to give disproportionately large performance improvements.
I imagine
.I bzip2
will perform best on machines with very large caches.

.SH CAVEATS
I/O error messages are not as helpful as they could be.
.I bzip2
tries hard to detect I/O errors and exit cleanly, but the details of
what the problem is sometimes seem rather misleading.

This manual page pertains to version 1.0.6 of
.I bzip2.
Compressed data created by this version is entirely forwards and
backwards compatible with the previous public releases, versions
0.1pl2, 0.9.0, 0.9.5, 1.0.0, 1.0.1, 1.0.2 and above, but with the following
exception: 0.9.0 and above can correctly decompress multiple
concatenated compressed files. 0.1pl2 cannot do this; it will stop
after decompressing just the first file in the stream.

.I bzip2recover
versions prior to 1.0.2 used 32-bit integers to represent
bit positions in compressed files, so they could not handle compressed
files more than 512 megabytes long. Versions 1.0.2 and above use
64-bit ints on some platforms which support them (GNU-supported
targets, and Windows). To establish whether or not bzip2recover was
built with such a limitation, run it without arguments. In any event
you can build yourself an unlimited version if you can recompile it
with MaybeUInt64 set to be an unsigned 64-bit integer.
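The 512 megabyte limit follows from simple arithmetic. A 32-bit counter can index 2^32 bits, and 2^32 / 8 = 2^29 bytes, which is 512 x 2^20 bytes. The sketch below shows the wrap-around directly. BitPos32 and BitPos64 are hypothetical stand-ins for the narrow and wide choices of bzip2recover's MaybeUInt64 type, not its actual declarations.

```c
#include <stdint.h>

typedef uint32_t BitPos32;  /* a 32-bit bit-position counter */
typedef uint64_t BitPos64;  /* MaybeUInt64 when it is 64 bits wide */

/* Bit position of the first bit of the byte at byte_offset, computed in
 * 64-bit arithmetic and then stored in a counter of the given width. */
BitPos32 bit_position32(uint64_t byte_offset) { return (BitPos32)(byte_offset * 8u); }
BitPos64 bit_position64(uint64_t byte_offset) { return (BitPos64)(byte_offset * 8u); }
```

At byte offset 512 x 2^20 the 32-bit counter wraps back to 0, so later blocks would be reported at the wrong positions.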

.SH AUTHOR
Julian Seward, jseward@bzip.org.

http://www.bzip.org

The ideas embodied in
.I bzip2
are due to (at least) the following
people: Michael Burrows and David Wheeler (for the block sorting
transformation), David Wheeler (again, for the Huffman coder), Peter
Fenwick (for the structured coding model in the original
.I bzip,
and many refinements), and Alistair Moffat, Radford Neal and Ian Witten
(for the arithmetic coder in the original
.I bzip).
I am much
indebted for their help, support and advice. See the manual in the
source distribution for pointers to sources of documentation. Christian
von Roques encouraged me to look for faster sorting algorithms, so as to
speed up compression. Bela Lubkin encouraged me to improve the
worst-case compression performance.
Donna Robinson XMLised the documentation.
The bz* scripts are derived from those of GNU gzip.
Many people sent patches, helped
with portability problems, lent machines, gave advice and were generally
helpful.
