annotate src/bzip2-1.0.6/manual.html @ 83:ae30d91d2ffe

Replace these with versions built using an older toolset (so as to avoid ABI incompatibilities when linking on Ubuntu 14.04 for packaging purposes)
author Chris Cannam
date Fri, 07 Feb 2020 11:51:13 +0000
parents e13257ea84a4
rev   line source
Chris@4 1 <html>
Chris@4 2 <head>
Chris@4 3 <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
Chris@4 4 <title>bzip2 and libbzip2, version 1.0.6</title>
Chris@4 5 <meta name="generator" content="DocBook XSL Stylesheets V1.75.2">
Chris@4 6 <style type="text/css" media="screen">/* Colours:
Chris@4 7 #74240f dark brown h1, h2, h3, h4
Chris@4 8 #336699 medium blue links
Chris@4 9 #339999 turquoise link hover colour
Chris@4 10 #202020 almost black general text
Chris@4 11 #761596 purple md5sum text
Chris@4 12 #626262 dark gray pre border
Chris@4 13 #eeeeee very light gray pre background
Chris@4 14 #f2f2f9 very light blue nav table background
Chris@4 15 #3366cc medium blue nav table border
Chris@4 16 */
Chris@4 17
Chris@4 18 a, a:link, a:visited, a:active { color: #336699; }
Chris@4 19 a:hover { color: #339999; }
Chris@4 20
Chris@4 21 body { font: 80%/126% sans-serif; }
Chris@4 22 h1, h2, h3, h4 { color: #74240f; }
Chris@4 23
Chris@4 24 dt { color: #336699; font-weight: bold }
Chris@4 25 dd {
Chris@4 26 margin-left: 1.5em;
Chris@4 27 padding-bottom: 0.8em;
Chris@4 28 }
Chris@4 29
Chris@4 30 /* -- ruler -- */
Chris@4 31 div.hr_blue {
Chris@4 32 height: 3px;
Chris@4 33 background:#ffffff url("/images/hr_blue.png") repeat-x; }
Chris@4 34 div.hr_blue hr { display:none; }
Chris@4 35
Chris@4 36 /* release styles */
Chris@4 37 #release p { margin-top: 0.4em; }
Chris@4 38 #release .md5sum { color: #761596; }
Chris@4 39
Chris@4 40
Chris@4 41 /* ------ styles for docs|manuals|howto ------ */
Chris@4 42 /* -- lists -- */
Chris@4 43 ul {
Chris@4 44 margin: 0px 4px 16px 16px;
Chris@4 45 padding: 0px;
Chris@4 46 list-style: url("/images/li-blue.png");
Chris@4 47 }
Chris@4 48 ul li {
Chris@4 49 margin-bottom: 10px;
Chris@4 50 }
Chris@4 51 ul ul {
Chris@4 52 list-style-type: none;
Chris@4 53 list-style-image: none;
Chris@4 54 margin-left: 0px;
Chris@4 55 }
Chris@4 56
Chris@4 57 /* header / footer nav tables */
Chris@4 58 table.nav {
Chris@4 59 border: solid 1px #3366cc;
Chris@4 60 background: #f2f2f9;
Chris@4 61 background-color: #f2f2f9;
Chris@4 62 margin-bottom: 0.5em;
Chris@4 63 }
Chris@4 64 /* don't have underlined links in chunked nav menus */
Chris@4 65 table.nav a { text-decoration: none; }
Chris@4 66 table.nav a:hover { text-decoration: underline; }
Chris@4 67 table.nav td { font-size: 85%; }
Chris@4 68
Chris@4 69 code, tt, pre { font-size: 120%; }
Chris@4 70 code, tt { color: #761596; }
Chris@4 71
Chris@4 72 div.literallayout, pre.programlisting, pre.screen {
Chris@4 73 color: #000000;
Chris@4 74 padding: 0.5em;
Chris@4 75 background: #eeeeee;
Chris@4 76 border: 1px solid #626262;
Chris@4 77 background-color: #eeeeee;
Chris@4 78 margin: 4px 0px 4px 0px;
Chris@4 79 }
Chris@4 80 </style>
Chris@4 81 </head>
Chris@4 82 <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div lang="en" class="book" title="bzip2 and libbzip2, version 1.0.6">
Chris@4 83 <div class="titlepage">
Chris@4 84 <div>
Chris@4 85 <div><h1 class="title">
Chris@4 86 <a name="userman"></a>bzip2 and libbzip2, version 1.0.6</h1></div>
Chris@4 87 <div><h2 class="subtitle">A program and library for data compression</h2></div>
Chris@4 88 <div><div class="authorgroup"><div class="author">
Chris@4 89 <h3 class="author">
Chris@4 90 <span class="firstname">Julian</span> <span class="surname">Seward</span>
Chris@4 91 </h3>
Chris@4 92 <div class="affiliation"><span class="orgname">http://www.bzip.org<br></span></div>
Chris@4 93 </div></div></div>
Chris@4 94 <div><p class="releaseinfo">Version 1.0.6 of 6 September 2010</p></div>
Chris@4 95 <div><p class="copyright">Copyright © 1996-2010 Julian Seward</p></div>
Chris@4 96 <div><div class="legalnotice" title="Legal Notice">
Chris@4 97 <a name="id537185"></a><p>This program, <code class="computeroutput">bzip2</code>, the
Chris@4 98 associated library <code class="computeroutput">libbzip2</code>, and
Chris@4 99 all documentation, are copyright © 1996-2010 Julian Seward.
Chris@4 100 All rights reserved.</p>
Chris@4 101 <p>Redistribution and use in source and binary forms, with
Chris@4 102 or without modification, are permitted provided that the
Chris@4 103 following conditions are met:</p>
Chris@4 104 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 105 <li class="listitem" style="list-style-type: disc"><p>Redistributions of source code must retain the
Chris@4 106 above copyright notice, this list of conditions and the
Chris@4 107 following disclaimer.</p></li>
Chris@4 108 <li class="listitem" style="list-style-type: disc"><p>The origin of this software must not be
Chris@4 109 misrepresented; you must not claim that you wrote the original
Chris@4 110 software. If you use this software in a product, an
Chris@4 111 acknowledgment in the product documentation would be
Chris@4 112 appreciated but is not required.</p></li>
Chris@4 113 <li class="listitem" style="list-style-type: disc"><p>Altered source versions must be plainly marked
Chris@4 114 as such, and must not be misrepresented as being the original
Chris@4 115 software.</p></li>
Chris@4 116 <li class="listitem" style="list-style-type: disc"><p>The name of the author may not be used to
Chris@4 117 endorse or promote products derived from this software without
Chris@4 118 specific prior written permission.</p></li>
Chris@4 119 </ul></div>
Chris@4 120 <p>THIS SOFTWARE IS PROVIDED BY THE AUTHOR "AS IS" AND ANY
Chris@4 121 EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO,
Chris@4 122 THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
Chris@4 123 PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
Chris@4 124 AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
Chris@4 125 EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
Chris@4 126 TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
Chris@4 127 DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
Chris@4 128 ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
Chris@4 129 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
Chris@4 130 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
Chris@4 131 THE POSSIBILITY OF SUCH DAMAGE.</p>
Chris@4 132 <p>PATENTS: To the best of my knowledge,
Chris@4 133 <code class="computeroutput">bzip2</code> and
Chris@4 134 <code class="computeroutput">libbzip2</code> do not use any patented
Chris@4 135 algorithms. However, I do not have the resources to carry
Chris@4 136 out a patent search. Therefore I cannot give any guarantee of
Chris@4 137 the above statement.
Chris@4 138 </p>
Chris@4 139 </div></div>
Chris@4 140 </div>
Chris@4 141 <hr>
Chris@4 142 </div>
Chris@4 143 <div class="toc">
Chris@4 144 <p><b>Table of Contents</b></p>
Chris@4 145 <dl>
Chris@4 146 <dt><span class="chapter"><a href="#intro">1. Introduction</a></span></dt>
Chris@4 147 <dt><span class="chapter"><a href="#using">2. How to use bzip2</a></span></dt>
Chris@4 148 <dd><dl>
Chris@4 149 <dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
Chris@4 150 <dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
Chris@4 151 <dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
Chris@4 152 <dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
Chris@4 153 <dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
Chris@4 154 <dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
Chris@4 155 <dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
Chris@4 156 <dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
Chris@4 157 <dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
Chris@4 158 </dl></dd>
Chris@4 159 <dt><span class="chapter"><a href="#libprog">3.
Chris@4 160 Programming with <code class="computeroutput">libbzip2</code>
Chris@4 161 </a></span></dt>
Chris@4 162 <dd><dl>
Chris@4 163 <dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
Chris@4 164 <dd><dl>
Chris@4 165 <dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
Chris@4 166 <dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
Chris@4 167 <dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
Chris@4 168 </dl></dd>
Chris@4 169 <dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
Chris@4 170 <dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
Chris@4 171 <dd><dl>
Chris@4 172 <dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
Chris@4 173 <dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
Chris@4 174 <dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
Chris@4 175 <dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
Chris@4 176 <dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
Chris@4 177 <dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
Chris@4 178 </dl></dd>
Chris@4 179 <dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
Chris@4 180 <dd><dl>
Chris@4 181 <dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
Chris@4 182 <dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
Chris@4 183 <dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
Chris@4 184 <dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
Chris@4 185 <dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
Chris@4 186 <dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
Chris@4 187 <dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
Chris@4 188 <dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
Chris@4 189 <dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
Chris@4 190 </dl></dd>
Chris@4 191 <dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
Chris@4 192 <dd><dl>
Chris@4 193 <dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
Chris@4 194 <dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
Chris@4 195 </dl></dd>
Chris@4 196 <dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
Chris@4 197 <dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
Chris@4 198 <dd><dl>
Chris@4 199 <dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
Chris@4 200 <dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
Chris@4 201 </dl></dd>
Chris@4 202 <dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
Chris@4 203 </dl></dd>
Chris@4 204 <dt><span class="chapter"><a href="#misc">4. Miscellanea</a></span></dt>
Chris@4 205 <dd><dl>
Chris@4 206 <dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
Chris@4 207 <dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
Chris@4 208 <dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
Chris@4 209 <dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
Chris@4 210 <dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
Chris@4 211 </dl></dd>
Chris@4 212 </dl>
Chris@4 213 </div>
Chris@4 214 <div class="chapter" title="1. Introduction">
Chris@4 215 <div class="titlepage"><div><div><h2 class="title">
Chris@4 216 <a name="intro"></a>1. Introduction</h2></div></div></div>
Chris@4 217 <p><code class="computeroutput">bzip2</code> compresses files
Chris@4 218 using the Burrows-Wheeler block-sorting text compression
Chris@4 219 algorithm, and Huffman coding. Compression is generally
Chris@4 220 considerably better than that achieved by more conventional
Chris@4 221 LZ77/LZ78-based compressors, and approaches the performance of
Chris@4 222 the PPM family of statistical compressors.</p>
Chris@4 223 <p><code class="computeroutput">bzip2</code> is built on top of
Chris@4 224 <code class="computeroutput">libbzip2</code>, a flexible library for
Chris@4 225 handling compressed data in the
Chris@4 226 <code class="computeroutput">bzip2</code> format. This manual
Chris@4 227 describes both how to use the program and how to work with the
Chris@4 228 library interface. Most of the manual is devoted to this
Chris@4 229 library, not the program, which is good news if your interest is
Chris@4 230 only in the program.</p>
Chris@4 231 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 232 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#using" title="2. How to use bzip2">How to use bzip2</a> describes how to use
Chris@4 233 <code class="computeroutput">bzip2</code>; this is the only part
Chris@4 234 you need to read if you just want to know how to operate the
Chris@4 235 program.</p></li>
Chris@4 236 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#libprog" title="3.  Programming with libbzip2">Programming with libbzip2</a> describes the
Chris@4 237 programming interfaces in detail, and</p></li>
Chris@4 238 <li class="listitem" style="list-style-type: disc"><p><a class="xref" href="#misc" title="4. Miscellanea">Miscellanea</a> records some
Chris@4 239 miscellaneous notes which I thought ought to be recorded
Chris@4 240 somewhere.</p></li>
Chris@4 241 </ul></div>
Chris@4 242 </div>
Chris@4 243 <div class="chapter" title="2. How to use bzip2">
Chris@4 244 <div class="titlepage"><div><div><h2 class="title">
Chris@4 245 <a name="using"></a>2. How to use bzip2</h2></div></div></div>
Chris@4 246 <div class="toc">
Chris@4 247 <p><b>Table of Contents</b></p>
Chris@4 248 <dl>
Chris@4 249 <dt><span class="sect1"><a href="#name">2.1. NAME</a></span></dt>
Chris@4 250 <dt><span class="sect1"><a href="#synopsis">2.2. SYNOPSIS</a></span></dt>
Chris@4 251 <dt><span class="sect1"><a href="#description">2.3. DESCRIPTION</a></span></dt>
Chris@4 252 <dt><span class="sect1"><a href="#options">2.4. OPTIONS</a></span></dt>
Chris@4 253 <dt><span class="sect1"><a href="#memory-management">2.5. MEMORY MANAGEMENT</a></span></dt>
Chris@4 254 <dt><span class="sect1"><a href="#recovering">2.6. RECOVERING DATA FROM DAMAGED FILES</a></span></dt>
Chris@4 255 <dt><span class="sect1"><a href="#performance">2.7. PERFORMANCE NOTES</a></span></dt>
Chris@4 256 <dt><span class="sect1"><a href="#caveats">2.8. CAVEATS</a></span></dt>
Chris@4 257 <dt><span class="sect1"><a href="#author">2.9. AUTHOR</a></span></dt>
Chris@4 258 </dl>
Chris@4 259 </div>
Chris@4 260 <p>This chapter contains a copy of the
Chris@4 261 <code class="computeroutput">bzip2</code> man page, and nothing
Chris@4 262 else.</p>
Chris@4 263 <div class="sect1" title="2.1. NAME">
Chris@4 264 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 265 <a name="name"></a>2.1. NAME</h2></div></div></div>
Chris@4 266 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 267 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2</code>,
Chris@4 268 <code class="computeroutput">bunzip2</code> - a block-sorting file
Chris@4 269 compressor, v1.0.6</p></li>
Chris@4 270 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzcat</code> -
Chris@4 271 decompresses files to stdout</p></li>
Chris@4 272 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2recover</code> -
Chris@4 273 recovers data from damaged bzip2 files</p></li>
Chris@4 274 </ul></div>
Chris@4 275 </div>
Chris@4 276 <div class="sect1" title="2.2. SYNOPSIS">
Chris@4 277 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 278 <a name="synopsis"></a>2.2. SYNOPSIS</h2></div></div></div>
Chris@4 279 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 280 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2</code> [
Chris@4 281 -cdfkqstvzVL123456789 ] [ filenames ... ]</p></li>
Chris@4 282 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bunzip2</code> [
Chris@4 283 -fkvsVL ] [ filenames ... ]</p></li>
Chris@4 284 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzcat</code> [ -s ] [
Chris@4 285 filenames ... ]</p></li>
Chris@4 286 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2recover</code>
Chris@4 287 filename</p></li>
Chris@4 288 </ul></div>
Chris@4 289 </div>
Chris@4 290 <div class="sect1" title="2.3. DESCRIPTION">
Chris@4 291 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 292 <a name="description"></a>2.3. DESCRIPTION</h2></div></div></div>
Chris@4 293 <p><code class="computeroutput">bzip2</code> compresses files
Chris@4 294 using the Burrows-Wheeler block-sorting text compression
Chris@4 295 algorithm, and Huffman coding. Compression is generally
Chris@4 296 considerably better than that achieved by more conventional
Chris@4 297 LZ77/LZ78-based compressors, and approaches the performance of
Chris@4 298 the PPM family of statistical compressors.</p>
Chris@4 299 <p>The command-line options are deliberately very similar to
Chris@4 300 those of GNU <code class="computeroutput">gzip</code>, but they are
Chris@4 301 not identical.</p>
Chris@4 302 <p><code class="computeroutput">bzip2</code> expects a list of
Chris@4 303 file names to accompany the command-line flags. Each file is
Chris@4 304 replaced by a compressed version of itself, with the name
Chris@4 305 <code class="computeroutput">original_name.bz2</code>. Each
Chris@4 306 compressed file has the same modification date, permissions, and,
Chris@4 307 when possible, ownership as the corresponding original, so that
Chris@4 308 these properties can be correctly restored at decompression time.
Chris@4 309 File name handling is naive in the sense that there is no
Chris@4 310 mechanism for preserving original file names, permissions,
Chris@4 311 ownerships or dates in filesystems which lack these concepts, or
Chris@4 312 have serious file name length restrictions, such as
Chris@4 313 MS-DOS.</p>
Chris@4 314 <p><code class="computeroutput">bzip2</code> and
Chris@4 315 <code class="computeroutput">bunzip2</code> will by default not
Chris@4 316 overwrite existing files. If you want this to happen, specify
Chris@4 317 the <code class="computeroutput">-f</code> flag.</p>
Chris@4 318 <p>If no file names are specified,
Chris@4 319 <code class="computeroutput">bzip2</code> compresses from standard
Chris@4 320 input to standard output. In this case,
Chris@4 321 <code class="computeroutput">bzip2</code> will decline to write
Chris@4 322 compressed output to a terminal, as this would be entirely
Chris@4 323 incomprehensible and therefore pointless.</p>
Chris@4 324 <p><code class="computeroutput">bunzip2</code> (or
Chris@4 325 <code class="computeroutput">bzip2 -d</code>) decompresses all
Chris@4 326 specified files. Files which were not created by
Chris@4 327 <code class="computeroutput">bzip2</code> will be detected and
Chris@4 328 ignored, and a warning issued.
Chris@4 329 <code class="computeroutput">bzip2</code> attempts to guess the
Chris@4 330 filename for the decompressed file from that of the compressed
Chris@4 331 file as follows:</p>
Chris@4 332 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 333 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.bz2 </code>
Chris@4 334 becomes
Chris@4 335 <code class="computeroutput">filename</code></p></li>
Chris@4 336 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.bz </code>
Chris@4 337 becomes
Chris@4 338 <code class="computeroutput">filename</code></p></li>
Chris@4 339 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.tbz2</code>
Chris@4 340 becomes
Chris@4 341 <code class="computeroutput">filename.tar</code></p></li>
Chris@4 342 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">filename.tbz </code>
Chris@4 343 becomes
Chris@4 344 <code class="computeroutput">filename.tar</code></p></li>
Chris@4 345 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">anyothername </code>
Chris@4 346 becomes
Chris@4 347 <code class="computeroutput">anyothername.out</code></p></li>
Chris@4 348 </ul></div>
Chris@4 349 <p>If the file does not end in one of the recognised endings,
Chris@4 350 <code class="computeroutput">.bz2</code>,
Chris@4 351 <code class="computeroutput">.bz</code>,
Chris@4 352 <code class="computeroutput">.tbz2</code> or
Chris@4 353 <code class="computeroutput">.tbz</code>,
Chris@4 354 <code class="computeroutput">bzip2</code> complains that it cannot
Chris@4 355 guess the name of the original file, and uses the original name
Chris@4 356 with <code class="computeroutput">.out</code> appended.</p>
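<p>The suffix rules above can be sketched as a small lookup. This is an illustrative Python helper, not code taken from <code class="computeroutput">bzip2</code>; the names <code class="computeroutput">SUFFIX_MAP</code> and <code class="computeroutput">guess_output_name</code> are hypothetical, and corner cases the real program may treat specially (such as a name consisting only of a suffix) are ignored.</p>

```python
# Hypothetical helper mirroring the documented suffix rules; it is not part
# of bzip2 and only illustrates the mapping.
SUFFIX_MAP = [
    (".bz2", ""),
    (".bz", ""),
    (".tbz2", ".tar"),
    (".tbz", ".tar"),
]


def guess_output_name(name: str) -> str:
    """Return the name a decompressed file would get under the listed rules."""
    for suffix, replacement in SUFFIX_MAP:
        if name.endswith(suffix):
            # Strip the recognised ending and append its replacement, if any.
            return name[: -len(suffix)] + replacement
    # Unrecognised ending: keep the name and append ".out".
    return name + ".out"
```

<p>With these rules, <code class="computeroutput">guess_output_name("backup.tbz2")</code> gives <code class="computeroutput">backup.tar</code>, while a name with no recognised ending simply gains <code class="computeroutput">.out</code>.</p>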
Chris@4 357 <p>As with compression, supplying no filenames causes
Chris@4 358 decompression from standard input to standard output.</p>
Chris@4 359 <p><code class="computeroutput">bunzip2</code> will correctly
Chris@4 360 decompress a file which is the concatenation of two or more
Chris@4 361 compressed files. The result is the concatenation of the
Chris@4 362 corresponding uncompressed files. Integrity testing
Chris@4 363 (<code class="computeroutput">-t</code>) of concatenated compressed
Chris@4 364 files is also supported.</p>
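<p>The concatenation behaviour can be tried without touching any files. The sketch below assumes Python's standard <code class="computeroutput">bz2</code> module, which stands in for the <code class="computeroutput">bunzip2</code> program here; the sample byte strings are arbitrary.</p>

```python
import bz2

# Compress two pieces of data independently, as if they were two files.
first = bz2.compress(b"hello, ")
second = bz2.compress(b"world\n")

# Joining the compressed streams byte-for-byte gives a valid input whose
# decompressed form is the concatenation of the two originals.
combined = first + second
restored = bz2.decompress(combined)
```

<p>Here <code class="computeroutput">restored</code> holds the two original byte strings joined in order.</p>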
Chris@4 365 <p>You can also compress or decompress files to the standard
Chris@4 366 output by giving the <code class="computeroutput">-c</code> flag.
Chris@4 367 Multiple files may be compressed and decompressed like this. The
Chris@4 368 resulting outputs are fed sequentially to stdout. Compression of
Chris@4 369 multiple files in this manner generates a stream containing
Chris@4 370 multiple compressed file representations. Such a stream can be
Chris@4 371 decompressed correctly only by
Chris@4 372 <code class="computeroutput">bzip2</code> version 0.9.0 or later.
Chris@4 373 Earlier versions of <code class="computeroutput">bzip2</code> will
Chris@4 374 stop after decompressing the first file in the stream.</p>
Chris@4 375 <p><code class="computeroutput">bzcat</code> (or
Chris@4 376 <code class="computeroutput">bzip2 -dc</code>) decompresses all
Chris@4 377 specified files to the standard output.</p>
Chris@4 378 <p><code class="computeroutput">bzip2</code> will read arguments
Chris@4 379 from the environment variables
Chris@4 380 <code class="computeroutput">BZIP2</code> and
Chris@4 381 <code class="computeroutput">BZIP</code>, in that order, and will
Chris@4 382 process them before any arguments read from the command line.
Chris@4 383 This gives a convenient way to supply default arguments.</p>
Chris@4 384 <p>Compression is always performed, even if the compressed
Chris@4 385 file is slightly larger than the original. Files of less than
Chris@4 386 about one hundred bytes tend to get larger, since the compression
Chris@4 387 mechanism has a constant overhead in the region of 50 bytes.
Chris@4 388 Random data (including the output of most file compressors) is
Chris@4 389 coded at about 8.05 bits per byte, giving an expansion of around
Chris@4 390 0.5%.</p>
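<p>Both effects are easy to observe. The following sketch assumes Python's standard <code class="computeroutput">bz2</code> module as a stand-in for the command-line program; the exact byte counts depend on the input and on the library build, so none are quoted here.</p>

```python
import bz2
import os

# A very short input: the fixed overhead dominates, so the output is larger.
tiny = b"hello"
tiny_compressed = bz2.compress(tiny)

# Random bytes are essentially incompressible, so the output grows slightly.
random_data = os.urandom(200_000)
random_compressed = bz2.compress(random_data)
expansion = len(random_compressed) / len(random_data) - 1.0

print(len(tiny), "->", len(tiny_compressed), "bytes")
print(f"random data expansion: {expansion:.2%}")
```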
Chris@4 391 <p>As a self-check for your protection,
Chris@4 392 <code class="computeroutput">bzip2</code> uses 32-bit CRCs to make
Chris@4 393 sure that the decompressed version of a file is identical to the
Chris@4 394 original. This guards against corruption of the compressed data,
Chris@4 395 and against undetected bugs in
Chris@4 396 <code class="computeroutput">bzip2</code> (hopefully very unlikely).
Chris@4 397 The chance of data corruption going undetected is microscopic,
Chris@4 398 about one chance in four billion for each file processed. Be
Chris@4 399 aware, though, that the check occurs upon decompression, so it
Chris@4 400 can only tell you that something is wrong. It can't help you
Chris@4 401 recover the original uncompressed data. You can use
Chris@4 402 <code class="computeroutput">bzip2recover</code> to try to recover
Chris@4 403 data from damaged files.</p>
Chris@4 404 <p>Return values: 0 for a normal exit, 1 for environmental
Chris@4 405 problems (file not found, invalid flags, I/O errors, etc.), 2
Chris@4 406 to indicate a corrupt compressed file, 3 for an internal
Chris@4 407 consistency error (e.g., a bug) which caused
Chris@4 408 <code class="computeroutput">bzip2</code> to panic.</p>
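<p>A script that runs <code class="computeroutput">bzip2</code> can turn these return values into messages. The mapping below is a minimal sketch; the names <code class="computeroutput">EXIT_MEANINGS</code> and <code class="computeroutput">describe_exit_status</code> are hypothetical, and the fallback for other values is an assumption, since only 0 to 3 are documented.</p>

```python
# Meanings of the documented bzip2 return values.
EXIT_MEANINGS = {
    0: "normal exit",
    1: "environmental problem (file not found, invalid flags, I/O error, ...)",
    2: "corrupt compressed file",
    3: "internal consistency error (bzip2 panicked)",
}


def describe_exit_status(code: int) -> str:
    """Translate a bzip2 return value into a short description."""
    # Anything outside 0..3 is not covered by the manual, so say so.
    return EXIT_MEANINGS.get(code, f"undocumented status {code}")
```

<p>One way to use it is to pass in the <code class="computeroutput">returncode</code> of a <code class="computeroutput">subprocess.run(["bzip2", "-t", path])</code> call, where <code class="computeroutput">path</code> names the file to test.</p>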
Chris@4 409 </div>
Chris@4 410 <div class="sect1" title="2.4. OPTIONS">
Chris@4 411 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 412 <a name="options"></a>2.4. OPTIONS</h2></div></div></div>
Chris@4 413 <div class="variablelist"><dl>
Chris@4 414 <dt><span class="term"><code class="computeroutput">-c --stdout</code></span></dt>
Chris@4 415 <dd><p>Compress or decompress to standard
Chris@4 416 output.</p></dd>
Chris@4 417 <dt><span class="term"><code class="computeroutput">-d --decompress</code></span></dt>
Chris@4 418 <dd><p>Force decompression.
Chris@4 419 <code class="computeroutput">bzip2</code>,
Chris@4 420 <code class="computeroutput">bunzip2</code> and
Chris@4 421 <code class="computeroutput">bzcat</code> are really the same
Chris@4 422 program, and the decision about what actions to take is done on
Chris@4 423 the basis of which name is used. This flag overrides that
Chris@4 424 mechanism, and forces bzip2 to decompress.</p></dd>
Chris@4 425 <dt><span class="term"><code class="computeroutput">-z --compress</code></span></dt>
Chris@4 426 <dd><p>The complement to
Chris@4 427 <code class="computeroutput">-d</code>: forces compression,
Chris@4 428 regardless of the invocation name.</p></dd>
Chris@4 429 <dt><span class="term"><code class="computeroutput">-t --test</code></span></dt>
Chris@4 430 <dd><p>Check integrity of the specified file(s), but
Chris@4 431 don't decompress them. This really performs a trial
Chris@4 432 decompression and throws away the result.</p></dd>
Chris@4 433 <dt><span class="term"><code class="computeroutput">-f --force</code></span></dt>
Chris@4 434 <dd>
Chris@4 435 <p>Force overwrite of output files. Normally,
Chris@4 436 <code class="computeroutput">bzip2</code> will not overwrite
Chris@4 437 existing output files. Also forces
Chris@4 438 <code class="computeroutput">bzip2</code> to break hard links to
Chris@4 439 files, which it otherwise wouldn't do.</p>
Chris@4 440 <p><code class="computeroutput">bzip2</code> normally declines
Chris@4 441 to decompress files which don't have the correct magic header
Chris@4 442 bytes. If forced (<code class="computeroutput">-f</code>),
Chris@4 443 however, it will pass such files through unmodified. This is
Chris@4 444 how GNU <code class="computeroutput">gzip</code> behaves.</p>
Chris@4 445 </dd>
Chris@4 446 <dt><span class="term"><code class="computeroutput">-k --keep</code></span></dt>
Chris@4 447 <dd><p>Keep (don't delete) input files during
Chris@4 448 compression or decompression.</p></dd>
Chris@4 449 <dt><span class="term"><code class="computeroutput">-s --small</code></span></dt>
Chris@4 450 <dd>
Chris@4 451 <p>Reduce memory usage, for compression,
Chris@4 452 decompression and testing. Files are decompressed and tested
Chris@4 453 using a modified algorithm which only requires 2.5 bytes per
Chris@4 454 block byte. This means any file can be decompressed in 2300k
Chris@4 455 of memory, albeit at about half the normal speed.</p>
Chris@4 456 <p>During compression, <code class="computeroutput">-s</code>
Chris@4 457 selects a block size of 200k, which limits memory use to around
Chris@4 458 the same figure, at the expense of your compression ratio. In
Chris@4 459 short, if your machine is low on memory (8 megabytes or less),
Chris@4 460 use <code class="computeroutput">-s</code> for everything. See
Chris@4 461 <a class="xref" href="#memory-management" title="2.5. MEMORY MANAGEMENT">MEMORY MANAGEMENT</a> below.</p>
Chris@4 462 </dd>
Chris@4 463 <dt><span class="term"><code class="computeroutput">-q --quiet</code></span></dt>
Chris@4 464 <dd><p>Suppress non-essential warning messages.
Chris@4 465 Messages pertaining to I/O errors and other critical events
Chris@4 466 will not be suppressed.</p></dd>
Chris@4 467 <dt><span class="term"><code class="computeroutput">-v --verbose</code></span></dt>
Chris@4 468 <dd><p>Verbose mode -- show the compression ratio for
Chris@4 469 each file processed. Further
Chris@4 470 <code class="computeroutput">-v</code>'s increase the verbosity
Chris@4 471 level, spewing out lots of information which is primarily of
Chris@4 472 interest for diagnostic purposes.</p></dd>
Chris@4 473 <dt><span class="term"><code class="computeroutput">-L --license -V --version</code></span></dt>
Chris@4 474 <dd><p>Display the software version, license terms and
Chris@4 475 conditions.</p></dd>
Chris@4 476 <dt><span class="term"><code class="computeroutput">-1</code> (or
Chris@4 477 <code class="computeroutput">--fast</code>) to
Chris@4 478 <code class="computeroutput">-9</code> (or
Chris@4 479 <code class="computeroutput">--best</code>)</span></dt>
Chris@4 480 <dd><p>Set the block size to 100 k, 200 k ... 900 k
Chris@4 481 when compressing. Has no effect when decompressing. See <a class="xref" href="#memory-management" title="2.5. MEMORY MANAGEMENT">MEMORY MANAGEMENT</a> below. The
Chris@4 482 <code class="computeroutput">--fast</code> and
Chris@4 483 <code class="computeroutput">--best</code> aliases are primarily
Chris@4 484 for GNU <code class="computeroutput">gzip</code> compatibility.
Chris@4 485 In particular, <code class="computeroutput">--fast</code> doesn't
Chris@4 486 make things significantly faster. And
Chris@4 487 <code class="computeroutput">--best</code> merely selects the
Chris@4 488 default behaviour.</p></dd>
Chris@4 489 <dt><span class="term"><code class="computeroutput">--</code></span></dt>
Chris@4 490 <dd><p>Treats all subsequent arguments as file names,
Chris@4 491 even if they start with a dash. This is so you can handle
Chris@4 492 files with names beginning with a dash, for example:
Chris@4 493 <code class="computeroutput">bzip2 --
Chris@4 494 -myfilename</code>.</p></dd>
Chris@4 495 <dt>
Chris@4 496 <span class="term"><code class="computeroutput">--repetitive-fast</code>, </span><span class="term"><code class="computeroutput">--repetitive-best</code></span>
Chris@4 497 </dt>
Chris@4 498 <dd><p>These flags are redundant in versions 0.9.5 and
Chris@4 499 above. They provided some coarse control over the behaviour of
Chris@4 500 the sorting algorithm in earlier versions, which was sometimes
Chris@4 501 useful. 0.9.5 and above have an improved algorithm which
Chris@4 502 renders these flags irrelevant.</p></dd>
Chris@4 503 </dl></div>
Chris@4 504 </div>
Chris@4 505 <div class="sect1" title="2.5. MEMORY MANAGEMENT">
Chris@4 506 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 507 <a name="memory-management"></a>2.5. MEMORY MANAGEMENT</h2></div></div></div>
Chris@4 508 <p><code class="computeroutput">bzip2</code> compresses large
Chris@4 509 files in blocks. The block size affects both the compression
Chris@4 510 ratio achieved, and the amount of memory needed for compression
Chris@4 511 and decompression. The flags <code class="computeroutput">-1</code>
Chris@4 512 through <code class="computeroutput">-9</code> specify the block
Chris@4 513 size to be 100,000 bytes through 900,000 bytes (the default)
Chris@4 514 respectively. At decompression time, the block size used for
Chris@4 515 compression is read from the header of the compressed file, and
Chris@4 516 <code class="computeroutput">bunzip2</code> then allocates itself
Chris@4 517 just enough memory to decompress the file. Since block sizes are
Chris@4 518 stored in compressed files, it follows that the flags
Chris@4 519 <code class="computeroutput">-1</code> to
Chris@4 520 <code class="computeroutput">-9</code> are irrelevant to and so
Chris@4 521 ignored during decompression.</p>
Chris@4 522 <p>Compression and decompression requirements, in bytes, can be
Chris@4 523 estimated as:</p>
Chris@4 524 <pre class="programlisting">Compression:   400k + ( 8 x block size )
Chris@4 525
Chris@4 526 Decompression: 100k + ( 4 x block size ), or
Chris@4 527                100k + ( 2.5 x block size )</pre>
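<p>The estimates above can be written as a few small C functions.
This is only a sketch and not part of
<code class="computeroutput">bzip2</code>: the function names are
hypothetical, and it assumes that "k" means 1000 bytes, so that flag
<code class="computeroutput">-N</code> selects a block of
N x 100,000 bytes.</p>
<pre class="programlisting">/* Assumption: "k" means 1000 bytes. */
#define KBYTE 1000UL

/* Block size in bytes for a level (flag -1 .. -9). */
unsigned long est_block_size(int level)
{
    return (unsigned long)level * 100UL * KBYTE;
}

/* Compression estimate: 400k + ( 8 x block size ). */
unsigned long est_compress_mem(int level)
{
    return 400UL * KBYTE + 8UL * est_block_size(level);
}

/* Decompression estimate: 100k + ( 4 x block size ), or
   100k + ( 2.5 x block size ) when small_mode (the -s flag) is nonzero. */
unsigned long est_decompress_mem(int level, int small_mode)
{
    unsigned long block = est_block_size(level);
    return 100UL * KBYTE + (small_mode ? block * 5UL / 2UL : 4UL * block);
}</pre>
<p>Under these assumptions, level 9 gives 7600k for compression, 3700k
for decompression and 2350k for decompression with
<code class="computeroutput">-s</code>. For level 7 the compression
formula gives 6000k, slightly less than the 6100k listed in the table
later in this section.</p>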
Chris@4 528 <p>Larger block sizes give rapidly diminishing marginal
Chris@4 529 returns. Most of the compression comes from the first two or
Chris@4 530 three hundred k of block size, a fact worth bearing in mind when
Chris@4 531 using <code class="computeroutput">bzip2</code> on small machines.
Chris@4 532 It is also important to appreciate that the decompression memory
Chris@4 533 requirement is set at compression time by the choice of block
Chris@4 534 size.</p>
Chris@4 535 <p>For files compressed with the default 900k block size,
Chris@4 536 <code class="computeroutput">bunzip2</code> will require about 3700
Chris@4 537 kbytes to decompress. To support decompression of any file on a
Chris@4 538 4 megabyte machine, <code class="computeroutput">bunzip2</code> has
Chris@4 539 an option to decompress using approximately half this amount of
Chris@4 540 memory, about 2300 kbytes. Decompression speed is also halved,
Chris@4 541 so you should use this option only where necessary. The relevant
Chris@4 542 flag is <code class="computeroutput">-s</code>.</p>
Chris@4 543 <p>In general, try to use the largest block size memory
Chris@4 544 constraints allow, since that maximises the compression achieved.
Chris@4 545 Compression and decompression speed are virtually unaffected by
Chris@4 546 block size.</p>
Chris@4 547 <p>Another significant point applies to files which fit in a
Chris@4 548 single block -- that means most files you'd encounter using a
Chris@4 549 large block size. The amount of real memory touched is
Chris@4 550 proportional to the size of the file, since the file is smaller
Chris@4 551 than a block. For example, compressing a file 20,000 bytes long
Chris@4 552 with the flag <code class="computeroutput">-9</code> will cause the
Chris@4 553 compressor to allocate around 7600k of memory, but only touch
Chris@4 554 400k + 20000 * 8 = 560 kbytes of it. Similarly, the decompressor
Chris@4 555 will allocate 3700k but only touch 100k + 20000 * 4 = 180
Chris@4 556 kbytes.</p>
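<p>The arithmetic in this example can be sketched in C. The helper
names are hypothetical, and "k" is again assumed to mean 1000 bytes.
The sketch simply caps the file length at the block size before
applying the estimates given earlier.</p>
<pre class="programlisting">/* Assumption: "k" means 1000 bytes. */
#define KBYTE 1000UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
    return a &lt; b ? a : b;
}

/* Approximate memory touched when compressing file_len bytes
   with flag -1 .. -9 (level 1 .. 9). */
unsigned long touched_compress(unsigned long file_len, int level)
{
    unsigned long block = (unsigned long)level * 100UL * KBYTE;
    return 400UL * KBYTE + 8UL * min_ul(file_len, block);
}

/* Approximate memory touched when decompressing the same file
   (normal mode, without -s). */
unsigned long touched_decompress(unsigned long file_len, int level)
{
    unsigned long block = (unsigned long)level * 100UL * KBYTE;
    return 100UL * KBYTE + 4UL * min_ul(file_len, block);
}</pre>
<p>For a 20,000-byte file and level 9 these return 560,000 and
180,000 bytes. For any file at least one block long they return the
full allocation figures.</p>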
Chris@4 557 <p>Here is a table which summarises the maximum memory usage
Chris@4 558 for different block sizes. Also recorded is the total compressed
Chris@4 559 size for 14 files of the Calgary Text Compression Corpus
Chris@4 560 totalling 3,141,622 bytes. This column gives some feel for how
Chris@4 561 compression varies with block size. These figures tend to
Chris@4 562 understate the advantage of larger block sizes for larger files,
Chris@4 563 since the Corpus is dominated by smaller files.</p>
Chris@4 564 <pre class="programlisting">        Compress   Decompress   Decompress   Corpus
Chris@4 565 Flag     usage      usage       -s usage     Size
Chris@4 566
Chris@4 567  -1      1200k       500k         350k      914704
Chris@4 568  -2      2000k       900k         600k      877703
Chris@4 569  -3      2800k      1300k         850k      860338
Chris@4 570  -4      3600k      1700k        1100k      846899
Chris@4 571  -5      4400k      2100k        1350k      845160
Chris@4 572  -6      5200k      2500k        1600k      838626
Chris@4 573  -7      6100k      2900k        1850k      834096
Chris@4 574  -8      6800k      3300k        2100k      828642
Chris@4 575  -9      7600k      3700k        2350k      828642</pre>
Chris@4 576 </div>
Chris@4 577 <div class="sect1" title="2.6. RECOVERING DATA FROM DAMAGED FILES">
Chris@4 578 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 579 <a name="recovering"></a>2.6. RECOVERING DATA FROM DAMAGED FILES</h2></div></div></div>
Chris@4 580 <p><code class="computeroutput">bzip2</code> compresses files in
Chris@4 581 blocks, usually 900kbytes long. Each block is handled
Chris@4 582 independently. If a media or transmission error causes a
Chris@4 583 multi-block <code class="computeroutput">.bz2</code> file to become
Chris@4 584 damaged, it may be possible to recover data from the undamaged
Chris@4 585 blocks in the file.</p>
Chris@4 586 <p>The compressed representation of each block is delimited by
Chris@4 587 a 48-bit pattern, which makes it possible to find the block
Chris@4 588 boundaries with reasonable certainty. Each block also carries
Chris@4 589 its own 32-bit CRC, so damaged blocks can be distinguished from
Chris@4 590 undamaged ones.</p>
Chris@4 591 <p><code class="computeroutput">bzip2recover</code> is a simple
Chris@4 592 program whose purpose is to search for blocks in
Chris@4 593 <code class="computeroutput">.bz2</code> files, and write each block
Chris@4 594 out into its own <code class="computeroutput">.bz2</code> file. You
Chris@4 595 can then use <code class="computeroutput">bzip2 -t</code> to test
Chris@4 596 the integrity of the resulting files, and decompress those which
Chris@4 597 are undamaged.</p>
Chris@4 598 <p><code class="computeroutput">bzip2recover</code> takes a
Chris@4 599 single argument, the name of the damaged file, and writes a
Chris@4 600 number of files <code class="computeroutput">rec0001file.bz2</code>,
Chris@4 601 <code class="computeroutput">rec0002file.bz2</code>, etc, containing
Chris@4 602 the extracted blocks. The output filenames are designed so that
Chris@4 603 the use of wildcards in subsequent processing -- for example,
Chris@4 604 <code class="computeroutput">bzip2 -dc rec*file.bz2 &gt;
Chris@4 605 recovered_data</code> -- lists the files in the correct
Chris@4 606 order.</p>
Chris@4 607 <p><code class="computeroutput">bzip2recover</code> should be of
Chris@4 608 most use dealing with large <code class="computeroutput">.bz2</code>
Chris@4 609 files, as these will contain many blocks. It is clearly futile
Chris@4 610 to use it on damaged single-block files, since a damaged block
Chris@4 611 cannot be recovered. If you wish to minimise any potential data
Chris@4 612 loss through media or transmission errors, you might consider
Chris@4 613 compressing with a smaller block size.</p>
Chris@4 614 </div>
Chris@4 615 <div class="sect1" title="2.7. PERFORMANCE NOTES">
Chris@4 616 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 617 <a name="performance"></a>2.7. PERFORMANCE NOTES</h2></div></div></div>
Chris@4 618 <p>The sorting phase of compression gathers together similar
Chris@4 619 strings in the file. Because of this, files containing very long
Chris@4 620 runs of repeated symbols, like "aabaabaabaab ..." (repeated
Chris@4 621 several hundred times) may compress more slowly than normal.
Chris@4 622 Versions 0.9.5 and above fare much better than previous versions
Chris@4 623 in this respect. The ratio between worst-case and average-case
Chris@4 624 compression time is in the region of 10:1. For previous
Chris@4 625 versions, this figure was more like 100:1. You can use the
Chris@4 626 <code class="computeroutput">-vvvv</code> option to monitor progress
Chris@4 627 in great detail, if you want.</p>
Chris@4 628 <p>Decompression speed is unaffected by these
Chris@4 629 phenomena.</p>
Chris@4 630 <p><code class="computeroutput">bzip2</code> usually allocates
Chris@4 631 several megabytes of memory to operate in, and then charges all
Chris@4 632 over it in a fairly random fashion. This means that performance,
Chris@4 633 both for compressing and decompressing, is largely determined by
Chris@4 634 the speed at which your machine can service cache misses.
Chris@4 635 Because of this, small changes to the code to reduce the miss
Chris@4 636 rate have been observed to give disproportionately large
Chris@4 637 performance improvements. I imagine
Chris@4 638 <code class="computeroutput">bzip2</code> will perform best on
Chris@4 639 machines with very large caches.</p>
Chris@4 640 </div>
Chris@4 641 <div class="sect1" title="2.8. CAVEATS">
Chris@4 642 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 643 <a name="caveats"></a>2.8. CAVEATS</h2></div></div></div>
Chris@4 644 <p>I/O error messages are not as helpful as they could be.
Chris@4 645 <code class="computeroutput">bzip2</code> tries hard to detect I/O
Chris@4 646 errors and exit cleanly, but the details of what the problem is
Chris@4 647 sometimes seem rather misleading.</p>
Chris@4 648 <p>This manual page pertains to version 1.0.6 of
Chris@4 649 <code class="computeroutput">bzip2</code>. Compressed data created by
Chris@4 650 this version is entirely forwards and backwards compatible with the
Chris@4 651 previous public releases, versions 0.1pl2, 0.9.0, 0.9.5, 1.0.0,
Chris@4 652 1.0.1, 1.0.2 and 1.0.3, with the following exception: 0.9.0 and
Chris@4 653 above can correctly decompress multiple concatenated compressed files.
Chris@4 654 0.1pl2 cannot do this; it will stop after decompressing just the first
Chris@4 655 file in the stream.</p>
Chris@4 656 <p><code class="computeroutput">bzip2recover</code> versions
Chris@4 657 prior to 1.0.2 used 32-bit integers to represent bit positions in
Chris@4 658 compressed files, so they could not handle compressed files more
Chris@4 659 than 512 megabytes long (2<sup>32</sup> bits is 512 megabytes). Versions 1.0.2 and above use 64-bit ints
Chris@4 660 on some platforms which support them (GNU supported targets, and
Chris@4 661 Windows). To establish whether or not
Chris@4 662 <code class="computeroutput">bzip2recover</code> was built with such
Chris@4 663 a limitation, run it without arguments. In any event you can
Chris@4 664 build yourself an unlimited version if you can recompile it with
Chris@4 665 <code class="computeroutput">MaybeUInt64</code> set to be an
Chris@4 666 unsigned 64-bit integer.</p>
Chris@4 667 </div>
Chris@4 668 <div class="sect1" title="2.9. AUTHOR">
Chris@4 669 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 670 <a name="author"></a>2.9. AUTHOR</h2></div></div></div>
Chris@4 671 <p>Julian Seward,
Chris@4 672 <code class="computeroutput">jseward@bzip.org</code></p>
Chris@4 673 <p>The ideas embodied in
Chris@4 674 <code class="computeroutput">bzip2</code> are due to (at least) the
Chris@4 675 following people: Michael Burrows and David Wheeler (for the
Chris@4 676 block sorting transformation), David Wheeler (again, for the
Chris@4 677 Huffman coder), Peter Fenwick (for the structured coding model in
Chris@4 678 the original <code class="computeroutput">bzip</code>, and many
Chris@4 679 refinements), and Alistair Moffat, Radford Neal and Ian Witten
Chris@4 680 (for the arithmetic coder in the original
Chris@4 681 <code class="computeroutput">bzip</code>). I am much indebted for
Chris@4 682 their help, support and advice. See the manual in the source
Chris@4 683 distribution for pointers to sources of documentation. Christian
Chris@4 684 von Roques encouraged me to look for faster sorting algorithms,
Chris@4 685 so as to speed up compression. Bela Lubkin encouraged me to
Chris@4 686 improve the worst-case compression performance.
Chris@4 687 Donna Robinson XMLised the documentation.
Chris@4 688 Many people sent
Chris@4 689 patches, helped with portability problems, lent machines, gave
Chris@4 690 advice and were generally helpful.</p>
Chris@4 691 </div>
Chris@4 692 </div>
Chris@4 693 <div class="chapter" title="3.  Programming with libbzip2">
Chris@4 694 <div class="titlepage"><div><div><h2 class="title">
Chris@4 695 <a name="libprog"></a>3. 
Chris@4 696 Programming with <code class="computeroutput">libbzip2</code>
Chris@4 697 </h2></div></div></div>
Chris@4 698 <div class="toc">
Chris@4 699 <p><b>Table of Contents</b></p>
Chris@4 700 <dl>
Chris@4 701 <dt><span class="sect1"><a href="#top-level">3.1. Top-level structure</a></span></dt>
Chris@4 702 <dd><dl>
Chris@4 703 <dt><span class="sect2"><a href="#ll-summary">3.1.1. Low-level summary</a></span></dt>
Chris@4 704 <dt><span class="sect2"><a href="#hl-summary">3.1.2. High-level summary</a></span></dt>
Chris@4 705 <dt><span class="sect2"><a href="#util-fns-summary">3.1.3. Utility functions summary</a></span></dt>
Chris@4 706 </dl></dd>
Chris@4 707 <dt><span class="sect1"><a href="#err-handling">3.2. Error handling</a></span></dt>
Chris@4 708 <dt><span class="sect1"><a href="#low-level">3.3. Low-level interface</a></span></dt>
Chris@4 709 <dd><dl>
Chris@4 710 <dt><span class="sect2"><a href="#bzcompress-init">3.3.1. BZ2_bzCompressInit</a></span></dt>
Chris@4 711 <dt><span class="sect2"><a href="#bzCompress">3.3.2. BZ2_bzCompress</a></span></dt>
Chris@4 712 <dt><span class="sect2"><a href="#bzCompress-end">3.3.3. BZ2_bzCompressEnd</a></span></dt>
Chris@4 713 <dt><span class="sect2"><a href="#bzDecompress-init">3.3.4. BZ2_bzDecompressInit</a></span></dt>
Chris@4 714 <dt><span class="sect2"><a href="#bzDecompress">3.3.5. BZ2_bzDecompress</a></span></dt>
Chris@4 715 <dt><span class="sect2"><a href="#bzDecompress-end">3.3.6. BZ2_bzDecompressEnd</a></span></dt>
Chris@4 716 </dl></dd>
Chris@4 717 <dt><span class="sect1"><a href="#hl-interface">3.4. High-level interface</a></span></dt>
Chris@4 718 <dd><dl>
Chris@4 719 <dt><span class="sect2"><a href="#bzreadopen">3.4.1. BZ2_bzReadOpen</a></span></dt>
Chris@4 720 <dt><span class="sect2"><a href="#bzread">3.4.2. BZ2_bzRead</a></span></dt>
Chris@4 721 <dt><span class="sect2"><a href="#bzreadgetunused">3.4.3. BZ2_bzReadGetUnused</a></span></dt>
Chris@4 722 <dt><span class="sect2"><a href="#bzreadclose">3.4.4. BZ2_bzReadClose</a></span></dt>
Chris@4 723 <dt><span class="sect2"><a href="#bzwriteopen">3.4.5. BZ2_bzWriteOpen</a></span></dt>
Chris@4 724 <dt><span class="sect2"><a href="#bzwrite">3.4.6. BZ2_bzWrite</a></span></dt>
Chris@4 725 <dt><span class="sect2"><a href="#bzwriteclose">3.4.7. BZ2_bzWriteClose</a></span></dt>
Chris@4 726 <dt><span class="sect2"><a href="#embed">3.4.8. Handling embedded compressed data streams</a></span></dt>
Chris@4 727 <dt><span class="sect2"><a href="#std-rdwr">3.4.9. Standard file-reading/writing code</a></span></dt>
Chris@4 728 </dl></dd>
Chris@4 729 <dt><span class="sect1"><a href="#util-fns">3.5. Utility functions</a></span></dt>
Chris@4 730 <dd><dl>
Chris@4 731 <dt><span class="sect2"><a href="#bzbufftobuffcompress">3.5.1. BZ2_bzBuffToBuffCompress</a></span></dt>
Chris@4 732 <dt><span class="sect2"><a href="#bzbufftobuffdecompress">3.5.2. BZ2_bzBuffToBuffDecompress</a></span></dt>
Chris@4 733 </dl></dd>
Chris@4 734 <dt><span class="sect1"><a href="#zlib-compat">3.6. zlib compatibility functions</a></span></dt>
Chris@4 735 <dt><span class="sect1"><a href="#stdio-free">3.7. Using the library in a stdio-free environment</a></span></dt>
Chris@4 736 <dd><dl>
Chris@4 737 <dt><span class="sect2"><a href="#stdio-bye">3.7.1. Getting rid of stdio</a></span></dt>
Chris@4 738 <dt><span class="sect2"><a href="#critical-error">3.7.2. Critical error handling</a></span></dt>
Chris@4 739 </dl></dd>
Chris@4 740 <dt><span class="sect1"><a href="#win-dll">3.8. Making a Windows DLL</a></span></dt>
Chris@4 741 </dl>
Chris@4 742 </div>
Chris@4 743 <p>This chapter describes the programming interface to
Chris@4 744 <code class="computeroutput">libbzip2</code>.</p>
Chris@4 745 <p>For general background information, particularly about
Chris@4 746 memory use and performance aspects, you'd be well advised to read
Chris@4 747 <a class="xref" href="#using" title="2. How to use bzip2">How to use bzip2</a> as well.</p>
Chris@4 748 <div class="sect1" title="3.1. Top-level structure">
Chris@4 749 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 750 <a name="top-level"></a>3.1. Top-level structure</h2></div></div></div>
Chris@4 751 <p><code class="computeroutput">libbzip2</code> is a flexible
Chris@4 752 library for compressing and decompressing data in the
Chris@4 753 <code class="computeroutput">bzip2</code> data format. Although
Chris@4 754 packaged as a single entity, it helps to regard the library as
Chris@4 755 three separate parts: the low-level interface, the high-level
Chris@4 756 interface, and some utility functions.</p>
Chris@4 757 <p>The structure of
Chris@4 758 <code class="computeroutput">libbzip2</code>'s interfaces is similar
Chris@4 759 to that of Jean-loup Gailly's and Mark Adler's excellent
Chris@4 760 <code class="computeroutput">zlib</code> library.</p>
Chris@4 761 <p>All externally visible symbols have names beginning
Chris@4 762 <code class="computeroutput">BZ2_</code>. This is new in version
Chris@4 763 1.0. The intention is to minimise pollution of the namespaces of
Chris@4 764 library clients.</p>
Chris@4 765 <p>To use any part of the library, you need to
Chris@4 766 <code class="computeroutput">#include &lt;bzlib.h&gt;</code>
Chris@4 767 into your sources.</p>
Chris@4 768 <div class="sect2" title="3.1.1. Low-level summary">
Chris@4 769 <div class="titlepage"><div><div><h3 class="title">
Chris@4 770 <a name="ll-summary"></a>3.1.1. Low-level summary</h3></div></div></div>
Chris@4 771 <p>This interface provides services for compressing and
Chris@4 772 decompressing data in memory. There's no provision for dealing
Chris@4 773 with files, streams or any other I/O mechanisms, just straight
Chris@4 774 memory-to-memory work. In fact, this part of the library can be
Chris@4 775 compiled without inclusion of
Chris@4 776 <code class="computeroutput">stdio.h</code>, which may be helpful
Chris@4 777 for embedded applications.</p>
Chris@4 778 <p>The low-level part of the library has no global variables
Chris@4 779 and is therefore thread-safe.</p>
Chris@4 780 <p>Six routines make up the low-level interface:
Chris@4 781 <code class="computeroutput">BZ2_bzCompressInit</code>,
Chris@4 782 <code class="computeroutput">BZ2_bzCompress</code>, and
Chris@4 783 <code class="computeroutput">BZ2_bzCompressEnd</code> for
Chris@4 784 compression, and a corresponding trio
Chris@4 785 <code class="computeroutput">BZ2_bzDecompressInit</code>,
Chris@4 786 <code class="computeroutput">BZ2_bzDecompress</code> and
Chris@4 787 <code class="computeroutput">BZ2_bzDecompressEnd</code> for
Chris@4 788 decompression. The <code class="computeroutput">*Init</code>
Chris@4 789 functions allocate memory for compression/decompression and do
Chris@4 790 other initialisations, whilst the
Chris@4 791 <code class="computeroutput">*End</code> functions close down
Chris@4 792 operations and release memory.</p>
Chris@4 793 <p>The real work is done by
Chris@4 794 <code class="computeroutput">BZ2_bzCompress</code> and
Chris@4 795 <code class="computeroutput">BZ2_bzDecompress</code>. These
Chris@4 796 compress and decompress data from a user-supplied input buffer to
Chris@4 797 a user-supplied output buffer. These buffers can be any size;
Chris@4 798 arbitrary quantities of data are handled by making repeated calls
Chris@4 799 to these functions. This is a flexible mechanism allowing a
Chris@4 800 consumer-pull style of activity, or producer-push, or a mixture
Chris@4 801 of both.</p>
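<p>As a rough illustration of the repeated-call pattern, the sketch
below compresses one in-memory buffer and drains the output through a
small fixed-size chunk. The function name
<code class="computeroutput">compress_to_file</code> and the 4096-byte
chunk size are assumptions made for this example, and
<code class="computeroutput">stdio</code> is used only as a
convenient sink for the output. The low-level interface itself does
not need it.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;
#include &lt;stdio.h&gt;

/* Compress in[0 .. in_len-1] and write the resulting bzip2 stream to out.
   Returns BZ_OK on success, otherwise a libbzip2 error code. */
int compress_to_file(const char *in, unsigned int in_len, FILE *out)
{
    bz_stream strm;
    char chunk[4096];            /* the output buffer may be any size */
    int ret;

    /* NULL allocator hooks ask the library to use malloc and free. */
    strm.bzalloc = NULL;
    strm.bzfree = NULL;
    strm.opaque = NULL;

    /* 900k blocks, no diagnostics, default work factor. */
    ret = BZ2_bzCompressInit(&amp;strm, 9, 0, 0);
    if (ret != BZ_OK) return ret;

    strm.next_in = (char *)in;
    strm.avail_in = in_len;

    /* All input is supplied up front, so request BZ_FINISH and keep
       calling until the library reports BZ_STREAM_END. */
    do {
        size_t n;
        strm.next_out = chunk;
        strm.avail_out = sizeof chunk;
        ret = BZ2_bzCompress(&amp;strm, BZ_FINISH);
        if (ret != BZ_FINISH_OK &amp;&amp; ret != BZ_STREAM_END) break;
        n = sizeof chunk - strm.avail_out;
        if (fwrite(chunk, 1, n, out) != n) { ret = BZ_IO_ERROR; break; }
    } while (ret != BZ_STREAM_END);

    BZ2_bzCompressEnd(&amp;strm);
    return ret == BZ_STREAM_END ? BZ_OK : ret;
}</pre>
<p>A caller that receives its input piecemeal would instead pass
<code class="computeroutput">BZ_RUN</code> while data is still
arriving and switch to <code class="computeroutput">BZ_FINISH</code>
only for the final calls.</p>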
Chris@4 802 </div>
Chris@4 803 <div class="sect2" title="3.1.2. High-level summary">
Chris@4 804 <div class="titlepage"><div><div><h3 class="title">
Chris@4 805 <a name="hl-summary"></a>3.1.2. High-level summary</h3></div></div></div>
Chris@4 806 <p>This interface provides some handy wrappers around the
Chris@4 807 low-level interface to facilitate reading and writing
Chris@4 808 <code class="computeroutput">bzip2</code> format files
Chris@4 809 (<code class="computeroutput">.bz2</code> files). The routines
Chris@4 810 provide hooks to facilitate reading files in which the
Chris@4 811 <code class="computeroutput">bzip2</code> data stream is embedded
Chris@4 812 within some larger-scale file structure, or where there are
Chris@4 813 multiple <code class="computeroutput">bzip2</code> data streams
Chris@4 814 concatenated end-to-end.</p>
Chris@4 815 <p>For reading files,
Chris@4 816 <code class="computeroutput">BZ2_bzReadOpen</code>,
Chris@4 817 <code class="computeroutput">BZ2_bzRead</code>,
Chris@4 818 <code class="computeroutput">BZ2_bzReadClose</code> and
Chris@4 819 <code class="computeroutput">BZ2_bzReadGetUnused</code> are
Chris@4 820 supplied. For writing files,
Chris@4 821 <code class="computeroutput">BZ2_bzWriteOpen</code>,
Chris@4 822 <code class="computeroutput">BZ2_bzWrite</code> and
Chris@4 823 <code class="computeroutput">BZ2_bzWriteClose</code> are
Chris@4 824 available.</p>
Chris@4 825 <p>As with the low-level library, no global variables are used
Chris@4 826 so the library is per se thread-safe. However, if I/O errors
Chris@4 827 occur whilst reading or writing the underlying compressed files,
Chris@4 828 you may have to consult <code class="computeroutput">errno</code> to
Chris@4 829 determine the cause of the error. In that case, you'd need a C
Chris@4 830 library which correctly supports
Chris@4 831 <code class="computeroutput">errno</code> in a multithreaded
Chris@4 832 environment.</p>
Chris@4 833 <p>To make the library a little simpler and more portable,
Chris@4 834 <code class="computeroutput">BZ2_bzReadOpen</code> and
Chris@4 835 <code class="computeroutput">BZ2_bzWriteOpen</code> require you to
Chris@4 836 pass them file handles (<code class="computeroutput">FILE*</code>s)
Chris@4 837 which have previously been opened for reading or writing
Chris@4 838 respectively. That avoids portability problems associated with
Chris@4 839 file operations and file attributes, whilst not being much of an
Chris@4 840 imposition on the programmer.</p>
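<p>A minimal sketch of how these routines fit together, assuming the
caller has already opened the <code class="computeroutput">FILE*</code>
handles. The helper names
<code class="computeroutput">write_bz2</code> and
<code class="computeroutput">read_bz2</code> are hypothetical. The
reader handles a single stream only and makes no attempt to deal with
concatenated or embedded streams.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;
#include &lt;stdio.h&gt;

/* Write len bytes from data to f as one bzip2 stream.
   f must already be open for writing. Returns BZ_OK on success. */
int write_bz2(FILE *f, const char *data, int len)
{
    int bzerror, ignored;
    BZFILE *b = BZ2_bzWriteOpen(&amp;bzerror, f, 9, 0, 0);
    if (bzerror != BZ_OK) {
        BZ2_bzWriteClose(&amp;ignored, b, 1, NULL, NULL);
        return bzerror;
    }
    BZ2_bzWrite(&amp;bzerror, b, (void *)data, len);
    if (bzerror != BZ_OK) {
        /* abandon = 1: release resources without flushing more data */
        BZ2_bzWriteClose(&amp;ignored, b, 1, NULL, NULL);
        return bzerror;
    }
    BZ2_bzWriteClose(&amp;bzerror, b, 0, NULL, NULL);
    return bzerror;
}

/* Read one bzip2 stream from f into out, which holds cap bytes.
   f must already be open for reading. Returns the number of bytes
   decompressed, or -1 on error or if the stream does not end
   within cap bytes. */
int read_bz2(FILE *f, char *out, int cap)
{
    int bzerror, ignored, total = 0;
    BZFILE *b = BZ2_bzReadOpen(&amp;bzerror, f, 0, 0, NULL, 0);
    if (bzerror != BZ_OK) {
        BZ2_bzReadClose(&amp;ignored, b);
        return -1;
    }
    while (bzerror == BZ_OK &amp;&amp; total &lt; cap) {
        int n = BZ2_bzRead(&amp;bzerror, b, out + total, cap - total);
        /* Bytes returned together with BZ_STREAM_END are valid too. */
        if (bzerror == BZ_OK || bzerror == BZ_STREAM_END) total += n;
    }
    BZ2_bzReadClose(&amp;ignored, b);
    return bzerror == BZ_STREAM_END ? total : -1;
}</pre>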
Chris@4 841 </div>
Chris@4 842 <div class="sect2" title="3.1.3. Utility functions summary">
Chris@4 843 <div class="titlepage"><div><div><h3 class="title">
Chris@4 844 <a name="util-fns-summary"></a>3.1.3. Utility functions summary</h3></div></div></div>
Chris@4 845 <p>For very simple needs,
Chris@4 846 <code class="computeroutput">BZ2_bzBuffToBuffCompress</code> and
Chris@4 847 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> are
Chris@4 848 provided. These compress and decompress data in memory from one
Chris@4 849 buffer to another in a single function call. You should assess
Chris@4 850 whether these functions fulfill your memory-to-memory
Chris@4 851 compression/decompression requirements before investing effort in
Chris@4 852 understanding the more general but more complex low-level
Chris@4 853 interface.</p>
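<p>For instance, a round trip through the two utility functions might
look like the sketch below. The function name
<code class="computeroutput">roundtrip_ok</code> is hypothetical. The
output buffer is sized at the input length plus 1% plus 600 bytes,
which is the allowance recommended for
<code class="computeroutput">BZ2_bzBuffToBuffCompress</code>.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Returns 1 if text survives a compress/decompress round trip,
   0 otherwise (including allocation or library failure). */
int roundtrip_ok(const char *text)
{
    unsigned int len = (unsigned int)strlen(text);
    /* Input size + 1% (rounded up) + 600 bytes for the compressed form. */
    unsigned int comp_len = len + len / 100 + 1 + 600;
    unsigned int back_len = len;
    char *comp = malloc(comp_len);
    char *back = malloc(len + 1);
    int ok = 0;

    if (comp != NULL &amp;&amp; back != NULL
        &amp;&amp; BZ2_bzBuffToBuffCompress(comp, &amp;comp_len, (char *)text, len,
                                    9, 0, 0) == BZ_OK
        &amp;&amp; BZ2_bzBuffToBuffDecompress(back, &amp;back_len, comp, comp_len,
                                      0, 0) == BZ_OK)
        ok = (back_len == len &amp;&amp; memcmp(back, text, len) == 0);

    free(comp);
    free(back);
    return ok;
}</pre>
<p>Both calls update their length argument in place: on entry it gives
the capacity of the destination buffer, and on success it holds the
number of bytes actually written.</p>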
Chris@4 854 <p>Yoshioka Tsuneo
Chris@4 855 (<code class="computeroutput">tsuneo@rr.iij4u.or.jp</code>) has
Chris@4 856 contributed some functions to give better
Chris@4 857 <code class="computeroutput">zlib</code> compatibility. These
Chris@4 858 functions are <code class="computeroutput">BZ2_bzopen</code>,
Chris@4 859 <code class="computeroutput">BZ2_bzread</code>,
Chris@4 860 <code class="computeroutput">BZ2_bzwrite</code>,
Chris@4 861 <code class="computeroutput">BZ2_bzflush</code>,
Chris@4 862 <code class="computeroutput">BZ2_bzclose</code>,
Chris@4 863 <code class="computeroutput">BZ2_bzerror</code> and
Chris@4 864 <code class="computeroutput">BZ2_bzlibVersion</code>. You may find
Chris@4 865 these functions more convenient for simple file reading and
Chris@4 866 writing, than those in the high-level interface. These functions
Chris@4 867 are not (yet) officially part of the library, and are minimally
Chris@4 868 documented here. If they break, you get to keep all the pieces.
Chris@4 869 I hope to document them properly when time permits.</p>
Chris@4 870 <p>Yoshioka also contributed modifications to allow the
Chris@4 871 library to be built as a Windows DLL.</p>
Chris@4 872 </div>
Chris@4 873 </div>
Chris@4 874 <div class="sect1" title="3.2. Error handling">
Chris@4 875 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 876 <a name="err-handling"></a>3.2. Error handling</h2></div></div></div>
Chris@4 877 <p>The library is designed to recover cleanly in all
Chris@4 878 situations, including the worst-case situation of decompressing
Chris@4 879 random data. I'm not 100% sure that it can always do this, so
Chris@4 880 you might want to add a signal handler to catch segmentation
Chris@4 881 violations during decompression if you are feeling especially
Chris@4 882 paranoid. I would be interested in hearing more about the
Chris@4 883 robustness of the library to corrupted compressed data.</p>
Chris@4 884 <p>Version 1.0.3 is more robust in this respect than any
Chris@4 885 previous version. Investigations with Valgrind (a tool for detecting
Chris@4 886 problems with memory management) indicate
Chris@4 887 that, at least for the few files I tested, all single-bit errors
Chris@4 888 in the compressed data are caught properly, with no
Chris@4 889 segmentation faults, no uses of uninitialised data, no out of
Chris@4 890 range reads or writes, and no infinite looping in the decompressor.
Chris@4 891 So it's certainly pretty robust, although
Chris@4 892 I wouldn't claim it to be totally bombproof.</p>
Chris@4 893 <p>The file <code class="computeroutput">bzlib.h</code> contains
Chris@4 894 all definitions needed to use the library. In particular, you
Chris@4 895 should definitely not include
Chris@4 896 <code class="computeroutput">bzlib_private.h</code>.</p>
Chris@4 897 <p>In <code class="computeroutput">bzlib.h</code>, the various
Chris@4 898 return values are defined. The following list is not intended as
Chris@4 899 an exhaustive description of the circumstances in which a given
Chris@4 900 value may be returned -- those descriptions are given later.
Chris@4 901 Rather, it is intended to convey the rough meaning of each return
Chris@4 902 value. The first five values are normal and are not intended to
Chris@4 903 denote an error situation.</p>
Chris@4 904 <div class="variablelist"><dl>
Chris@4 905 <dt><span class="term"><code class="computeroutput">BZ_OK</code></span></dt>
Chris@4 906 <dd><p>The requested action was completed
Chris@4 907 successfully.</p></dd>
Chris@4 908 <dt><span class="term"><code class="computeroutput">BZ_RUN_OK, BZ_FLUSH_OK,
Chris@4 909 BZ_FINISH_OK</code></span></dt>
Chris@4 910 <dd><p>In
Chris@4 911 <code class="computeroutput">BZ2_bzCompress</code>, the requested
Chris@4 912 flush/finish/nothing-special action was completed
Chris@4 913 successfully.</p></dd>
Chris@4 914 <dt><span class="term"><code class="computeroutput">BZ_STREAM_END</code></span></dt>
Chris@4 915 <dd><p>Compression of data was completed, or the
Chris@4 916 logical stream end was detected during
Chris@4 917 decompression.</p></dd>
Chris@4 918 </dl></div>
Chris@4 919 <p>The following return values indicate an error of some
Chris@4 920 kind.</p>
Chris@4 921 <div class="variablelist"><dl>
Chris@4 922 <dt><span class="term"><code class="computeroutput">BZ_CONFIG_ERROR</code></span></dt>
Chris@4 923 <dd><p>Indicates that the library has been improperly
Chris@4 924 compiled on your platform -- a major configuration error.
Chris@4 925 Specifically, it means that
Chris@4 926 <code class="computeroutput">sizeof(char)</code>,
Chris@4 927 <code class="computeroutput">sizeof(short)</code> and
Chris@4 928 <code class="computeroutput">sizeof(int)</code> are not 1, 2 and
Chris@4 929 4 respectively, as they should be. Note that the library
Chris@4 930 should still work properly on 64-bit platforms which follow
Chris@4 931 the LP64 programming model -- that is, where
Chris@4 932 <code class="computeroutput">sizeof(long)</code> and
Chris@4 933 <code class="computeroutput">sizeof(void*)</code> are 8. Under
Chris@4 934 LP64, <code class="computeroutput">sizeof(int)</code> is still 4,
Chris@4 935 so <code class="computeroutput">libbzip2</code>, which doesn't
Chris@4 936 use the <code class="computeroutput">long</code> type, is
Chris@4 937 OK.</p></dd>
Chris@4 938 <dt><span class="term"><code class="computeroutput">BZ_SEQUENCE_ERROR</code></span></dt>
Chris@4 939 <dd><p>When using the library, it is important to call
Chris@4 940 the functions in the correct sequence and with data structures
Chris@4 941 (buffers etc) in the correct states.
Chris@4 942 <code class="computeroutput">libbzip2</code> checks as much as it
Chris@4 943 can to ensure this is happening, and returns
Chris@4 944 <code class="computeroutput">BZ_SEQUENCE_ERROR</code> if not.
Chris@4 945 Code which complies precisely with the function semantics, as
Chris@4 946 detailed below, should never receive this value; such an event
Chris@4 947 denotes buggy code which you should
Chris@4 948 investigate.</p></dd>
Chris@4 949 <dt><span class="term"><code class="computeroutput">BZ_PARAM_ERROR</code></span></dt>
Chris@4 950 <dd><p>Returned when a parameter to a function call is
Chris@4 951 out of range or otherwise manifestly incorrect. As with
Chris@4 952 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>, this
Chris@4 953 denotes a bug in the client code. The distinction between
Chris@4 954 <code class="computeroutput">BZ_PARAM_ERROR</code> and
Chris@4 955 <code class="computeroutput">BZ_SEQUENCE_ERROR</code> is a bit
Chris@4 956 hazy, but still worth making.</p></dd>
Chris@4 957 <dt><span class="term"><code class="computeroutput">BZ_MEM_ERROR</code></span></dt>
Chris@4 958 <dd><p>Returned when a request to allocate memory
Chris@4 959 failed. Note that the quantity of memory needed to decompress
Chris@4 960 a stream cannot be determined until the stream's header has
Chris@4 961 been read. So
Chris@4 962 <code class="computeroutput">BZ2_bzDecompress</code> and
Chris@4 963 <code class="computeroutput">BZ2_bzRead</code> may return
Chris@4 964 <code class="computeroutput">BZ_MEM_ERROR</code> even though some
Chris@4 965 of the compressed data has been read. The same is not true
Chris@4 966 for compression; once
Chris@4 967 <code class="computeroutput">BZ2_bzCompressInit</code> or
Chris@4 968 <code class="computeroutput">BZ2_bzWriteOpen</code> has
Chris@4 969 successfully completed,
Chris@4 970 <code class="computeroutput">BZ_MEM_ERROR</code> cannot
Chris@4 971 occur.</p></dd>
Chris@4 972 <dt><span class="term"><code class="computeroutput">BZ_DATA_ERROR</code></span></dt>
Chris@4 973 <dd><p>Returned when a data integrity error is
Chris@4 974 detected during decompression. Most importantly, this means
Chris@4 975 when stored and computed CRCs for the data do not match. This
Chris@4 976 value is also returned upon detection of any other anomaly in
Chris@4 977 the compressed data.</p></dd>
Chris@4 978 <dt><span class="term"><code class="computeroutput">BZ_DATA_ERROR_MAGIC</code></span></dt>
Chris@4 979 <dd><p>As a special case of
Chris@4 980 <code class="computeroutput">BZ_DATA_ERROR</code>, it is
Chris@4 981 sometimes useful to know when the compressed stream does not
Chris@4 982 start with the correct magic bytes (<code class="computeroutput">'B' 'Z'
Chris@4 983 'h'</code>).</p></dd>
Chris@4 984 <dt><span class="term"><code class="computeroutput">BZ_IO_ERROR</code></span></dt>
Chris@4 985 <dd><p>Returned by
Chris@4 986 <code class="computeroutput">BZ2_bzRead</code> and
Chris@4 987 <code class="computeroutput">BZ2_bzWrite</code> when there is an
Chris@4 988 error reading or writing in the compressed file, and by
Chris@4 989 <code class="computeroutput">BZ2_bzReadOpen</code> and
Chris@4 990 <code class="computeroutput">BZ2_bzWriteOpen</code> for attempts
Chris@4 991 to use a file for which the error indicator (viz,
Chris@4 992 <code class="computeroutput">ferror(f)</code>) is set. On
Chris@4 993 receipt of <code class="computeroutput">BZ_IO_ERROR</code>, the
Chris@4 994 caller should consult <code class="computeroutput">errno</code>
Chris@4 995 and/or <code class="computeroutput">perror</code> to acquire
Chris@4 996 operating-system specific information about the
Chris@4 997 problem.</p></dd>
Chris@4 998 <dt><span class="term"><code class="computeroutput">BZ_UNEXPECTED_EOF</code></span></dt>
Chris@4 999 <dd><p>Returned by
Chris@4 1000 <code class="computeroutput">BZ2_bzRead</code> when the
Chris@4 1001 compressed file finishes before the logical end of stream is
Chris@4 1002 detected.</p></dd>
Chris@4 1003 <dt><span class="term"><code class="computeroutput">BZ_OUTBUFF_FULL</code></span></dt>
Chris@4 1004 <dd><p>Returned by
Chris@4 1005 <code class="computeroutput">BZ2_bzBuffToBuffCompress</code> and
Chris@4 1006 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> to
Chris@4 1007 indicate that the output data will not fit into the output
Chris@4 1008 buffer provided.</p></dd>
Chris@4 1009 </dl></div>
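<p>When logging failures it can help to turn these codes back into
their names. The following is a minimal sketch rather than part of the
library: <code class="computeroutput">bz_error_name</code> is a
hypothetical helper, and it assumes only the error constants defined
in <code class="computeroutput">bzlib.h</code>.</p>
<pre class="programlisting">#include &lt;bzlib.h&gt;

/* Hypothetical helper: map a libbzip2 error code to its symbolic name. */
const char *bz_error_name(int code)
{
    switch (code) {
    case BZ_SEQUENCE_ERROR:   return "BZ_SEQUENCE_ERROR";
    case BZ_PARAM_ERROR:      return "BZ_PARAM_ERROR";
    case BZ_MEM_ERROR:        return "BZ_MEM_ERROR";
    case BZ_DATA_ERROR:       return "BZ_DATA_ERROR";
    case BZ_DATA_ERROR_MAGIC: return "BZ_DATA_ERROR_MAGIC";
    case BZ_IO_ERROR:         return "BZ_IO_ERROR";
    case BZ_UNEXPECTED_EOF:   return "BZ_UNEXPECTED_EOF";
    case BZ_OUTBUFF_FULL:     return "BZ_OUTBUFF_FULL";
    case BZ_CONFIG_ERROR:     return "BZ_CONFIG_ERROR";
    default:                  return "unknown (not an error code)";
    }
}</pre>
<p>For <code class="computeroutput">BZ_IO_ERROR</code> the name alone
is not enough; <code class="computeroutput">errno</code> still has to
be consulted as described above.</p>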
Chris@4 1010 </div>
Chris@4 1011 <div class="sect1" title="3.3. Low-level interface">
Chris@4 1012 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 1013 <a name="low-level"></a>3.3. Low-level interface</h2></div></div></div>
Chris@4 1014 <div class="sect2" title="3.3.1. BZ2_bzCompressInit">
Chris@4 1015 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1016 <a name="bzcompress-init"></a>3.3.1. BZ2_bzCompressInit</h3></div></div></div>
Chris@4 1017 <pre class="programlisting">typedef struct {
Chris@4 1018 char *next_in;
Chris@4 1019 unsigned int avail_in;
Chris@4 1020 unsigned int total_in_lo32;
Chris@4 1021 unsigned int total_in_hi32;
Chris@4 1022
Chris@4 1023 char *next_out;
Chris@4 1024 unsigned int avail_out;
Chris@4 1025 unsigned int total_out_lo32;
Chris@4 1026 unsigned int total_out_hi32;
Chris@4 1027
Chris@4 1028 void *state;
Chris@4 1029
Chris@4 1030 void *(*bzalloc)(void *,int,int);
Chris@4 1031 void (*bzfree)(void *,void *);
Chris@4 1032 void *opaque;
Chris@4 1033 } bz_stream;
Chris@4 1034
Chris@4 1035 int BZ2_bzCompressInit ( bz_stream *strm,
Chris@4 1036 int blockSize100k,
Chris@4 1037 int verbosity,
Chris@4 1038 int workFactor );</pre>
Chris@4 1039 <p>Prepares for compression. The
Chris@4 1040 <code class="computeroutput">bz_stream</code> structure holds all
Chris@4 1041 data pertaining to the compression activity. A
Chris@4 1042 <code class="computeroutput">bz_stream</code> structure should be
Chris@4 1043 allocated and initialised prior to the call. The fields of
Chris@4 1044 <code class="computeroutput">bz_stream</code> comprise the entirety
Chris@4 1045 of the user-visible data. <code class="computeroutput">state</code>
Chris@4 1046 is a pointer to the private data structures required for
Chris@4 1047 compression.</p>
Chris@4 1048 <p>Custom memory allocators are supported, via fields
Chris@4 1049 <code class="computeroutput">bzalloc</code>,
Chris@4 1050 <code class="computeroutput">bzfree</code>, and
Chris@4 1051 <code class="computeroutput">opaque</code>. The value
Chris@4 1052 <code class="computeroutput">opaque</code> is passed as the first
Chris@4 1053 argument to all calls to <code class="computeroutput">bzalloc</code>
Chris@4 1054 and <code class="computeroutput">bzfree</code>, but is otherwise
Chris@4 1055 ignored by the library. The call <code class="computeroutput">bzalloc (
Chris@4 1056 opaque, n, m )</code> is expected to return a pointer
Chris@4 1057 <code class="computeroutput">p</code> to <code class="computeroutput">n *
Chris@4 1058 m</code> bytes of memory, and <code class="computeroutput">bzfree (
Chris@4 1059 opaque, p )</code> should free that memory.</p>
Chris@4 1060 <p>If you don't want to use a custom memory allocator, set
Chris@4 1061 <code class="computeroutput">bzalloc</code>,
Chris@4 1062 <code class="computeroutput">bzfree</code> and
Chris@4 1063 <code class="computeroutput">opaque</code> to
Chris@4 1064 <code class="computeroutput">NULL</code>, and the library will then
Chris@4 1065 use the standard <code class="computeroutput">malloc</code> /
Chris@4 1066 <code class="computeroutput">free</code> routines.</p>
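<p>As an illustration of the allocator hooks, here is a minimal sketch
of a pair of callbacks that count live blocks. The names
<code class="computeroutput">alloc_stats</code>,
<code class="computeroutput">counting_alloc</code> and
<code class="computeroutput">counting_free</code> are hypothetical;
only the two function signatures come from the
<code class="computeroutput">bz_stream</code> declaration above.</p>
<pre class="programlisting">#include &lt;stdlib.h&gt;

/* Hypothetical bookkeeping record, handed to the library via opaque. */
typedef struct {
    int live_blocks;   /* blocks allocated but not yet freed */
} alloc_stats;

/* Matches the bzalloc signature: return n * m bytes, or NULL on failure. */
void *counting_alloc(void *opaque, int n, int m)
{
    alloc_stats *stats = opaque;
    void *p;

    if (n &lt; 0 || m &lt; 0)
        return NULL;
    p = malloc((size_t)n * (size_t)m);
    if (p != NULL &amp;&amp; stats != NULL)
        stats-&gt;live_blocks++;
    return p;
}

/* Matches the bzfree signature: release a block from counting_alloc. */
void counting_free(void *opaque, void *p)
{
    alloc_stats *stats = opaque;

    if (p == NULL)
        return;
    if (stats != NULL)
        stats-&gt;live_blocks--;
    free(p);
}</pre>
<p>To use them, set <code class="computeroutput">strm.bzalloc =
counting_alloc</code>, <code class="computeroutput">strm.bzfree =
counting_free</code> and point
<code class="computeroutput">strm.opaque</code> at an
<code class="computeroutput">alloc_stats</code> record before
initialising the stream.</p>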
Chris@4 1067 <p>Before calling
Chris@4 1068 <code class="computeroutput">BZ2_bzCompressInit</code>, fields
Chris@4 1069 <code class="computeroutput">bzalloc</code>,
Chris@4 1070 <code class="computeroutput">bzfree</code> and
Chris@4 1071 <code class="computeroutput">opaque</code> should be filled
Chris@4 1072 appropriately, as just described. Upon return, the internal
Chris@4 1073 state will have been allocated and initialised, and
Chris@4 1074 <code class="computeroutput">total_in_lo32</code>,
Chris@4 1075 <code class="computeroutput">total_in_hi32</code>,
Chris@4 1076 <code class="computeroutput">total_out_lo32</code> and
Chris@4 1077 <code class="computeroutput">total_out_hi32</code> will have been
Chris@4 1078 set to zero. These four fields are used by the library to inform
Chris@4 1079 the caller of the total amount of data passed into and out of the
Chris@4 1080 library, respectively. You should not try to change them. As of
Chris@4 1081 version 1.0, 64-bit counts are maintained, even on 32-bit
Chris@4 1082 platforms, using the <code class="computeroutput">_hi32</code>
Chris@4 1083 fields to store the upper 32 bits of the count. So, for example,
Chris@4 1084 the total amount of data in is <code class="computeroutput">(total_in_hi32
Chris@4 1085 &lt;&lt; 32) + total_in_lo32</code>, evaluated in a 64-bit type.</p>
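<p>In C the shift has to be carried out on a 64-bit value, because
shifting a 32-bit <code class="computeroutput">unsigned int</code> by
32 places is undefined. A minimal sketch, in which
<code class="computeroutput">combine_count</code> is a hypothetical
helper name:</p>
<pre class="programlisting">/* Combine the two 32-bit halves of a byte count into one 64-bit value.
   The cast comes before the shift, so the shift is done in 64 bits. */
unsigned long long combine_count(unsigned int hi32, unsigned int lo32)
{
    return ((unsigned long long)hi32 &lt;&lt; 32) | lo32;
}</pre>
<p>For example, <code class="computeroutput">combine_count (
strm.total_in_hi32, strm.total_in_lo32 )</code> gives the number of
bytes consumed so far.</p>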
Chris@4 1086 <p>Parameter <code class="computeroutput">blockSize100k</code>
Chris@4 1087 specifies the block size to be used for compression. It should
Chris@4 1088 be a value between 1 and 9 inclusive, and the actual block size
Chris@4 1089 used is 100000 times this figure. 9 gives the best compression but
Chris@4 1090 takes the most memory.</p>
Chris@4 1091 <p>Parameter <code class="computeroutput">verbosity</code> should
Chris@4 1092 be set to a number between 0 and 4 inclusive. 0 is silent, and
Chris@4 1093 greater numbers give increasingly verbose monitoring/debugging
Chris@4 1094 output. If the library has been compiled with
Chris@4 1095 <code class="computeroutput">-DBZ_NO_STDIO</code>, no such output
Chris@4 1096 will appear for any verbosity setting.</p>
Chris@4 1097 <p>Parameter <code class="computeroutput">workFactor</code>
Chris@4 1098 controls how the compression phase behaves when presented with
Chris@4 1099 worst case, highly repetitive, input data. If compression runs
Chris@4 1100 into difficulties caused by repetitive data, the library switches
Chris@4 1101 from the standard sorting algorithm to a fallback algorithm. The
Chris@4 1102 fallback is slower than the standard algorithm by perhaps a
Chris@4 1103 factor of three, but always behaves reasonably, no matter how bad
Chris@4 1104 the input.</p>
Chris@4 1105 <p>Lower values of <code class="computeroutput">workFactor</code>
Chris@4 1106 reduce the amount of effort the standard algorithm will expend
Chris@4 1107 before resorting to the fallback. You should set this parameter
Chris@4 1108 carefully; too low, and many inputs will be handled by the
Chris@4 1109 fallback algorithm and so compress rather slowly; too high, and
Chris@4 1110 your average-to-worst case compression times can become very
Chris@4 1111 large. The default value of 30 gives reasonable behaviour over a
Chris@4 1112 wide range of circumstances.</p>
Chris@4 1113 <p>Allowable values range from 0 to 250 inclusive. 0 is a
Chris@4 1114 special case, equivalent to using the default value of 30.</p>
Chris@4 1115 <p>Note that the compressed output generated is the same
Chris@4 1116 regardless of whether or not the fallback algorithm is
Chris@4 1117 used.</p>
Chris@4 1118 <p>Be aware also that this parameter may disappear entirely in
Chris@4 1119 future versions of the library. In principle it should be
Chris@4 1120 possible to devise a good way to automatically choose which
Chris@4 1121 algorithm to use. Such a mechanism would render the parameter
Chris@4 1122 obsolete.</p>
Chris@4 1123 <p>Possible return values:</p>
Chris@4 1124 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 1125 if the library has been mis-compiled
Chris@4 1126 BZ_PARAM_ERROR
Chris@4 1127 if strm is NULL
Chris@4 1128 or blockSize100k &lt; 1 or blockSize100k &gt; 9
Chris@4 1129 or verbosity &lt; 0 or verbosity &gt; 4
Chris@4 1130 or workFactor &lt; 0 or workFactor &gt; 250
Chris@4 1131 BZ_MEM_ERROR
Chris@4 1132 if not enough memory is available
Chris@4 1133 BZ_OK
Chris@4 1134 otherwise</pre>
Chris@4 1135 <p>Allowable next actions:</p>
Chris@4 1136 <pre class="programlisting">BZ2_bzCompress
Chris@4 1137 if BZ_OK is returned
Chris@4 1138 no specific action needed in case of error</pre>
Chris@4 1139 </div>
Chris@4 1140 <div class="sect2" title="3.3.2. BZ2_bzCompress">
Chris@4 1141 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1142 <a name="bzCompress"></a>3.3.2. BZ2_bzCompress</h3></div></div></div>
Chris@4 1143 <pre class="programlisting">int BZ2_bzCompress ( bz_stream *strm, int action );</pre>
Chris@4 1144 <p>Provides more input and/or output buffer space for the
Chris@4 1145 library. The caller maintains input and output buffers, and
Chris@4 1146 calls <code class="computeroutput">BZ2_bzCompress</code> to transfer
Chris@4 1147 data between them.</p>
Chris@4 1148 <p>Before each call to
Chris@4 1149 <code class="computeroutput">BZ2_bzCompress</code>,
Chris@4 1150 <code class="computeroutput">next_in</code> should point at the data
Chris@4 1151 to be compressed, and <code class="computeroutput">avail_in</code>
Chris@4 1152 should indicate how many bytes the library may read.
Chris@4 1153 <code class="computeroutput">BZ2_bzCompress</code> updates
Chris@4 1154 <code class="computeroutput">next_in</code>,
Chris@4 1155 <code class="computeroutput">avail_in</code> and
Chris@4 1156 <code class="computeroutput">total_in</code> to reflect the number
Chris@4 1157 of bytes it has read.</p>
Chris@4 1158 <p>Similarly, <code class="computeroutput">next_out</code> should
Chris@4 1159 point to a buffer in which the compressed data is to be placed,
Chris@4 1160 with <code class="computeroutput">avail_out</code> indicating how
Chris@4 1161 much output space is available.
Chris@4 1162 <code class="computeroutput">BZ2_bzCompress</code> updates
Chris@4 1163 <code class="computeroutput">next_out</code>,
Chris@4 1164 <code class="computeroutput">avail_out</code> and
Chris@4 1165 <code class="computeroutput">total_out</code> to reflect the number
Chris@4 1166 of bytes output.</p>
Chris@4 1167 <p>You may provide and remove as little or as much data as you
Chris@4 1168 like on each call of
Chris@4 1169 <code class="computeroutput">BZ2_bzCompress</code>. In the limit,
Chris@4 1170 it is acceptable to supply and remove data one byte at a time,
Chris@4 1171 although this would be terribly inefficient. You should always
Chris@4 1172 ensure that at least one byte of output space is available at
Chris@4 1173 each call.</p>
Chris@4 1174 <p>A second purpose of
Chris@4 1175 <code class="computeroutput">BZ2_bzCompress</code> is to request a
Chris@4 1176 change of mode of the compressed stream.</p>
Chris@4 1177 <p>Conceptually, a compressed stream can be in one of four
Chris@4 1178 states: IDLE, RUNNING, FLUSHING and FINISHING. Before
Chris@4 1179 initialisation
Chris@4 1180 (<code class="computeroutput">BZ2_bzCompressInit</code>) and after
Chris@4 1181 termination (<code class="computeroutput">BZ2_bzCompressEnd</code>),
Chris@4 1182 a stream is regarded as IDLE.</p>
Chris@4 1183 <p>Upon initialisation
Chris@4 1184 (<code class="computeroutput">BZ2_bzCompressInit</code>), the stream
Chris@4 1185 is placed in the RUNNING state. Subsequent calls to
Chris@4 1186 <code class="computeroutput">BZ2_bzCompress</code> should pass
Chris@4 1187 <code class="computeroutput">BZ_RUN</code> as the requested action
Chris@4 1188 while input is still being supplied; the only other legal actions in this state are
Chris@4 1189 <code class="computeroutput">BZ_FLUSH</code> and <code class="computeroutput">BZ_FINISH</code>, described below.</p>
Chris@4 1190 <p>At some point, the calling program will have provided all
Chris@4 1191 the input data it wants to. It will then want to finish up -- in
Chris@4 1192 effect, asking the library to process any data it might have
Chris@4 1193 buffered internally. In this state,
Chris@4 1194 <code class="computeroutput">BZ2_bzCompress</code> will no longer
Chris@4 1195 attempt to read data from
Chris@4 1196 <code class="computeroutput">next_in</code>, but it will want to
Chris@4 1197 write data to <code class="computeroutput">next_out</code>. Because
Chris@4 1198 the output buffer supplied by the user can be arbitrarily small,
Chris@4 1199 the finishing-up operation cannot necessarily be done with a
Chris@4 1200 single call of
Chris@4 1201 <code class="computeroutput">BZ2_bzCompress</code>.</p>
Chris@4 1202 <p>Instead, the calling program passes
Chris@4 1203 <code class="computeroutput">BZ_FINISH</code> as an action to
Chris@4 1204 <code class="computeroutput">BZ2_bzCompress</code>. This changes
Chris@4 1205 the stream's state to FINISHING. Any remaining input (ie,
Chris@4 1206 <code class="computeroutput">next_in[0 .. avail_in-1]</code>) is
Chris@4 1207 compressed and transferred to the output buffer. To do this,
Chris@4 1208 <code class="computeroutput">BZ2_bzCompress</code> must be called
Chris@4 1209 repeatedly until all the output has been consumed. At that
Chris@4 1210 point, <code class="computeroutput">BZ2_bzCompress</code> returns
Chris@4 1211 <code class="computeroutput">BZ_STREAM_END</code>, and the stream's
Chris@4 1212 state is set back to IDLE.
Chris@4 1213 <code class="computeroutput">BZ2_bzCompressEnd</code> should then be
Chris@4 1214 called.</p>
Chris@4 1215 <p>Just to make sure the calling program does not cheat, the
Chris@4 1216 library makes a note of <code class="computeroutput">avail_in</code>
Chris@4 1217 at the time of the first call to
Chris@4 1218 <code class="computeroutput">BZ2_bzCompress</code> which has
Chris@4 1219 <code class="computeroutput">BZ_FINISH</code> as an action (ie, at
Chris@4 1220 the time the program has announced its intention to not supply
Chris@4 1221 any more input). By comparing this value with that of
Chris@4 1222 <code class="computeroutput">avail_in</code> over subsequent calls
Chris@4 1223 to <code class="computeroutput">BZ2_bzCompress</code>, the library
Chris@4 1224 can detect any attempts to slip in more data to compress. Any
Chris@4 1225 calls for which this is detected will return
Chris@4 1226 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>. This
Chris@4 1227 indicates a programming mistake which should be corrected.</p>
Chris@4 1228 <p>Instead of asking to finish, the calling program may ask
Chris@4 1229 <code class="computeroutput">BZ2_bzCompress</code> to take all the
Chris@4 1230 remaining input, compress it and terminate the current
Chris@4 1231 (Burrows-Wheeler) compression block. This could be useful for
Chris@4 1232 error control purposes. The mechanism is analogous to that for
Chris@4 1233 finishing: call <code class="computeroutput">BZ2_bzCompress</code>
Chris@4 1234 with an action of <code class="computeroutput">BZ_FLUSH</code>,
Chris@4 1235 remove output data, and persist with the
Chris@4 1236 <code class="computeroutput">BZ_FLUSH</code> action until the value
Chris@4 1237 <code class="computeroutput">BZ_RUN</code> is returned. As with
Chris@4 1238 finishing, <code class="computeroutput">BZ2_bzCompress</code>
Chris@4 1239 detects any attempt to provide more input data once the flush has
Chris@4 1240 begun.</p>
Chris@4 1241 <p>Once the flush is complete, the stream returns to the
Chris@4 1242 normal RUNNING state.</p>
Chris@4 1243 <p>This all sounds pretty complex, but isn't really. Here's a
Chris@4 1244 table which shows which actions are allowable in each state, what
Chris@4 1245 action will be taken, what the next state is, and what the
Chris@4 1246 non-error return values are. Note that you can't explicitly ask
Chris@4 1247 what state the stream is in, but nor do you need to -- it can be
Chris@4 1248 inferred from the values returned by
Chris@4 1249 <code class="computeroutput">BZ2_bzCompress</code>.</p>
Chris@4 1250 <pre class="programlisting">IDLE/any
Chris@4 1251 Illegal. IDLE state only exists after BZ2_bzCompressEnd or
Chris@4 1252 before BZ2_bzCompressInit.
Chris@4 1253 Return value = BZ_SEQUENCE_ERROR
Chris@4 1254
Chris@4 1255 RUNNING/BZ_RUN
Chris@4 1256 Compress from next_in to next_out as much as possible.
Chris@4 1257 Next state = RUNNING
Chris@4 1258 Return value = BZ_RUN_OK
Chris@4 1259
Chris@4 1260 RUNNING/BZ_FLUSH
Chris@4 1261 Remember current value of avail_in. Compress from next_in
Chris@4 1262 to next_out as much as possible, but do not accept any more input.
Chris@4 1263 Next state = FLUSHING
Chris@4 1264 Return value = BZ_FLUSH_OK
Chris@4 1265
Chris@4 1266 RUNNING/BZ_FINISH
Chris@4 1267 Remember current value of avail_in. Compress from next_in
Chris@4 1268 to next_out as much as possible, but do not accept any more input.
Chris@4 1269 Next state = FINISHING
Chris@4 1270 Return value = BZ_FINISH_OK
Chris@4 1271
Chris@4 1272 FLUSHING/BZ_FLUSH
Chris@4 1273 Compress from next_in to next_out as much as possible,
Chris@4 1274 but do not accept any more input.
Chris@4 1275 If all the existing input has been used up and all compressed
Chris@4 1276 output has been removed
Chris@4 1277 Next state = RUNNING; Return value = BZ_RUN_OK
Chris@4 1278 else
Chris@4 1279 Next state = FLUSHING; Return value = BZ_FLUSH_OK
Chris@4 1280
Chris@4 1281 FLUSHING/other
Chris@4 1282 Illegal.
Chris@4 1283 Return value = BZ_SEQUENCE_ERROR
Chris@4 1284
Chris@4 1285 FINISHING/BZ_FINISH
Chris@4 1286 Compress from next_in to next_out as much as possible,
Chris@4 1287 but do not accept any more input.
Chris@4 1288 If all the existing input has been used up and all compressed
Chris@4 1289 output has been removed
Chris@4 1290 Next state = IDLE; Return value = BZ_STREAM_END
Chris@4 1291 else
Chris@4 1292 Next state = FINISHING; Return value = BZ_FINISH_OK
Chris@4 1293
Chris@4 1294 FINISHING/other
Chris@4 1295 Illegal.
Chris@4 1296 Return value = BZ_SEQUENCE_ERROR</pre>
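<p>The table can also be written down as a pair of small functions.
This is only a model that restates the table; it is not taken from the
library source, and the enum and function names are hypothetical and
do not appear in <code class="computeroutput">bzlib.h</code>.</p>
<pre class="programlisting">/* Stream states and actions, named after the table above. */
typedef enum { ST_IDLE, ST_RUNNING, ST_FLUSHING, ST_FINISHING } model_state;
typedef enum { ACT_RUN, ACT_FLUSH, ACT_FINISH } model_action;

/* Returns 1 if the table allows the action in the given state, and 0
   if the table lists BZ_SEQUENCE_ERROR for that combination. */
int action_allowed(model_state st, model_action act)
{
    switch (st) {
    case ST_RUNNING:   return 1;
    case ST_FLUSHING:  return act == ACT_FLUSH;
    case ST_FINISHING: return act == ACT_FINISH;
    default:           return 0;   /* ST_IDLE: nothing is legal */
    }
}

/* Next state according to the table.  drained is nonzero when all
   existing input has been used up and all compressed output has been
   removed.  Call only for combinations that action_allowed accepts. */
model_state next_state(model_state st, model_action act, int drained)
{
    if (st == ST_RUNNING) {
        if (act == ACT_FLUSH)  return ST_FLUSHING;
        if (act == ACT_FINISH) return ST_FINISHING;
        return ST_RUNNING;
    }
    if (st == ST_FLUSHING)
        return drained ? ST_RUNNING : ST_FLUSHING;
    if (st == ST_FINISHING)
        return drained ? ST_IDLE : ST_FINISHING;
    return ST_IDLE;
}</pre>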
Chris@4 1297 <p>That still looks complicated? Well, fair enough. The
Chris@4 1298 usual sequence of calls for compressing a load of data is:</p>
Chris@4 1299 <div class="orderedlist"><ol class="orderedlist" type="1">
Chris@4 1300 <li class="listitem"><p>Get started with
Chris@4 1301 <code class="computeroutput">BZ2_bzCompressInit</code>.</p></li>
Chris@4 1302 <li class="listitem"><p>Shovel data in and shlurp out its compressed form
Chris@4 1303 using zero or more calls of
Chris@4 1304 <code class="computeroutput">BZ2_bzCompress</code> with action =
Chris@4 1305 <code class="computeroutput">BZ_RUN</code>.</p></li>
Chris@4 1306 <li class="listitem"><p>Finish up. Repeatedly call
Chris@4 1307 <code class="computeroutput">BZ2_bzCompress</code> with action =
Chris@4 1308 <code class="computeroutput">BZ_FINISH</code>, copying out the
Chris@4 1309 compressed output, until
Chris@4 1310 <code class="computeroutput">BZ_STREAM_END</code> is
Chris@4 1311 returned.</p></li>
Chris@4 1312 <li class="listitem"><p>Close up and go home. Call
Chris@4 1313 <code class="computeroutput">BZ2_bzCompressEnd</code>.</p></li>
Chris@4 1314 </ol></div>
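<p>The four steps above can be sketched as follows. This is one
possible arrangement, not the only one: the function name
<code class="computeroutput">compress_file</code>, the 4096-byte
buffers and the choice of block size 9 are assumptions made for the
example, and both files are assumed to be open in binary mode.</p>
<pre class="programlisting">#include &lt;stdio.h&gt;
#include &lt;bzlib.h&gt;

/* Compress everything readable from in and write it to out.
   Returns BZ_OK on success, otherwise a libbzip2 error code. */
int compress_file(FILE *in, FILE *out)
{
    bz_stream strm;
    char inbuf[4096];
    char outbuf[4096];
    int action, ret;
    size_t produced;

    /* Step 1: select the default allocator and initialise. */
    strm.bzalloc = NULL;
    strm.bzfree = NULL;
    strm.opaque = NULL;
    ret = BZ2_bzCompressInit(&amp;strm, 9, 0, 0);
    if (ret != BZ_OK)
        return ret;

    do {
        /* Steps 2 and 3: BZ_RUN while input remains, BZ_FINISH at EOF. */
        strm.next_in = inbuf;
        strm.avail_in = (unsigned int)fread(inbuf, 1, sizeof inbuf, in);
        if (ferror(in)) {
            ret = BZ_IO_ERROR;
            break;
        }
        action = feof(in) ? BZ_FINISH : BZ_RUN;

        do {
            strm.next_out = outbuf;
            strm.avail_out = sizeof outbuf;
            ret = BZ2_bzCompress(&amp;strm, action);
            if (ret != BZ_RUN_OK &amp;&amp; ret != BZ_FINISH_OK &amp;&amp; ret != BZ_STREAM_END)
                break;                  /* an error code */
            produced = sizeof outbuf - strm.avail_out;
            if (fwrite(outbuf, 1, produced, out) != produced) {
                ret = BZ_IO_ERROR;
                break;
            }
        } while (action == BZ_RUN ? strm.avail_in &gt; 0 : ret != BZ_STREAM_END);
    } while (ret == BZ_RUN_OK);

    /* Step 4: release the stream whether or not an error occurred. */
    BZ2_bzCompressEnd(&amp;strm);
    return ret == BZ_STREAM_END ? BZ_OK : ret;
}</pre>
<p>A fresh output buffer is offered on every call, so the rule that at
least one byte of output space must be available is always met.</p>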
Chris@4 1315 <p>If the data you want to compress fits into your input
Chris@4 1316 buffer all at once, you can skip the calls of
Chris@4 1317 <code class="computeroutput">BZ2_bzCompress ( ..., BZ_RUN )</code>
Chris@4 1318 and just do the <code class="computeroutput">BZ2_bzCompress ( ..., BZ_FINISH
Chris@4 1319 )</code> calls.</p>
Chris@4 1320 <p>All required memory is allocated by
Chris@4 1321 <code class="computeroutput">BZ2_bzCompressInit</code>. The
Chris@4 1322 compression library can accept any data at all (obviously). So
Chris@4 1323 you shouldn't get any error return values from the
Chris@4 1324 <code class="computeroutput">BZ2_bzCompress</code> calls. If you
Chris@4 1325 do, they will be
Chris@4 1326 <code class="computeroutput">BZ_SEQUENCE_ERROR</code>, and indicate
Chris@4 1327 a bug in your programming.</p>
Chris@4 1328 <p>Trivial other possible return values:</p>
Chris@4 1329 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1330 if strm is NULL, or strm-&gt;s is NULL</pre>
Chris@4 1331 </div>
Chris@4 1332 <div class="sect2" title="3.3.3. BZ2_bzCompressEnd">
Chris@4 1333 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1334 <a name="bzCompress-end"></a>3.3.3. BZ2_bzCompressEnd</h3></div></div></div>
Chris@4 1335 <pre class="programlisting">int BZ2_bzCompressEnd ( bz_stream *strm );</pre>
Chris@4 1336 <p>Releases all memory associated with a compression
Chris@4 1337 stream.</p>
Chris@4 1338 <p>Possible return values:</p>
Chris@4 1339 <pre class="programlisting">BZ_PARAM_ERROR if strm is NULL or strm-&gt;s is NULL
Chris@4 1340 BZ_OK otherwise</pre>
Chris@4 1341 </div>
Chris@4 1342 <div class="sect2" title="3.3.4. BZ2_bzDecompressInit">
Chris@4 1343 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1344 <a name="bzDecompress-init"></a>3.3.4. BZ2_bzDecompressInit</h3></div></div></div>
Chris@4 1345 <pre class="programlisting">int BZ2_bzDecompressInit ( bz_stream *strm, int verbosity, int small );</pre>
Chris@4 1346 <p>Prepares for decompression. As with
Chris@4 1347 <code class="computeroutput">BZ2_bzCompressInit</code>, a
Chris@4 1348 <code class="computeroutput">bz_stream</code> record should be
Chris@4 1349 allocated and initialised before the call. Fields
Chris@4 1350 <code class="computeroutput">bzalloc</code>,
Chris@4 1351 <code class="computeroutput">bzfree</code> and
Chris@4 1352 <code class="computeroutput">opaque</code> should be set if a custom
Chris@4 1353 memory allocator is required, or made
Chris@4 1354 <code class="computeroutput">NULL</code> for the normal
Chris@4 1355 <code class="computeroutput">malloc</code> /
Chris@4 1356 <code class="computeroutput">free</code> routines. Upon return, the
Chris@4 1357 internal state will have been initialised, and
Chris@4 1358 <code class="computeroutput">total_in</code> and
Chris@4 1359 <code class="computeroutput">total_out</code> will be zero.</p>
Chris@4 1360 <p>For the meaning of parameter
Chris@4 1361 <code class="computeroutput">verbosity</code>, see
Chris@4 1362 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
Chris@4 1363 <p>If <code class="computeroutput">small</code> is nonzero, the
Chris@4 1364 library will use an alternative decompression algorithm which
Chris@4 1365 uses less memory but at the cost of decompressing more slowly
Chris@4 1366 (roughly speaking, half the speed, but the maximum memory
Chris@4 1367 requirement drops to around 2300k). See <a class="xref" href="#using" title="2. How to use bzip2">How to use bzip2</a>
Chris@4 1368 for more information on memory management.</p>
Chris@4 1369 <p>Note that the amount of memory needed to decompress a
Chris@4 1370 stream cannot be determined until the stream's header has been
Chris@4 1371 read, so even if
Chris@4 1372 <code class="computeroutput">BZ2_bzDecompressInit</code> succeeds, a
Chris@4 1373 subsequent <code class="computeroutput">BZ2_bzDecompress</code>
Chris@4 1374 could fail with
Chris@4 1375 <code class="computeroutput">BZ_MEM_ERROR</code>.</p>
Chris@4 1376 <p>Possible return values:</p>
Chris@4 1377 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 1378 if the library has been mis-compiled
Chris@4 1379 BZ_PARAM_ERROR
Chris@4 1380 if ( small != 0 &amp;&amp; small != 1 )
Chris@4 1381 or (verbosity &lt; 0 || verbosity &gt; 4)
Chris@4 1382 BZ_MEM_ERROR
Chris@4 1383 if insufficient memory is available
BZ_OK
otherwise</pre>
Chris@4 1384 <p>Allowable next actions:</p>
Chris@4 1385 <pre class="programlisting">BZ2_bzDecompress
Chris@4 1386 if BZ_OK was returned
Chris@4 1387 no specific action required in case of error</pre>
Chris@4 1388 </div>
Chris@4 1389 <div class="sect2" title="3.3.5. BZ2_bzDecompress">
Chris@4 1390 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1391 <a name="bzDecompress"></a>3.3.5. BZ2_bzDecompress</h3></div></div></div>
Chris@4 1392 <pre class="programlisting">int BZ2_bzDecompress ( bz_stream *strm );</pre>
Chris@4 1393 <p>Provides more input and/or output buffer space for the
Chris@4 1394 library. The caller maintains input and output buffers, and uses
Chris@4 1395 <code class="computeroutput">BZ2_bzDecompress</code> to transfer
Chris@4 1396 data between them.</p>
Chris@4 1397 <p>Before each call to
Chris@4 1398 <code class="computeroutput">BZ2_bzDecompress</code>,
Chris@4 1399 <code class="computeroutput">next_in</code> should point at the
Chris@4 1400 compressed data, and <code class="computeroutput">avail_in</code>
Chris@4 1401 should indicate how many bytes the library may read.
Chris@4 1402 <code class="computeroutput">BZ2_bzDecompress</code> updates
Chris@4 1403 <code class="computeroutput">next_in</code>,
Chris@4 1404 <code class="computeroutput">avail_in</code> and
Chris@4 1405 <code class="computeroutput">total_in</code> to reflect the number
Chris@4 1406 of bytes it has read.</p>
Chris@4 1407 <p>Similarly, <code class="computeroutput">next_out</code> should
Chris@4 1408 point to a buffer in which the uncompressed output is to be
Chris@4 1409 placed, with <code class="computeroutput">avail_out</code>
Chris@4 1410 indicating how much output space is available.
Chris@4 1411 <code class="computeroutput">BZ2_bzDecompress</code> updates
Chris@4 1412 <code class="computeroutput">next_out</code>,
Chris@4 1413 <code class="computeroutput">avail_out</code> and
Chris@4 1414 <code class="computeroutput">total_out</code> to reflect the number
Chris@4 1415 of bytes output.</p>
Chris@4 1416 <p>You may provide and remove as little or as much data as you
Chris@4 1417 like on each call of
Chris@4 1418 <code class="computeroutput">BZ2_bzDecompress</code>. In the limit,
Chris@4 1419 it is acceptable to supply and remove data one byte at a time,
Chris@4 1420 although this would be terribly inefficient. You should always
Chris@4 1421 ensure that at least one byte of output space is available at
Chris@4 1422 each call.</p>
Chris@4 1423 <p>Use of <code class="computeroutput">BZ2_bzDecompress</code> is
Chris@4 1424 simpler than
Chris@4 1425 <code class="computeroutput">BZ2_bzCompress</code>.</p>
Chris@4 1426 <p>You should provide input and remove output as described
Chris@4 1427 above, and repeatedly call
Chris@4 1428 <code class="computeroutput">BZ2_bzDecompress</code> until
Chris@4 1429 <code class="computeroutput">BZ_STREAM_END</code> is returned.
Chris@4 1430 Appearance of <code class="computeroutput">BZ_STREAM_END</code>
Chris@4 1431 denotes that <code class="computeroutput">BZ2_bzDecompress</code>
Chris@4 1432 has detected the logical end of the compressed stream.
Chris@4 1433 <code class="computeroutput">BZ2_bzDecompress</code> will not
Chris@4 1434 produce <code class="computeroutput">BZ_STREAM_END</code> until all
Chris@4 1435 output data has been placed into the output buffer, so once
Chris@4 1436 <code class="computeroutput">BZ_STREAM_END</code> appears, you are
Chris@4 1437 guaranteed to have available all the decompressed output, and
Chris@4 1438 <code class="computeroutput">BZ2_bzDecompressEnd</code> can safely
Chris@4 1439 be called.</p>
Chris@4 1440 <p>In case of an error return value, you should call
Chris@4 1441 <code class="computeroutput">BZ2_bzDecompressEnd</code> to clean up
Chris@4 1442 and release memory.</p>
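<p>The calling pattern described above might look like this in
practice. It is a sketch under stated assumptions: the name
<code class="computeroutput">decompress_file</code> and the 4096-byte
buffers are choices made for the example, the input is assumed to
hold a single compressed stream, and both files are assumed to be open
in binary mode.</p>
<pre class="programlisting">#include &lt;stdio.h&gt;
#include &lt;bzlib.h&gt;

/* Decompress one bzip2 stream from in to out.
   Returns BZ_OK on success, otherwise a libbzip2 error code. */
int decompress_file(FILE *in, FILE *out)
{
    bz_stream strm;
    char inbuf[4096];
    char outbuf[4096];
    int ret;
    size_t produced;

    strm.bzalloc = NULL;
    strm.bzfree = NULL;
    strm.opaque = NULL;
    ret = BZ2_bzDecompressInit(&amp;strm, 0, 0);
    if (ret != BZ_OK)
        return ret;

    strm.next_in = inbuf;
    strm.avail_in = 0;
    do {
        /* Refill the input buffer once the library has used it up. */
        if (strm.avail_in == 0) {
            strm.next_in = inbuf;
            strm.avail_in = (unsigned int)fread(inbuf, 1, sizeof inbuf, in);
            if (ferror(in)) {
                ret = BZ_IO_ERROR;
                break;
            }
            if (strm.avail_in == 0) {   /* file ended before the stream did */
                ret = BZ_UNEXPECTED_EOF;
                break;
            }
        }
        strm.next_out = outbuf;
        strm.avail_out = sizeof outbuf;
        ret = BZ2_bzDecompress(&amp;strm);
        if (ret != BZ_OK &amp;&amp; ret != BZ_STREAM_END)
            break;                      /* an error code */
        produced = sizeof outbuf - strm.avail_out;
        if (fwrite(outbuf, 1, produced, out) != produced) {
            ret = BZ_IO_ERROR;
            break;
        }
    } while (ret != BZ_STREAM_END);

    /* Clean up on success and on error alike. */
    BZ2_bzDecompressEnd(&amp;strm);
    return ret == BZ_STREAM_END ? BZ_OK : ret;
}</pre>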
Chris@4 1443 <p>Possible return values:</p>
Chris@4 1444 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1445 if strm is NULL or strm-&gt;s is NULL
Chris@4 1446 or strm-&gt;avail_out &lt; 1
Chris@4 1447 BZ_DATA_ERROR
Chris@4 1448 if a data integrity error is detected in the compressed stream
Chris@4 1449 BZ_DATA_ERROR_MAGIC
Chris@4 1450 if the compressed stream doesn't begin with the right magic bytes
Chris@4 1451 BZ_MEM_ERROR
Chris@4 1452 if there wasn't enough memory available
Chris@4 1453 BZ_STREAM_END
Chris@4 1454 if the logical end of the data stream was detected and all
Chris@4 1455 output has been consumed, eg strm-&gt;avail_out &gt; 0
Chris@4 1456 BZ_OK
Chris@4 1457 otherwise</pre>
Chris@4 1458 <p>Allowable next actions:</p>
Chris@4 1459 <pre class="programlisting">BZ2_bzDecompress
Chris@4 1460 if BZ_OK was returned
Chris@4 1461 BZ2_bzDecompressEnd
Chris@4 1462 otherwise</pre>
Chris@4 1463 </div>
Chris@4 1464 <div class="sect2" title="3.3.6. BZ2_bzDecompressEnd">
Chris@4 1465 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1466 <a name="bzDecompress-end"></a>3.3.6. BZ2_bzDecompressEnd</h3></div></div></div>
Chris@4 1467 <pre class="programlisting">int BZ2_bzDecompressEnd ( bz_stream *strm );</pre>
Chris@4 1468 <p>Releases all memory associated with a decompression
Chris@4 1469 stream.</p>
Chris@4 1470 <p>Possible return values:</p>
Chris@4 1471 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1472 if strm is NULL or strm-&gt;s is NULL
Chris@4 1473 BZ_OK
Chris@4 1474 otherwise</pre>
Chris@4 1475 <p>Allowable next actions:</p>
Chris@4 1476 <pre class="programlisting"> None.</pre>
Chris@4 1477 </div>
Chris@4 1478 </div>
Chris@4 1479 <div class="sect1" title="3.4. High-level interface">
Chris@4 1480 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 1481 <a name="hl-interface"></a>3.4. High-level interface</h2></div></div></div>
Chris@4 1482 <p>This interface provides functions for reading and writing
Chris@4 1483 <code class="computeroutput">bzip2</code> format files. First, some
Chris@4 1484 general points.</p>
Chris@4 1485 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 1486 <li class="listitem" style="list-style-type: disc"><p>All of the functions take an
Chris@4 1487 <code class="computeroutput">int*</code> first argument,
Chris@4 1488 <code class="computeroutput">bzerror</code>. After each call,
Chris@4 1489 <code class="computeroutput">bzerror</code> should be consulted
Chris@4 1490 first to determine the outcome of the call. If
Chris@4 1491 <code class="computeroutput">bzerror</code> is
Chris@4 1492 <code class="computeroutput">BZ_OK</code>, the call completed
Chris@4 1493 successfully, and only then should the return value of the
Chris@4 1494 function (if any) be consulted. If
Chris@4 1495 <code class="computeroutput">bzerror</code> is
Chris@4 1496 <code class="computeroutput">BZ_IO_ERROR</code>, there was an
Chris@4 1497 error reading/writing the underlying compressed file, and you
Chris@4 1498 should then consult <code class="computeroutput">errno</code> /
Chris@4 1499 <code class="computeroutput">perror</code> to determine the cause
Chris@4 1500 of the difficulty. <code class="computeroutput">bzerror</code>
Chris@4 1501 may also be set to various other values; precise details are
Chris@4 1502 given on a per-function basis below.</p></li>
Chris@4 1503 <li class="listitem" style="list-style-type: disc"><p>If <code class="computeroutput">bzerror</code> indicates
Chris@4 1504 an error (ie, anything except
Chris@4 1505 <code class="computeroutput">BZ_OK</code> and
Chris@4 1506 <code class="computeroutput">BZ_STREAM_END</code>), you should
Chris@4 1507 immediately call
Chris@4 1508 <code class="computeroutput">BZ2_bzReadClose</code> (or
Chris@4 1509 <code class="computeroutput">BZ2_bzWriteClose</code>, depending on
Chris@4 1510 whether you are attempting to read or to write) to free up all
Chris@4 1511 resources associated with the stream. Once an error has been
Chris@4 1512 indicated, behaviour of all calls except
Chris@4 1513 <code class="computeroutput">BZ2_bzReadClose</code>
Chris@4 1514 (<code class="computeroutput">BZ2_bzWriteClose</code>) is
Chris@4 1515 undefined. The implication is that (1)
Chris@4 1516 <code class="computeroutput">bzerror</code> should be checked
Chris@4 1517 after each call, and (2) if
Chris@4 1518 <code class="computeroutput">bzerror</code> indicates an error,
Chris@4 1519 <code class="computeroutput">BZ2_bzReadClose</code>
Chris@4 1520 (<code class="computeroutput">BZ2_bzWriteClose</code>) should then
Chris@4 1521 be called to clean up.</p></li>
Chris@4 1522 <li class="listitem" style="list-style-type: disc"><p>The <code class="computeroutput">FILE*</code> arguments
Chris@4 1523 passed to <code class="computeroutput">BZ2_bzReadOpen</code> /
Chris@4 1524 <code class="computeroutput">BZ2_bzWriteOpen</code> should be set
Chris@4 1525 to binary mode. Most Unix systems will do this by default, but
Chris@4 1526 other platforms, including Windows and Mac, will not. If you
Chris@4 1527 omit this, you may encounter problems when moving code to new
Chris@4 1528 platforms.</p></li>
Chris@4 1529 <li class="listitem" style="list-style-type: disc"><p>Memory allocation requests are handled by
Chris@4 1530 <code class="computeroutput">malloc</code> /
Chris@4 1531 <code class="computeroutput">free</code>. At present there is no
Chris@4 1532 facility for user-defined memory allocators in the file I/O
Chris@4 1533 functions (could easily be added, though).</p></li>
Chris@4 1534 </ul></div>
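<p>The error-handling and binary-mode rules in the list above can be sketched as two small helpers. This is only an illustration: the names <code class="computeroutput">ex_is_bz_error</code> and <code class="computeroutput">ex_open_binary</code> are hypothetical, and the constants are copied from <code class="computeroutput">bzlib.h</code> so that the sketch compiles on its own. Real code would include <code class="computeroutput">bzlib.h</code> and use <code class="computeroutput">BZ_OK</code> and <code class="computeroutput">BZ_STREAM_END</code> directly.</p>

```c
#include <stdio.h>

/* Values mirrored from bzlib.h for illustration only (assumption: real
   code includes <bzlib.h> and uses BZ_OK / BZ_STREAM_END instead). */
enum { EX_BZ_OK = 0, EX_BZ_STREAM_END = 4, EX_BZ_IO_ERROR = -6 };

/* Hypothetical helper: returns 1 when bzerror indicates an error, i.e.
   anything except BZ_OK and BZ_STREAM_END. The caller should then go
   straight to BZ2_bzReadClose or BZ2_bzWriteClose. */
int ex_is_bz_error(int bzerror)
{
    return bzerror != EX_BZ_OK && bzerror != EX_BZ_STREAM_END;
}

/* Hypothetical helper: opens a file in binary mode, so the resulting
   FILE* behaves the same on Unix and non-Unix platforms. */
FILE *ex_open_binary(const char *path, int for_writing)
{
    return fopen(path, for_writing ? "wb" : "rb");
}
```

<p>Checking <code class="computeroutput">ex_is_bz_error(bzerror)</code> after every call is one way to follow rule (1) above without repeating the two-constant comparison at each call site.</p>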
Chris@4 1535 <div class="sect2" title="3.4.1. BZ2_bzReadOpen">
Chris@4 1536 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1537 <a name="bzreadopen"></a>3.4.1. BZ2_bzReadOpen</h3></div></div></div>
Chris@4 1538 <pre class="programlisting">typedef void BZFILE;
Chris@4 1539
Chris@4 1540 BZFILE *BZ2_bzReadOpen( int *bzerror, FILE *f,
Chris@4 1541 int verbosity, int small,
Chris@4 1542 void *unused, int nUnused );</pre>
Chris@4 1543 <p>Prepare to read compressed data from file handle
Chris@4 1544 <code class="computeroutput">f</code>.
Chris@4 1545 <code class="computeroutput">f</code> should refer to a file which
Chris@4 1546 has been opened for reading, and for which the error indicator
Chris@4 1547 (<code class="computeroutput">ferror(f)</code>) is not set. If
Chris@4 1548 <code class="computeroutput">small</code> is 1, the library will try
Chris@4 1549 to decompress using less memory, at the expense of speed.</p>
Chris@4 1550 <p>For reasons explained below,
Chris@4 1551 <code class="computeroutput">BZ2_bzRead</code> will decompress the
Chris@4 1552 <code class="computeroutput">nUnused</code> bytes starting at
Chris@4 1553 <code class="computeroutput">unused</code>, before starting to read
Chris@4 1554 from the file <code class="computeroutput">f</code>. At most
Chris@4 1555 <code class="computeroutput">BZ_MAX_UNUSED</code> bytes may be
Chris@4 1556 supplied like this. If this facility is not required, you should
Chris@4 1557 pass <code class="computeroutput">NULL</code> and
Chris@4 1558 <code class="computeroutput">0</code> for
Chris@4 1559 <code class="computeroutput">unused</code> and
Chris@4 1560 <code class="computeroutput">nUnused</code> respectively.</p>
Chris@4 1561 <p>For the meaning of parameters
Chris@4 1562 <code class="computeroutput">small</code> and
Chris@4 1563 <code class="computeroutput">verbosity</code>, see
Chris@4 1564 <code class="computeroutput">BZ2_bzDecompressInit</code>.</p>
Chris@4 1565 <p>The amount of memory needed to decompress a file cannot be
Chris@4 1566 determined until the file's header has been read. So it is
Chris@4 1567 possible that <code class="computeroutput">BZ2_bzReadOpen</code>
Chris@4 1568 reports <code class="computeroutput">BZ_OK</code> but a subsequent
Chris@4 1569 call of <code class="computeroutput">BZ2_bzRead</code> will report
Chris@4 1570 <code class="computeroutput">BZ_MEM_ERROR</code>.</p>
Chris@4 1571 <p>Possible assignments to
Chris@4 1572 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1573 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 1574 if the library has been mis-compiled
Chris@4 1575 BZ_PARAM_ERROR
Chris@4 1576 if f is NULL
Chris@4 1577 or small is neither 0 nor 1
Chris@4 1578 or ( unused == NULL &amp;&amp; nUnused != 0 )
Chris@4 1579 or ( unused != NULL &amp;&amp; !(0 &lt;= nUnused &lt;= BZ_MAX_UNUSED) )
Chris@4 1580 BZ_IO_ERROR
Chris@4 1581 if ferror(f) is nonzero
Chris@4 1582 BZ_MEM_ERROR
Chris@4 1583 if insufficient memory is available
Chris@4 1584 BZ_OK
Chris@4 1585 otherwise.</pre>
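<p>The <code class="computeroutput">BZ_PARAM_ERROR</code> conditions listed above can be restated as a predicate. This is a sketch of the documented conditions only, not the library's own code: the function name <code class="computeroutput">ex_readopen_params_ok</code> is hypothetical, and <code class="computeroutput">EX_BZ_MAX_UNUSED</code> is a local copy of <code class="computeroutput">BZ_MAX_UNUSED</code> from <code class="computeroutput">bzlib.h</code>.</p>

```c
#include <stddef.h>
#include <stdio.h>

/* Local copy of BZ_MAX_UNUSED from bzlib.h, for illustration only. */
#define EX_BZ_MAX_UNUSED 5000

/* Hypothetical helper: returns 1 if the arguments avoid every
   BZ_PARAM_ERROR condition documented for BZ2_bzReadOpen, else 0. */
int ex_readopen_params_ok(FILE *f, int small,
                          const void *unused, int nUnused)
{
    if (f == NULL) return 0;                        /* f is NULL */
    if (small != 0 && small != 1) return 0;         /* small not 0 or 1 */
    if (unused == NULL && nUnused != 0) return 0;   /* count without data */
    if (unused != NULL &&
        (nUnused < 0 || nUnused > EX_BZ_MAX_UNUSED)) return 0;
    return 1;
}
```

<p>Note that the chained comparison in the list above is mathematical shorthand; in C it has to be written as two separate range tests, as done here.</p>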
Chris@4 1586 <p>Possible return values:</p>
Chris@4 1587 <pre class="programlisting">Pointer to an abstract BZFILE
Chris@4 1588 if bzerror is BZ_OK
Chris@4 1589 NULL
Chris@4 1590 otherwise</pre>
Chris@4 1591 <p>Allowable next actions:</p>
Chris@4 1592 <pre class="programlisting">BZ2_bzRead
Chris@4 1593 if bzerror is BZ_OK
Chris@4 1594 BZ2_bzReadClose
Chris@4 1595 otherwise</pre>
Chris@4 1596 </div>
Chris@4 1597 <div class="sect2" title="3.4.2. BZ2_bzRead">
Chris@4 1598 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1599 <a name="bzread"></a>3.4.2. BZ2_bzRead</h3></div></div></div>
Chris@4 1600 <pre class="programlisting">int BZ2_bzRead ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
Chris@4 1601 <p>Reads up to <code class="computeroutput">len</code>
Chris@4 1602 (uncompressed) bytes from the compressed file
Chris@4 1603 <code class="computeroutput">b</code> into the buffer
Chris@4 1604 <code class="computeroutput">buf</code>. If the read was
Chris@4 1605 successful, <code class="computeroutput">bzerror</code> is set to
Chris@4 1606 <code class="computeroutput">BZ_OK</code> and the number of bytes
Chris@4 1607 read is returned. If the logical end-of-stream was detected,
Chris@4 1608 <code class="computeroutput">bzerror</code> will be set to
Chris@4 1609 <code class="computeroutput">BZ_STREAM_END</code>, and the number of
Chris@4 1610 bytes read is returned. All other
Chris@4 1611 <code class="computeroutput">bzerror</code> values denote an
Chris@4 1612 error.</p>
Chris@4 1613 <p><code class="computeroutput">BZ2_bzRead</code> will supply
Chris@4 1614 <code class="computeroutput">len</code> bytes, unless the logical
Chris@4 1615 stream end is detected or an error occurs. Because of this, it
Chris@4 1616 is possible to detect the stream end by observing when the number
Chris@4 1617 of bytes returned is less than the number requested.
Chris@4 1618 Nevertheless, this is regarded as inadvisable; you should instead
Chris@4 1619 check <code class="computeroutput">bzerror</code> after every call
Chris@4 1620 and watch out for
Chris@4 1621 <code class="computeroutput">BZ_STREAM_END</code>.</p>
Chris@4 1622 <p>Internally, <code class="computeroutput">BZ2_bzRead</code>
Chris@4 1623 copies data from the compressed file in chunks of size
Chris@4 1624 <code class="computeroutput">BZ_MAX_UNUSED</code> bytes before
Chris@4 1625 decompressing it. If the file contains more bytes than strictly
Chris@4 1626 needed to reach the logical end-of-stream,
Chris@4 1627 <code class="computeroutput">BZ2_bzRead</code> will almost certainly
Chris@4 1628 read some of the trailing data before signalling
Chris@4 1629 <code class="computeroutput">BZ_STREAM_END</code>. To collect the
Chris@4 1630 read but unused data once
Chris@4 1631 <code class="computeroutput">BZ_STREAM_END</code> has appeared,
Chris@4 1632 call <code class="computeroutput">BZ2_bzReadGetUnused</code>
Chris@4 1633 immediately before
Chris@4 1634 <code class="computeroutput">BZ2_bzReadClose</code>.</p>
Chris@4 1635 <p>Possible assignments to
Chris@4 1636 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1637 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1638 if b is NULL or buf is NULL or len &lt; 0
Chris@4 1639 BZ_SEQUENCE_ERROR
Chris@4 1640 if b was opened with BZ2_bzWriteOpen
Chris@4 1641 BZ_IO_ERROR
Chris@4 1642 if there is an error reading from the compressed file
Chris@4 1643 BZ_UNEXPECTED_EOF
Chris@4 1644 if the compressed file ended before
Chris@4 1645 the logical end-of-stream was detected
Chris@4 1646 BZ_DATA_ERROR
Chris@4 1647 if a data integrity error was detected in the compressed stream
Chris@4 1648 BZ_DATA_ERROR_MAGIC
Chris@4 1649 if the stream does not begin with the requisite header bytes
Chris@4 1650 (ie, is not a bzip2 data file). This is really
Chris@4 1651 a special case of BZ_DATA_ERROR.
Chris@4 1652 BZ_MEM_ERROR
Chris@4 1653 if insufficient memory was available
Chris@4 1654 BZ_STREAM_END
Chris@4 1655 if the logical end of stream was detected.
Chris@4 1656 BZ_OK
Chris@4 1657 otherwise.</pre>
Chris@4 1658 <p>Possible return values:</p>
Chris@4 1659 <pre class="programlisting">number of bytes read
Chris@4 1660 if bzerror is BZ_OK or BZ_STREAM_END
Chris@4 1661 undefined
Chris@4 1662 otherwise</pre>
Chris@4 1663 <p>Allowable next actions:</p>
Chris@4 1664 <pre class="programlisting">collect data from buf, then BZ2_bzRead or BZ2_bzReadClose
Chris@4 1665 if bzerror is BZ_OK
Chris@4 1666 collect data from buf, then BZ2_bzReadClose or BZ2_bzReadGetUnused
Chris@4 1667 if bzerror is BZ_STREAM_END
Chris@4 1668 BZ2_bzReadClose
Chris@4 1669 otherwise</pre>
Chris@4 1670 </div>
Chris@4 1671 <div class="sect2" title="3.4.3. BZ2_bzReadGetUnused">
Chris@4 1672 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1673 <a name="bzreadgetunused"></a>3.4.3. BZ2_bzReadGetUnused</h3></div></div></div>
Chris@4 1674 <pre class="programlisting">void BZ2_bzReadGetUnused( int* bzerror, BZFILE *b,
Chris@4 1675 void** unused, int* nUnused );</pre>
Chris@4 1676 <p>Returns data which was read from the compressed file but
Chris@4 1677 was not needed to get to the logical end-of-stream.
Chris@4 1678 <code class="computeroutput">*unused</code> is set to the address of
Chris@4 1679 the data, and <code class="computeroutput">*nUnused</code> to the
Chris@4 1680 number of bytes. <code class="computeroutput">*nUnused</code> will
Chris@4 1681 be set to a value between <code class="computeroutput">0</code> and
Chris@4 1682 <code class="computeroutput">BZ_MAX_UNUSED</code> inclusive.</p>
Chris@4 1683 <p>This function may only be called once
Chris@4 1684 <code class="computeroutput">BZ2_bzRead</code> has signalled
Chris@4 1685 <code class="computeroutput">BZ_STREAM_END</code> but before
Chris@4 1686 <code class="computeroutput">BZ2_bzReadClose</code>.</p>
Chris@4 1687 <p>Possible assignments to
Chris@4 1688 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1689 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1690 if b is NULL
Chris@4 1691 or unused is NULL or nUnused is NULL
Chris@4 1692 BZ_SEQUENCE_ERROR
Chris@4 1693 if BZ_STREAM_END has not been signalled
Chris@4 1694 or if b was opened with BZ2_bzWriteOpen
Chris@4 1695 BZ_OK
Chris@4 1696 otherwise</pre>
Chris@4 1697 <p>Allowable next actions:</p>
Chris@4 1698 <pre class="programlisting">BZ2_bzReadClose</pre>
Chris@4 1699 </div>
Chris@4 1700 <div class="sect2" title="3.4.4. BZ2_bzReadClose">
Chris@4 1701 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1702 <a name="bzreadclose"></a>3.4.4. BZ2_bzReadClose</h3></div></div></div>
Chris@4 1703 <pre class="programlisting">void BZ2_bzReadClose ( int *bzerror, BZFILE *b );</pre>
Chris@4 1704 <p>Releases all memory pertaining to the compressed file
Chris@4 1705 <code class="computeroutput">b</code>.
Chris@4 1706 <code class="computeroutput">BZ2_bzReadClose</code> does not call
Chris@4 1707 <code class="computeroutput">fclose</code> on the underlying file
Chris@4 1708 handle, so you should do that yourself if appropriate.
Chris@4 1709 <code class="computeroutput">BZ2_bzReadClose</code> should be called
Chris@4 1710 to clean up after all error situations.</p>
Chris@4 1711 <p>Possible assignments to
Chris@4 1712 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1713 <pre class="programlisting">BZ_SEQUENCE_ERROR
Chris@4 1714 if b was opened with BZ2_bzWriteOpen
Chris@4 1715 BZ_OK
Chris@4 1716 otherwise</pre>
Chris@4 1717 <p>Allowable next actions:</p>
Chris@4 1718 <pre class="programlisting">none</pre>
Chris@4 1719 </div>
Chris@4 1720 <div class="sect2" title="3.4.5. BZ2_bzWriteOpen">
Chris@4 1721 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1722 <a name="bzwriteopen"></a>3.4.5. BZ2_bzWriteOpen</h3></div></div></div>
Chris@4 1723 <pre class="programlisting">BZFILE *BZ2_bzWriteOpen( int *bzerror, FILE *f,
Chris@4 1724 int blockSize100k, int verbosity,
Chris@4 1725 int workFactor );</pre>
Chris@4 1726 <p>Prepare to write compressed data to file handle
Chris@4 1727 <code class="computeroutput">f</code>.
Chris@4 1728 <code class="computeroutput">f</code> should refer to a file which
Chris@4 1729 has been opened for writing, and for which the error indicator
Chris@4 1730 (<code class="computeroutput">ferror(f)</code>) is not set.</p>
Chris@4 1731 <p>For the meaning of parameters
Chris@4 1732 <code class="computeroutput">blockSize100k</code>,
Chris@4 1733 <code class="computeroutput">verbosity</code> and
Chris@4 1734 <code class="computeroutput">workFactor</code>, see
Chris@4 1735 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
Chris@4 1736 <p>All required memory is allocated at this stage, so if the
Chris@4 1737 call completes successfully,
Chris@4 1738 <code class="computeroutput">BZ_MEM_ERROR</code> cannot be signalled
Chris@4 1739 by a subsequent call to
Chris@4 1740 <code class="computeroutput">BZ2_bzWrite</code>.</p>
Chris@4 1741 <p>Possible assignments to
Chris@4 1742 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1743 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 1744 if the library has been mis-compiled
Chris@4 1745 BZ_PARAM_ERROR
Chris@4 1746 if f is NULL
Chris@4 1747 or blockSize100k &lt; 1 or blockSize100k &gt; 9
Chris@4 1748 BZ_IO_ERROR
Chris@4 1749 if ferror(f) is nonzero
Chris@4 1750 BZ_MEM_ERROR
Chris@4 1751 if insufficient memory is available
Chris@4 1752 BZ_OK
Chris@4 1753 otherwise</pre>
Chris@4 1754 <p>Possible return values:</p>
Chris@4 1755 <pre class="programlisting">Pointer to an abstract BZFILE
Chris@4 1756 if bzerror is BZ_OK
Chris@4 1757 NULL
Chris@4 1758 otherwise</pre>
Chris@4 1759 <p>Allowable next actions:</p>
Chris@4 1760 <pre class="programlisting">BZ2_bzWrite
Chris@4 1761 if bzerror is BZ_OK
Chris@4 1762 (you could go directly to BZ2_bzWriteClose, but this would be pretty pointless)
Chris@4 1763 BZ2_bzWriteClose
Chris@4 1764 otherwise</pre>
Chris@4 1765 </div>
Chris@4 1766 <div class="sect2" title="3.4.6. BZ2_bzWrite">
Chris@4 1767 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1768 <a name="bzwrite"></a>3.4.6. BZ2_bzWrite</h3></div></div></div>
Chris@4 1769 <pre class="programlisting">void BZ2_bzWrite ( int *bzerror, BZFILE *b, void *buf, int len );</pre>
Chris@4 1770 <p>Absorbs <code class="computeroutput">len</code> bytes from the
Chris@4 1771 buffer <code class="computeroutput">buf</code>, eventually to be
Chris@4 1772 compressed and written to the file.</p>
Chris@4 1773 <p>Possible assignments to
Chris@4 1774 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1775 <pre class="programlisting">BZ_PARAM_ERROR
Chris@4 1776 if b is NULL or buf is NULL or len &lt; 0
Chris@4 1777 BZ_SEQUENCE_ERROR
Chris@4 1778 if b was opened with BZ2_bzReadOpen
Chris@4 1779 BZ_IO_ERROR
Chris@4 1780 if there is an error writing the compressed file.
Chris@4 1781 BZ_OK
Chris@4 1782 otherwise</pre>
Chris@4 1783 </div>
Chris@4 1784 <div class="sect2" title="3.4.7. BZ2_bzWriteClose">
Chris@4 1785 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1786 <a name="bzwriteclose"></a>3.4.7. BZ2_bzWriteClose</h3></div></div></div>
Chris@4 1787 <pre class="programlisting">void BZ2_bzWriteClose( int *bzerror, BZFILE* b,
Chris@4 1788 int abandon,
Chris@4 1789 unsigned int* nbytes_in,
Chris@4 1790 unsigned int* nbytes_out );
Chris@4 1791
Chris@4 1792 void BZ2_bzWriteClose64( int *bzerror, BZFILE* b,
Chris@4 1793 int abandon,
Chris@4 1794 unsigned int* nbytes_in_lo32,
Chris@4 1795 unsigned int* nbytes_in_hi32,
Chris@4 1796 unsigned int* nbytes_out_lo32,
Chris@4 1797 unsigned int* nbytes_out_hi32 );</pre>
Chris@4 1798 <p>Compresses and flushes to the compressed file all data so
Chris@4 1799 far supplied by <code class="computeroutput">BZ2_bzWrite</code>.
Chris@4 1800 The logical end-of-stream markers are also written, so subsequent
Chris@4 1801 calls to <code class="computeroutput">BZ2_bzWrite</code> are
Chris@4 1802 illegal. All memory associated with the compressed file
Chris@4 1803 <code class="computeroutput">b</code> is released.
Chris@4 1804 <code class="computeroutput">fflush</code> is called on the
Chris@4 1805 compressed file, but it is not
Chris@4 1806 <code class="computeroutput">fclose</code>'d.</p>
Chris@4 1807 <p>If <code class="computeroutput">BZ2_bzWriteClose</code> is
Chris@4 1808 called to clean up after an error, the only action is to release
Chris@4 1809 the memory. The library records the error codes issued by
Chris@4 1810 previous calls, so this situation will be detected automatically.
Chris@4 1811 There is no attempt to complete the compression operation, nor to
Chris@4 1812 <code class="computeroutput">fflush</code> the compressed file. You
Chris@4 1813 can force this behaviour to happen even in the case of no error,
Chris@4 1814 by passing a nonzero value to
Chris@4 1815 <code class="computeroutput">abandon</code>.</p>
Chris@4 1816 <p>If <code class="computeroutput">nbytes_in</code> is non-null,
Chris@4 1817 <code class="computeroutput">*nbytes_in</code> will be set to be the
Chris@4 1818 total volume of uncompressed data handled. Similarly, if
Chris@4 1819 <code class="computeroutput">nbytes_out</code> is non-null, <code class="computeroutput">*nbytes_out</code> will be set to the
Chris@4 1820 total volume of compressed data written. For compatibility with
Chris@4 1821 older versions of the library,
Chris@4 1822 <code class="computeroutput">BZ2_bzWriteClose</code> only yields the
Chris@4 1823 lower 32 bits of these counts. Use
Chris@4 1824 <code class="computeroutput">BZ2_bzWriteClose64</code> if you want
Chris@4 1825 the full 64 bit counts. These two functions are otherwise
Chris@4 1826 absolutely identical.</p>
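<p>The four 32-bit halves reported by <code class="computeroutput">BZ2_bzWriteClose64</code> can be reassembled into 64-bit counts as sketched below. The helper name <code class="computeroutput">ex_combine_count</code> is hypothetical and not part of the library; it assumes a C99 compiler that provides <code class="computeroutput">uint64_t</code>.</p>

```c
#include <stdint.h>

/* Hypothetical helper: joins the low and high 32-bit halves reported by
   BZ2_bzWriteClose64 into one 64-bit byte count. The masks guard against
   platforms where unsigned int is wider than 32 bits. */
uint64_t ex_combine_count(unsigned int lo32, unsigned int hi32)
{
    return ((uint64_t)(hi32 & 0xFFFFFFFFu) << 32)
         |  (uint64_t)(lo32 & 0xFFFFFFFFu);
}
```

<p>For instance, <code class="computeroutput">ex_combine_count(nbytes_in_lo32, nbytes_in_hi32)</code> would give the total uncompressed byte count after a successful close.</p>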
Chris@4 1827 <p>Possible assignments to
Chris@4 1828 <code class="computeroutput">bzerror</code>:</p>
Chris@4 1829 <pre class="programlisting">BZ_SEQUENCE_ERROR
Chris@4 1830 if b was opened with BZ2_bzReadOpen
Chris@4 1831 BZ_IO_ERROR
Chris@4 1832 if there is an error writing the compressed file
Chris@4 1833 BZ_OK
Chris@4 1834 otherwise</pre>
Chris@4 1835 </div>
Chris@4 1836 <div class="sect2" title="3.4.8. Handling embedded compressed data streams">
Chris@4 1837 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1838 <a name="embed"></a>3.4.8. Handling embedded compressed data streams</h3></div></div></div>
Chris@4 1839 <p>The high-level library facilitates use of
Chris@4 1840 <code class="computeroutput">bzip2</code> data streams which form
Chris@4 1841 some part of a surrounding, larger data stream.</p>
Chris@4 1842 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 1843 <li class="listitem" style="list-style-type: disc"><p>For writing, the library takes an open file handle,
Chris@4 1844 writes compressed data to it,
Chris@4 1845 <code class="computeroutput">fflush</code>es it but does not
Chris@4 1846 <code class="computeroutput">fclose</code> it. The calling
Chris@4 1847 application can write its own data before and after the
Chris@4 1848 compressed data stream, using that same file handle.</p></li>
Chris@4 1849 <li class="listitem" style="list-style-type: disc"><p>Reading is more complex, and the facilities are not as
Chris@4 1850 general as they could be since generality is hard to reconcile
Chris@4 1851 with efficiency. <code class="computeroutput">BZ2_bzRead</code>
Chris@4 1852 reads from the compressed file in blocks of size
Chris@4 1853 <code class="computeroutput">BZ_MAX_UNUSED</code> bytes, and in
Chris@4 1854 doing so probably will overshoot the logical end of compressed
Chris@4 1855 stream. To recover this data once decompression has ended,
Chris@4 1856 call <code class="computeroutput">BZ2_bzReadGetUnused</code> after
Chris@4 1857 the last call of <code class="computeroutput">BZ2_bzRead</code>
Chris@4 1858 (the one returning
Chris@4 1859 <code class="computeroutput">BZ_STREAM_END</code>) but before
Chris@4 1860 calling
Chris@4 1861 <code class="computeroutput">BZ2_bzReadClose</code>.</p></li>
Chris@4 1862 </ul></div>
Chris@4 1863 <p>This mechanism makes it easy to decompress multiple
Chris@4 1864 <code class="computeroutput">bzip2</code> streams placed end-to-end.
Chris@4 1865 At the end of one stream, when
Chris@4 1866 <code class="computeroutput">BZ2_bzRead</code> returns
Chris@4 1867 <code class="computeroutput">BZ_STREAM_END</code>, call
Chris@4 1868 <code class="computeroutput">BZ2_bzReadGetUnused</code> to collect
Chris@4 1869 the unused data (copy it into your own buffer somewhere). That
Chris@4 1870 data forms the start of the next compressed stream. To start
Chris@4 1871 uncompressing that next stream, call
Chris@4 1872 <code class="computeroutput">BZ2_bzReadOpen</code> again, feeding in
Chris@4 1873 the unused data via the <code class="computeroutput">unused</code> /
Chris@4 1874 <code class="computeroutput">nUnused</code> parameters. Keep doing
Chris@4 1875 this until a <code class="computeroutput">BZ_STREAM_END</code> return
Chris@4 1876 coincides with the physical end of file
Chris@4 1877 (<code class="computeroutput">feof(f)</code>). In this situation
Chris@4 1878 <code class="computeroutput">BZ2_bzReadGetUnused</code> will of
Chris@4 1879 course return no data.</p>
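<p>The "copy it into your own buffer" step matters because the pointer handed back by <code class="computeroutput">BZ2_bzReadGetUnused</code> refers to storage owned by the <code class="computeroutput">BZFILE</code>, which should not be relied on after <code class="computeroutput">BZ2_bzReadClose</code>. A minimal sketch of that copy follows; <code class="computeroutput">ex_save_unused</code> is a hypothetical helper and <code class="computeroutput">EX_BZ_MAX_UNUSED</code> is a local copy of <code class="computeroutput">BZ_MAX_UNUSED</code>.</p>

```c
#include <string.h>

/* Local copy of BZ_MAX_UNUSED from bzlib.h, for illustration only. */
#define EX_BZ_MAX_UNUSED 5000

/* Hypothetical helper: copies the nUnused trailing bytes into a
   caller-owned buffer so they survive BZ2_bzReadClose. Returns the
   number of bytes saved, or -1 if nUnused is out of range. */
int ex_save_unused(char dst[EX_BZ_MAX_UNUSED],
                   const void *unused, int nUnused)
{
    if (nUnused < 0 || nUnused > EX_BZ_MAX_UNUSED) return -1;
    if (nUnused > 0) memcpy(dst, unused, (size_t)nUnused);
    return nUnused;
}
```

<p>The saved bytes and their count are then what would be passed as <code class="computeroutput">unused</code> and <code class="computeroutput">nUnused</code> to the next <code class="computeroutput">BZ2_bzReadOpen</code> call.</p>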
Chris@4 1880 <p>This should give some feel for how the high-level interface
Chris@4 1881 can be used. If you require extra flexibility, you'll have to
Chris@4 1882 bite the bullet and get to grips with the low-level
Chris@4 1883 interface.</p>
Chris@4 1884 </div>
Chris@4 1885 <div class="sect2" title="3.4.9. Standard file-reading/writing code">
Chris@4 1886 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1887 <a name="std-rdwr"></a>3.4.9. Standard file-reading/writing code</h3></div></div></div>
Chris@4 1888 <p>Here's how you'd write data to a compressed file:</p>
Chris@4 1889 <pre class="programlisting">FILE* f;
Chris@4 1890 BZFILE* b;
Chris@4 1891 int nBuf;
Chris@4 1892 char buf[ /* whatever size you like */ ];
Chris@4 1893 int bzerror;
Chris@4 1895
Chris@4 1896 f = fopen ( "myfile.bz2", "wb" ); /* binary mode */
Chris@4 1897 if ( !f ) {
Chris@4 1898 /* handle error */
Chris@4 1899 }
Chris@4 1900 b = BZ2_bzWriteOpen( &amp;bzerror, f, 9, 0, 0 ); /* blockSize100k, verbosity, workFactor */
Chris@4 1901 if (bzerror != BZ_OK) {
Chris@4 1902 BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
Chris@4 1903 /* handle error */
Chris@4 1904 }
Chris@4 1905
Chris@4 1906 while ( /* condition */ ) {
Chris@4 1907 /* get data to write into buf, and set nBuf appropriately */
Chris@4 1908 BZ2_bzWrite ( &amp;bzerror, b, buf, nBuf ); /* returns void */
Chris@4 1909 if (bzerror != BZ_OK) {
Chris@4 1910 BZ2_bzWriteClose ( &amp;bzerror, b, 0, NULL, NULL );
Chris@4 1911 /* handle error */
Chris@4 1912 }
Chris@4 1913 }
Chris@4 1914
Chris@4 1915 BZ2_bzWriteClose( &amp;bzerror, b, 0, NULL, NULL );
Chris@4 1916 if (bzerror != BZ_OK) {
Chris@4 1917 /* handle error */
Chris@4 1918 }</pre>
Chris@4 1919 <p>And to read from a compressed file:</p>
Chris@4 1920 <pre class="programlisting">FILE* f;
Chris@4 1921 BZFILE* b;
Chris@4 1922 int nBuf;
Chris@4 1923 char buf[ /* whatever size you like */ ];
Chris@4 1924 int bzerror;
Chris@4 1926
Chris@4 1927 f = fopen ( "myfile.bz2", "rb" ); /* binary mode */
Chris@4 1928 if ( !f ) {
Chris@4 1929 /* handle error */
Chris@4 1930 }
Chris@4 1931 b = BZ2_bzReadOpen ( &amp;bzerror, f, 0, 0, NULL, 0 ); /* verbosity, small, unused, nUnused */
Chris@4 1932 if ( bzerror != BZ_OK ) {
Chris@4 1933 BZ2_bzReadClose ( &amp;bzerror, b );
Chris@4 1934 /* handle error */
Chris@4 1935 }
Chris@4 1936
Chris@4 1937 bzerror = BZ_OK;
Chris@4 1938 while ( bzerror == BZ_OK &amp;&amp; /* arbitrary other conditions */) {
Chris@4 1939 nBuf = BZ2_bzRead ( &amp;bzerror, b, buf, /* size of buf */ );
Chris@4 1940 if ( bzerror == BZ_OK || bzerror == BZ_STREAM_END ) {
Chris@4 1941 /* do something with buf[0 .. nBuf-1] */
Chris@4 1942 }
Chris@4 1943 }
Chris@4 1944 if ( bzerror != BZ_STREAM_END ) {
Chris@4 1945 BZ2_bzReadClose ( &amp;bzerror, b );
Chris@4 1946 /* handle error */
Chris@4 1947 } else {
Chris@4 1948 BZ2_bzReadClose ( &amp;bzerror, b );
Chris@4 1949 }</pre>
Chris@4 1950 </div>
Chris@4 1951 </div>
Chris@4 1952 <div class="sect1" title="3.5. Utility functions">
Chris@4 1953 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 1954 <a name="util-fns"></a>3.5. Utility functions</h2></div></div></div>
Chris@4 1955 <div class="sect2" title="3.5.1. BZ2_bzBuffToBuffCompress">
Chris@4 1956 <div class="titlepage"><div><div><h3 class="title">
Chris@4 1957 <a name="bzbufftobuffcompress"></a>3.5.1. BZ2_bzBuffToBuffCompress</h3></div></div></div>
Chris@4 1958 <pre class="programlisting">int BZ2_bzBuffToBuffCompress( char* dest,
Chris@4 1959 unsigned int* destLen,
Chris@4 1960 char* source,
Chris@4 1961 unsigned int sourceLen,
Chris@4 1962 int blockSize100k,
Chris@4 1963 int verbosity,
Chris@4 1964 int workFactor );</pre>
Chris@4 1965 <p>Attempts to compress the data in <code class="computeroutput">source[0
Chris@4 1966 .. sourceLen-1]</code> into the destination buffer,
Chris@4 1967 <code class="computeroutput">dest[0 .. *destLen-1]</code>. If the
Chris@4 1968 destination buffer is big enough,
Chris@4 1969 <code class="computeroutput">*destLen</code> is set to the size of
Chris@4 1970 the compressed data, and <code class="computeroutput">BZ_OK</code>
Chris@4 1971 is returned. If the compressed data won't fit,
Chris@4 1972 <code class="computeroutput">*destLen</code> is unchanged, and
Chris@4 1973 <code class="computeroutput">BZ_OUTBUFF_FULL</code> is
Chris@4 1974 returned.</p>
Chris@4 1975 <p>Compression in this manner is a one-shot event, done with a
Chris@4 1976 single call to this function. The resulting compressed data is a
Chris@4 1977 complete <code class="computeroutput">bzip2</code> format data
Chris@4 1978 stream. There is no mechanism for making additional calls to
Chris@4 1979 provide extra input data. If you want that kind of mechanism,
Chris@4 1980 use the low-level interface.</p>
Chris@4 1981 <p>For the meaning of parameters
Chris@4 1982 <code class="computeroutput">blockSize100k</code>,
Chris@4 1983 <code class="computeroutput">verbosity</code> and
Chris@4 1984 <code class="computeroutput">workFactor</code>, see
Chris@4 1985 <code class="computeroutput">BZ2_bzCompressInit</code>.</p>
Chris@4 1986 <p>To guarantee that the compressed data will fit in its
Chris@4 1987 buffer, allocate an output buffer of size 1% larger than the
Chris@4 1988 uncompressed data, plus six hundred extra bytes.</p>
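<p>The sizing rule above (1% larger plus six hundred bytes) can be computed as sketched below. The helper name <code class="computeroutput">ex_compress_bound</code> is hypothetical; rounding the 1% upwards and returning 0 when the result would not fit in an <code class="computeroutput">unsigned int</code> are choices made for this sketch, not behaviour of the library.</p>

```c
#include <limits.h>

/* Hypothetical helper: returns a destination-buffer size that follows
   the "1% larger plus 600 bytes" rule for BZ2_bzBuffToBuffCompress.
   The 1% term is rounded up. Returns 0 if the bound does not fit in
   an unsigned int, which is the type used for *destLen. */
unsigned int ex_compress_bound(unsigned int sourceLen)
{
    unsigned long long n = sourceLen;
    unsigned long long bound = n + (n + 99) / 100 + 600;
    return bound > UINT_MAX ? 0u : (unsigned int)bound;
}
```

<p>A caller would allocate <code class="computeroutput">ex_compress_bound(sourceLen)</code> bytes for <code class="computeroutput">dest</code> and store the same value in <code class="computeroutput">*destLen</code> before the call.</p>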
Chris@4 1989 <p><code class="computeroutput">BZ2_bzBuffToBuffCompress</code>
Chris@4 1990 will not write data at or beyond
Chris@4 1991 <code class="computeroutput">dest[*destLen]</code>, even in case of
Chris@4 1992 buffer overflow.</p>
Chris@4 1993 <p>Possible return values:</p>
Chris@4 1994 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 1995 if the library has been mis-compiled
Chris@4 1996 BZ_PARAM_ERROR
Chris@4 1997 if dest is NULL or destLen is NULL
Chris@4 1998 or blockSize100k &lt; 1 or blockSize100k &gt; 9
Chris@4 1999 or verbosity &lt; 0 or verbosity &gt; 4
Chris@4 2000 or workFactor &lt; 0 or workFactor &gt; 250
Chris@4 2001 BZ_MEM_ERROR
Chris@4 2002 if insufficient memory is available
Chris@4 2003 BZ_OUTBUFF_FULL
Chris@4 2004 if the size of the compressed data exceeds *destLen
Chris@4 2005 BZ_OK
Chris@4 2006 otherwise</pre>
Chris@4 2007 </div>
Chris@4 2008 <div class="sect2" title="3.5.2. BZ2_bzBuffToBuffDecompress">
Chris@4 2009 <div class="titlepage"><div><div><h3 class="title">
Chris@4 2010 <a name="bzbufftobuffdecompress"></a>3.5.2. BZ2_bzBuffToBuffDecompress</h3></div></div></div>
Chris@4 2011 <pre class="programlisting">int BZ2_bzBuffToBuffDecompress( char* dest,
Chris@4 2012 unsigned int* destLen,
Chris@4 2013 char* source,
Chris@4 2014 unsigned int sourceLen,
Chris@4 2015 int small,
Chris@4 2016 int verbosity );</pre>
Chris@4 2017 <p>Attempts to decompress the data in <code class="computeroutput">source[0
Chris@4 2018 .. sourceLen-1]</code> into the destination buffer,
Chris@4 2019 <code class="computeroutput">dest[0 .. *destLen-1]</code>. If the
Chris@4 2020 destination buffer is big enough,
Chris@4 2021 <code class="computeroutput">*destLen</code> is set to the size of
Chris@4 2022 the uncompressed data, and <code class="computeroutput">BZ_OK</code>
Chris@4 2023 is returned. If the uncompressed data won't fit,
Chris@4 2024 <code class="computeroutput">*destLen</code> is unchanged, and
Chris@4 2025 <code class="computeroutput">BZ_OUTBUFF_FULL</code> is
Chris@4 2026 returned.</p>
Chris@4 2027 <p><code class="computeroutput">source</code> is assumed to hold
Chris@4 2028 a complete <code class="computeroutput">bzip2</code> format data
Chris@4 2029 stream.
Chris@4 2030 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code> tries
Chris@4 2031 to decompress the entirety of the stream into the output
Chris@4 2032 buffer.</p>
Chris@4 2033 <p>For the meaning of parameters
Chris@4 2034 <code class="computeroutput">small</code> and
Chris@4 2035 <code class="computeroutput">verbosity</code>, see
Chris@4 2036 <code class="computeroutput">BZ2_bzDecompressInit</code>.</p>
Chris@4 2037 <p>Because the compression ratio of the compressed data cannot
Chris@4 2038 be known in advance, there is no easy way to guarantee that the
Chris@4 2039 output buffer will be big enough. You may of course make
Chris@4 2040 arrangements in your code to record the size of the uncompressed
Chris@4 2041 data, but such a mechanism is beyond the scope of this
Chris@4 2042 library.</p>
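<p>One possible arrangement for recording the uncompressed size is to store it in a small fixed header in front of the compressed data. The sketch below shows a 4-byte little-endian length field; the header layout and the names <code class="computeroutput">ex_put_u32_le</code> and <code class="computeroutput">ex_get_u32_le</code> are assumptions of this example and are not part of the library or of the <code class="computeroutput">bzip2</code> file format.</p>

```c
#include <stdint.h>

/* Hypothetical helper: stores n as 4 little-endian bytes, to be written
   ahead of the compressed data by the application. */
void ex_put_u32_le(unsigned char out[4], uint32_t n)
{
    out[0] = (unsigned char)(n & 0xFF);
    out[1] = (unsigned char)((n >> 8) & 0xFF);
    out[2] = (unsigned char)((n >> 16) & 0xFF);
    out[3] = (unsigned char)((n >> 24) & 0xFF);
}

/* Hypothetical helper: reads the length back, giving the value to use
   when sizing dest and initialising *destLen before decompression. */
uint32_t ex_get_u32_le(const unsigned char in[4])
{
    return  (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

<p>Using explicit byte shifts rather than copying the integer's memory keeps the header readable on machines of either endianness.</p>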
Chris@4 2043 <p><code class="computeroutput">BZ2_bzBuffToBuffDecompress</code>
Chris@4 2044 will not write data at or beyond
Chris@4 2045 <code class="computeroutput">dest[*destLen]</code>, even in case of
Chris@4 2046 buffer overflow.</p>
Chris@4 2047 <p>Possible return values:</p>
Chris@4 2048 <pre class="programlisting">BZ_CONFIG_ERROR
Chris@4 2049 if the library has been mis-compiled
Chris@4 2050 BZ_PARAM_ERROR
Chris@4 2051 if dest is NULL or destLen is NULL
Chris@4 2052 or small != 0 &amp;&amp; small != 1
Chris@4 2053 or verbosity &lt; 0 or verbosity &gt; 4
Chris@4 2054 BZ_MEM_ERROR
Chris@4 2055 if insufficient memory is available
Chris@4 2056 BZ_OUTBUFF_FULL
Chris@4 2057 if the size of the uncompressed data exceeds *destLen
Chris@4 2058 BZ_DATA_ERROR
Chris@4 2059 if a data integrity error was detected in the compressed data
Chris@4 2060 BZ_DATA_ERROR_MAGIC
Chris@4 2061 if the compressed data doesn't begin with the right magic bytes
Chris@4 2062 BZ_UNEXPECTED_EOF
Chris@4 2063 if the compressed data ends unexpectedly
Chris@4 2064 BZ_OK
Chris@4 2065 otherwise</pre>
Chris@4 2066 </div>
Chris@4 2067 </div>
Chris@4 2068 <div class="sect1" title="3.6. zlib compatibility functions">
Chris@4 2069 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2070 <a name="zlib-compat"></a>3.6. zlib compatibility functions</h2></div></div></div>
Chris@4 2071 <p>Yoshioka Tsuneo has contributed some functions to give
Chris@4 2072 better <code class="computeroutput">zlib</code> compatibility.
Chris@4 2073 These functions are <code class="computeroutput">BZ2_bzopen</code>,
Chris@4 2074 <code class="computeroutput">BZ2_bzread</code>,
Chris@4 2075 <code class="computeroutput">BZ2_bzwrite</code>,
Chris@4 2076 <code class="computeroutput">BZ2_bzflush</code>,
Chris@4 2077 <code class="computeroutput">BZ2_bzclose</code>,
Chris@4 2078 <code class="computeroutput">BZ2_bzerror</code> and
Chris@4 2079 <code class="computeroutput">BZ2_bzlibVersion</code>. These
Chris@4 2080 functions are not (yet) officially part of the library. If they
Chris@4 2081 break, you get to keep all the pieces. Nevertheless, I think
Chris@4 2082 they work ok.</p>
Chris@4 2083 <pre class="programlisting">typedef void BZFILE;
Chris@4 2084
Chris@4 2085 const char * BZ2_bzlibVersion ( void );</pre>
Chris@4 2086 <p>Returns a string indicating the library version.</p>
Chris@4 2087 <pre class="programlisting">BZFILE * BZ2_bzopen ( const char *path, const char *mode );
Chris@4 2088 BZFILE * BZ2_bzdopen ( int fd, const char *mode );</pre>
Chris@4 2089 <p>Opens a <code class="computeroutput">.bz2</code> file for
Chris@4 2090 reading or writing, using either its name or a pre-existing file
Chris@4 2091 descriptor. Analogous to <code class="computeroutput">fopen</code>
Chris@4 2092 and <code class="computeroutput">fdopen</code>.</p>
Chris@4 2093 <pre class="programlisting">int BZ2_bzread ( BZFILE* b, void* buf, int len );
Chris@4 2094 int BZ2_bzwrite ( BZFILE* b, void* buf, int len );</pre>
Chris@4 2095 <p>Reads/writes data from/to a previously opened
Chris@4 2096 <code class="computeroutput">BZFILE</code>. Analogous to
Chris@4 2097 <code class="computeroutput">fread</code> and
Chris@4 2098 <code class="computeroutput">fwrite</code>.</p>
Chris@4 2099 <pre class="programlisting">int BZ2_bzflush ( BZFILE* b );
Chris@4 2100 void BZ2_bzclose ( BZFILE* b );</pre>
Chris@4 2101 <p>Flushes/closes a <code class="computeroutput">BZFILE</code>.
Chris@4 2102 <code class="computeroutput">BZ2_bzflush</code> doesn't actually do
Chris@4 2103 anything. Analogous to <code class="computeroutput">fflush</code>
Chris@4 2104 and <code class="computeroutput">fclose</code>.</p>
Chris@4 2105 <pre class="programlisting">const char * BZ2_bzerror ( BZFILE *b, int *errnum );</pre>
Chris@4 2106 <p>Returns a string describing the most recent error status of
Chris@4 2107 <code class="computeroutput">b</code>, and also sets
Chris@4 2108 <code class="computeroutput">*errnum</code> to its numerical
Chris@4 2109 value.</p>
Chris@4 2110 </div>
Chris@4 2111 <div class="sect1" title="3.7. Using the library in a stdio-free environment">
Chris@4 2112 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2113 <a name="stdio-free"></a>3.7. Using the library in a stdio-free environment</h2></div></div></div>
Chris@4 2114 <div class="sect2" title="3.7.1. Getting rid of stdio">
Chris@4 2115 <div class="titlepage"><div><div><h3 class="title">
Chris@4 2116 <a name="stdio-bye"></a>3.7.1. Getting rid of stdio</h3></div></div></div>
Chris@4 2117 <p>In a deeply embedded application, you might want to use
Chris@4 2118 just the memory-to-memory functions. You can do this
Chris@4 2119 conveniently by compiling the library with preprocessor symbol
Chris@4 2120 <code class="computeroutput">BZ_NO_STDIO</code> defined. Doing this
Chris@4 2121 gives you a library containing only the following eight
Chris@4 2122 functions:</p>
Chris@4 2123 <p><code class="computeroutput">BZ2_bzCompressInit</code>,
Chris@4 2124 <code class="computeroutput">BZ2_bzCompress</code>,
Chris@4 2125 <code class="computeroutput">BZ2_bzCompressEnd</code>,
Chris@4 2126 <code class="computeroutput">BZ2_bzDecompressInit</code>,
Chris@4 2127 <code class="computeroutput">BZ2_bzDecompress</code>,
Chris@4 2128 <code class="computeroutput">BZ2_bzDecompressEnd</code>,
Chris@4 2129 <code class="computeroutput">BZ2_bzBuffToBuffCompress</code>,
Chris@4 2130 <code class="computeroutput">BZ2_bzBuffToBuffDecompress</code></p>
Chris@4 2131 <p>When compiled like this, all functions will ignore
Chris@4 2132 <code class="computeroutput">verbosity</code> settings.</p>
Chris@4 2133 </div>
Chris@4 2134 <div class="sect2" title="3.7.2. Critical error handling">
Chris@4 2135 <div class="titlepage"><div><div><h3 class="title">
Chris@4 2136 <a name="critical-error"></a>3.7.2. Critical error handling</h3></div></div></div>
Chris@4 2137 <p><code class="computeroutput">libbzip2</code> contains a number
Chris@4 2138 of internal assertion checks which should, needless to say, never
Chris@4 2139 be activated. Nevertheless, if an assertion should fail,
Chris@4 2140 behaviour depends on whether or not the library was compiled with
Chris@4 2141 <code class="computeroutput">BZ_NO_STDIO</code> set.</p>
Chris@4 2142 <p>For a normal compile, an assertion failure yields the
Chris@4 2143 message:</p>
Chris@4 2144 <div class="blockquote"><blockquote class="blockquote">
Chris@4 2145 <p>bzip2/libbzip2: internal error number N.</p>
Chris@4 2146 <p>This is a bug in bzip2/libbzip2, 1.0.6 of 6 September 2010.
Chris@4 2147 Please report it to me at: jseward@bzip.org. If this happened
Chris@4 2148 when you were using some program which uses libbzip2 as a
Chris@4 2149 component, you should also report this bug to the author(s)
Chris@4 2150 of that program. Please make an effort to report this bug;
Chris@4 2151 timely and accurate bug reports eventually lead to higher
Chris@4 2152 quality software. Thanks. Julian Seward, 6 September 2010.
Chris@4 2153 </p>
Chris@4 2154 </blockquote></div>
Chris@4 2155 <p>where <code class="computeroutput">N</code> is some error code
Chris@4 2156 number. If <code class="computeroutput">N == 1007</code>, it also
Chris@4 2157 prints some extra text advising the reader that unreliable memory
Chris@4 2158 is often associated with internal error 1007. (This is a
Chris@4 2159 frequently observed phenomenon with versions 1.0.0/1.0.1.)</p>
Chris@4 2160 <p><code class="computeroutput">exit(3)</code> is then
Chris@4 2161 called.</p>
Chris@4 2162 <p>For a <code class="computeroutput">stdio</code>-free library,
Chris@4 2163 assertion failures result in a call to a function declared
Chris@4 2164 as:</p>
Chris@4 2165 <pre class="programlisting">extern void bz_internal_error ( int errcode );</pre>
Chris@4 2166 <p>The relevant code is passed as a parameter. You should
Chris@4 2167 supply such a function.</p>
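<p>One possible shape for such a function is sketched below. It records the code and uses <code class="computeroutput">longjmp</code> to avoid returning into the library. Everything here other than the name and signature of <code class="computeroutput">bz_internal_error</code> is an assumption made for illustration: <code class="computeroutput">run_guarded</code>, the two static variables and the two stand-in callbacks are hypothetical, and a real embedded system might prefer to log and reset instead.</p>

```c
#include <setjmp.h>

/* Hypothetical recovery state; these names are not part of libbzip2. */
static jmp_buf bz_panic_env;
static volatile int bz_panic_code = 0;

/* Called by a BZ_NO_STDIO build of the library when an assertion fails. */
void bz_internal_error(int errcode)
{
    bz_panic_code = errcode;
    longjmp(bz_panic_env, 1);       /* do not return into the library */
}

/* Hypothetical wrapper: runs fn(arg) and returns 0 if it completed,
   or the internal error code if bz_internal_error was called.
   Assumes the library never reports an error code of 0. */
int run_guarded(void (*fn)(void *), void *arg)
{
    bz_panic_code = 0;
    if (setjmp(bz_panic_env) != 0)
        return bz_panic_code;
    fn(arg);
    return 0;
}

/* Stand-ins for real library calls, for demonstration only. */
void simulate_failure(void *arg) { bz_internal_error(*(int *)arg); }
void do_nothing(void *arg)       { (void)arg; }
```

<p>Jumping out this way skips any cleanup inside the library, so memory it had allocated is not released by this sketch.</p>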
Chris@4 2168 <p>In either case, once an assertion failure has occurred, any
Chris@4 2169 <code class="computeroutput">bz_stream</code> records involved can
Chris@4 2170 be regarded as invalid. You should not attempt to resume normal
Chris@4 2171 operation with them.</p>
Chris@4 2172 <p>You may, of course, change critical error handling to suit
Chris@4 2173 your needs. As I said above, critical errors indicate bugs in
Chris@4 2174 the library and should not occur. All "normal" error situations
Chris@4 2175 are indicated via error return codes from functions, and can be
Chris@4 2176 recovered from.</p>
Chris@4 2177 </div>
Chris@4 2178 </div>
Chris@4 2179 <div class="sect1" title="3.8. Making a Windows DLL">
Chris@4 2180 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2181 <a name="win-dll"></a>3.8. Making a Windows DLL</h2></div></div></div>
Chris@4 2182 <p>Everything related to Windows has been contributed by
Chris@4 2183 Yoshioka Tsuneo
Chris@4 2184 (<code class="computeroutput">tsuneo@rr.iij4u.or.jp</code>), so
Chris@4 2185 you should send your queries to him (but perhaps Cc: me,
Chris@4 2186 <code class="computeroutput">jseward@bzip.org</code>).</p>
Chris@4 2187 <p>My vague understanding of what to do is: using Visual C++
Chris@4 2188 5.0, open the project file
Chris@4 2189 <code class="computeroutput">libbz2.dsp</code>, and build. That's
Chris@4 2190 all.</p>
Chris@4 2191 <p>If you can't open the project file for some reason, make a
Chris@4 2192 new one, naming these files:
Chris@4 2193 <code class="computeroutput">blocksort.c</code>,
Chris@4 2194 <code class="computeroutput">bzlib.c</code>,
Chris@4 2195 <code class="computeroutput">compress.c</code>,
Chris@4 2196 <code class="computeroutput">crctable.c</code>,
Chris@4 2197 <code class="computeroutput">decompress.c</code>,
Chris@4 2198 <code class="computeroutput">huffman.c</code>,
Chris@4 2199 <code class="computeroutput">randtable.c</code> and
Chris@4 2200 <code class="computeroutput">libbz2.def</code>. You will also need
Chris@4 2201 to name the header files <code class="computeroutput">bzlib.h</code>
Chris@4 2202 and <code class="computeroutput">bzlib_private.h</code>.</p>
Chris@4 2203 <p>If you don't use VC++, you may need to define the
Chris@4 2204 preprocessor symbol
Chris@4 2205 <code class="computeroutput">_WIN32</code>.</p>
Chris@4 2206 <p>Finally, <code class="computeroutput">dlltest.c</code> is a
Chris@4 2207 sample program using the DLL. It has a project file,
Chris@4 2208 <code class="computeroutput">dlltest.dsp</code>.</p>
Chris@4 2209 <p>If you just want a makefile for Visual C, have a look at
Chris@4 2210 <code class="computeroutput">makefile.msc</code>.</p>
Chris@4 2211 <p>Be aware that if you compile
Chris@4 2212 <code class="computeroutput">bzip2</code> itself on Win32, you must
Chris@4 2213 set <code class="computeroutput">BZ_UNIX</code> to 0 and
Chris@4 2214 <code class="computeroutput">BZ_LCCWIN32</code> to 1, in the file
Chris@4 2215 <code class="computeroutput">bzip2.c</code>, before compiling.
Chris@4 2216 Otherwise the resulting binary won't work correctly.</p>
Chris@4 2217 <p>I haven't tried any of this stuff myself, but it all looks
Chris@4 2218 plausible.</p>
Chris@4 2219 </div>
Chris@4 2220 </div>
Chris@4 2221 <div class="chapter" title="4. Miscellanea">
Chris@4 2222 <div class="titlepage"><div><div><h2 class="title">
Chris@4 2223 <a name="misc"></a>4. Miscellanea</h2></div></div></div>
Chris@4 2224 <div class="toc">
Chris@4 2225 <p><b>Table of Contents</b></p>
Chris@4 2226 <dl>
Chris@4 2227 <dt><span class="sect1"><a href="#limits">4.1. Limitations of the compressed file format</a></span></dt>
Chris@4 2228 <dt><span class="sect1"><a href="#port-issues">4.2. Portability issues</a></span></dt>
Chris@4 2229 <dt><span class="sect1"><a href="#bugs">4.3. Reporting bugs</a></span></dt>
Chris@4 2230 <dt><span class="sect1"><a href="#package">4.4. Did you get the right package?</a></span></dt>
Chris@4 2231 <dt><span class="sect1"><a href="#reading">4.5. Further Reading</a></span></dt>
Chris@4 2232 </dl>
Chris@4 2233 </div>
Chris@4 2234 <p>These are just some random thoughts of mine. Your mileage
Chris@4 2235 may vary.</p>
Chris@4 2236 <div class="sect1" title="4.1. Limitations of the compressed file format">
Chris@4 2237 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2238 <a name="limits"></a>4.1. Limitations of the compressed file format</h2></div></div></div>
Chris@4 2239 <p><code class="computeroutput">bzip2-1.0.X</code>,
Chris@4 2240 <code class="computeroutput">0.9.5</code> and
Chris@4 2241 <code class="computeroutput">0.9.0</code> use exactly the same file
Chris@4 2242 format as the original version,
Chris@4 2243 <code class="computeroutput">bzip2-0.1</code>. This decision was
Chris@4 2244 made in the interests of stability. Creating yet another
Chris@4 2245 incompatible compressed file format would create further
Chris@4 2246 confusion and disruption for users.</p>
Chris@4 2247 <p>Nevertheless, this is not a painless decision. Development
Chris@4 2248 work since the release of
Chris@4 2249 <code class="computeroutput">bzip2-0.1</code> in August 1997 has
Chris@4 2250 shown complexities in the file format which slow down
Chris@4 2251 decompression and, in retrospect, are unnecessary. These
Chris@4 2252 are:</p>
Chris@4 2253 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 2254 <li class="listitem" style="list-style-type: disc"><p>The run-length encoder, which is the first of the
Chris@4 2255 compression transformations, is entirely irrelevant. The
Chris@4 2256 original purpose was to protect the sorting algorithm from the
Chris@4 2257 very worst case input: a string of repeated symbols. But
Chris@4 2258 algorithm steps Q6a and Q6b in the original Burrows-Wheeler
Chris@4 2259 technical report (SRC-124) show how repeats can be handled
Chris@4 2260 without difficulty in block sorting.</p></li>
Chris@4 2261 <li class="listitem" style="list-style-type: disc">
Chris@4 2262 <p>The randomisation mechanism doesn't really need to be
Chris@4 2263 there. Udi Manber and Gene Myers published a suffix array
Chris@4 2264 construction algorithm a few years back, which can be employed
Chris@4 2265 to sort any block, no matter how repetitive, in O(N log N)
Chris@4 2266 time. Subsequent work by Kunihiko Sadakane has produced a
Chris@4 2267 derivative O(N (log N)^2) algorithm which usually outperforms
Chris@4 2268 the Manber-Myers algorithm.</p>
Chris@4 2269 <p>I could have changed to Sadakane's algorithm, but I find
Chris@4 2270 it to be slower than <code class="computeroutput">bzip2</code>'s
Chris@4 2271 existing algorithm for most inputs, and the randomisation
Chris@4 2272 mechanism protects adequately against bad cases. I didn't
Chris@4 2273 think it was a good tradeoff to make. Partly this is due to
Chris@4 2274 the fact that I was not flooded with email complaints about
Chris@4 2275 <code class="computeroutput">bzip2-0.1</code>'s performance on
Chris@4 2276 repetitive data, so perhaps it isn't a problem for real
Chris@4 2277 inputs.</p>
Chris@4 2278 <p>Probably the best long-term solution, and the one I have
Chris@4 2279 incorporated into 0.9.5 and above, is to use the existing
Chris@4 2280 sorting algorithm initially, and fall back to an O(N (log N)^2)
Chris@4 2281 algorithm if the standard algorithm gets into
Chris@4 2282 difficulties.</p>
Chris@4 2283 </li>
Chris@4 2284 <li class="listitem" style="list-style-type: disc"><p>The compressed file format was never designed to be
Chris@4 2285 handled by a library, and I have had to jump through some hoops
Chris@4 2286 to produce an efficient implementation of decompression. It's
Chris@4 2287 a bit hairy. Try passing
Chris@4 2288 <code class="computeroutput">decompress.c</code> through the C
Chris@4 2289 preprocessor and you'll see what I mean. Much of this
Chris@4 2290 complexity could have been avoided if the compressed size of
Chris@4 2291 each block of data was recorded in the data stream.</p></li>
Chris@4 2292 <li class="listitem" style="list-style-type: disc"><p>An Adler-32 checksum, rather than a CRC32 checksum,
Chris@4 2293 would be faster to compute.</p></li>
Chris@4 2294 </ul></div>
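<p>To make the last point concrete, the sketch below puts a plain Adler-32 next to a bit-at-a-time CRC32 using the polynomial 0x04C11DB7, most significant bit first, which is the variant commonly catalogued as CRC-32/BZIP2. The function names are invented for this example, and the bitwise CRC is written for clarity only; the library itself uses a lookup table (see <code class="computeroutput">crctable.c</code>).</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32: two running sums modulo 65521, the largest prime below 2^16. */
uint32_t adler32_simple(const unsigned char *data, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* CRC32, MSB first, polynomial 0x04C11DB7, initial value and final XOR
   of 0xFFFFFFFF. Processes one bit per inner iteration. */
uint32_t crc32_msb_bitwise(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80000000u)
                crc = (crc << 1) ^ 0x04C11DB7u;
            else
                crc <<= 1;
        }
    }
    return ~crc;
}
```

<p>Adler-32 needs only two additions and two reductions per byte, and the reductions can be deferred over many bytes in an optimised version.</p>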
Chris@4 2295 <p>It would be fair to say that the
Chris@4 2296 <code class="computeroutput">bzip2</code> format was frozen before I
Chris@4 2297 properly and fully understood the performance consequences of
Chris@4 2298 doing so.</p>
Chris@4 2299 <p>Improvements which I was able to incorporate into 0.9.0,
Chris@4 2300 despite using the same file format, are:</p>
Chris@4 2301 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 2302 <li class="listitem" style="list-style-type: disc"><p>Single array implementation of the inverse BWT. This
Chris@4 2303 significantly speeds up decompression, presumably because it
Chris@4 2304 reduces the number of cache misses.</p></li>
Chris@4 2305 <li class="listitem" style="list-style-type: disc"><p>Faster inverse MTF transform for large MTF values.
Chris@4 2306 The new implementation is based on the notion of sliding blocks
Chris@4 2307 of values.</p></li>
Chris@4 2308 <li class="listitem" style="list-style-type: disc"><p><code class="computeroutput">bzip2-0.9.0</code> now reads
Chris@4 2309 and writes files with <code class="computeroutput">fread</code>
Chris@4 2310 and <code class="computeroutput">fwrite</code>; version 0.1 used
Chris@4 2311 <code class="computeroutput">putc</code> and
Chris@4 2312 <code class="computeroutput">getc</code>. Duh! Well, you live
Chris@4 2313 and learn.</p></li>
Chris@4 2314 </ul></div>
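<p>For readers unfamiliar with the MTF step, here is a textbook inverse move-to-front over a 256-entry table. It is not the sliding-block implementation mentioned above, and it ignores details of the real format. The function name is invented for this example. It does show why large MTF values are the expensive case: the amount of data shifted grows with the index.</p>

```c
#include <stddef.h>
#include <string.h>

/* Naive inverse move-to-front. idx[k] is the position of the next symbol
   in the current table; that symbol is emitted and moved to the front. */
void mtf_decode_naive(const unsigned char *idx, size_t n, unsigned char *out)
{
    unsigned char table[256];
    for (int i = 0; i < 256; i++)
        table[i] = (unsigned char)i;

    for (size_t k = 0; k < n; k++) {
        unsigned char pos = idx[k];
        unsigned char sym = table[pos];
        /* Shift the first pos entries one slot right: cost grows with pos. */
        memmove(table + 1, table, pos);
        table[0] = sym;
        out[k] = sym;
    }
}
```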
Chris@4 2315 <p>Further ahead, it would be nice to be able to do random
Chris@4 2316 access into files. This will require some careful design of
Chris@4 2317 compressed file formats.</p>
Chris@4 2318 </div>
Chris@4 2319 <div class="sect1" title="4.2. Portability issues">
Chris@4 2320 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2321 <a name="port-issues"></a>4.2. Portability issues</h2></div></div></div>
Chris@4 2322 <p>After some consideration, I have decided not to use GNU
Chris@4 2323 <code class="computeroutput">autoconf</code> to configure 0.9.5 or
Chris@4 2324 1.0.</p>
Chris@4 2325 <p><code class="computeroutput">autoconf</code>, admirable and
Chris@4 2326 wonderful though it is, mainly assists with portability problems
Chris@4 2327 between Unix-like platforms. But
Chris@4 2328 <code class="computeroutput">bzip2</code> doesn't have much in the
Chris@4 2329 way of portability problems on Unix; most of the difficulties
Chris@4 2330 appear when porting to the Mac, or to Microsoft's operating
Chris@4 2331 systems. <code class="computeroutput">autoconf</code> doesn't help
Chris@4 2332 in those cases, and brings in a whole load of new
Chris@4 2333 complexity.</p>
Chris@4 2334 <p>Most people should be able to compile the library and
Chris@4 2335 program under Unix straight out of the box, especially
Chris@4 2336 if a version of GNU C is available.</p>
Chris@4 2337 <p>There are a couple of
Chris@4 2338 <code class="computeroutput">__inline__</code> directives in the
Chris@4 2339 code. GNU C (<code class="computeroutput">gcc</code>) should be
Chris@4 2340 able to handle them. If you're not using GNU C, your C compiler
Chris@4 2341 shouldn't see them at all. If your compiler does, for some
Chris@4 2342 reason, see them and doesn't like them, just
Chris@4 2343 <code class="computeroutput">#define</code>
Chris@4 2344 <code class="computeroutput">__inline__</code> to be
Chris@4 2345 <code class="computeroutput">/* */</code>. One easy way to do this
Chris@4 2346 is to compile with the flag
Chris@4 2347 <code class="computeroutput">-D__inline__=</code>, which should be
Chris@4 2348 understood by most Unix compilers.</p>
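<p>The in-source equivalent of that flag might look like the fragment below. The guard on <code class="computeroutput">__GNUC__</code> and the function <code class="computeroutput">max_int</code> are assumptions made for this example and do not come from the bzip2 sources.</p>

```c
/* For a compiler that does not understand __inline__, make it vanish.
   This has the same effect as compiling with -D__inline__= */
#ifndef __GNUC__
#define __inline__ /* */
#endif

/* Hypothetical function using the directive, for demonstration only. */
static __inline__ int max_int(int a, int b)
{
    return a > b ? a : b;
}
```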
Chris@4 2349 <p>If you still have difficulties, try compiling with the
Chris@4 2350 macro <code class="computeroutput">BZ_STRICT_ANSI</code> defined.
Chris@4 2351 This should enable you to build the library in a strictly ANSI
Chris@4 2352 compliant environment. Building the program itself like this is
Chris@4 2353 dangerous and not supported, since you remove
Chris@4 2354 <code class="computeroutput">bzip2</code>'s checks against
Chris@4 2355 compressing directories, symbolic links, devices, and other
Chris@4 2356 not-really-a-file entities. This could cause filesystem
Chris@4 2357 corruption!</p>
Chris@4 2358 <p>One other thing: if you create a
Chris@4 2359 <code class="computeroutput">bzip2</code> binary for public distribution,
Chris@4 2360 please consider linking it statically (<code class="computeroutput">gcc
Chris@4 2361 -static</code>). This avoids all sorts of library-version
Chris@4 2362 issues that others may encounter later on.</p>
Chris@4 2363 <p>If you build <code class="computeroutput">bzip2</code> on
Chris@4 2364 Win32, you must set <code class="computeroutput">BZ_UNIX</code> to 0
Chris@4 2365 and <code class="computeroutput">BZ_LCCWIN32</code> to 1, in the
Chris@4 2366 file <code class="computeroutput">bzip2.c</code>, before compiling.
Chris@4 2367 Otherwise the resulting binary won't work correctly.</p>
Chris@4 2368 </div>
Chris@4 2369 <div class="sect1" title="4.3. Reporting bugs">
Chris@4 2370 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2371 <a name="bugs"></a>4.3. Reporting bugs</h2></div></div></div>
Chris@4 2372 <p>I tried pretty hard to make sure
Chris@4 2373 <code class="computeroutput">bzip2</code> is bug free, both by
Chris@4 2374 design and by testing. Hopefully you'll never need to read this
Chris@4 2375 section for real.</p>
Chris@4 2376 <p>Nevertheless, if <code class="computeroutput">bzip2</code> dies
Chris@4 2377 with a segmentation fault, a bus error or an internal assertion
Chris@4 2378 failure, it will ask you to email me a bug report. Years of
Chris@4 2379 feedback from bzip2 users indicate that almost all of these
Chris@4 2380 problems can be traced to either compiler bugs or hardware
Chris@4 2381 problems.</p>
Chris@4 2382 <div class="itemizedlist"><ul class="itemizedlist" type="bullet">
Chris@4 2383 <li class="listitem" style="list-style-type: disc">
Chris@4 2384 <p>Recompile the program with no optimisation, and
Chris@4 2385 see if it works. And/or try a different compiler. I have heard all
Chris@4 2386 sorts of stories about various flavours of GNU C (and other
Chris@4 2387 compilers) generating bad code for
Chris@4 2388 <code class="computeroutput">bzip2</code>, and I've run across two
Chris@4 2389 such examples myself.</p>
Chris@4 2390 <p>2.7.X versions of GNU C are known to generate bad code
Chris@4 2391 from time to time, at high optimisation levels. If you get
Chris@4 2392 problems, try using the flags
Chris@4 2393 <code class="computeroutput">-O2</code>
Chris@4 2394 <code class="computeroutput">-fomit-frame-pointer</code>
Chris@4 2395 <code class="computeroutput">-fno-strength-reduce</code>. You
Chris@4 2396 should specifically <span class="emphasis"><em>not</em></span> use
Chris@4 2397 <code class="computeroutput">-funroll-loops</code>.</p>
Chris@4 2398 <p>You may notice that the Makefile runs six tests as part
Chris@4 2399 of the build process. If the program passes all of these, it's
Chris@4 2400 a pretty good (but not 100%) indication that the compiler has
Chris@4 2401 done its job correctly.</p>
Chris@4 2402 </li>
Chris@4 2403 <li class="listitem" style="list-style-type: disc">
Chris@4 2404 <p>If <code class="computeroutput">bzip2</code>
Chris@4 2405 crashes randomly, and the crashes are not repeatable, you may
Chris@4 2406 have a flaky memory subsystem.
Chris@4 2407 <code class="computeroutput">bzip2</code> really hammers your
Chris@4 2408 memory hierarchy, and if it's a bit marginal, you may get these
Chris@4 2409 problems. Ditto if your disk or I/O subsystem is slowly
Chris@4 2410 failing. Yup, this really does happen.</p>
Chris@4 2411 <p>Try using a different machine of the same type, and see
Chris@4 2412 if you can repeat the problem.</p>
Chris@4 2413 </li>
Chris@4 2414 <li class="listitem" style="list-style-type: disc"><p>This isn't really a bug, but ... If
Chris@4 2415 <code class="computeroutput">bzip2</code> tells you your file is
Chris@4 2416 corrupted on decompression, and you obtained the file via FTP,
Chris@4 2417 there is a possibility that you forgot to tell FTP to do a
Chris@4 2418 binary mode transfer. That absolutely will cause the file to
Chris@4 2419 be non-decompressible. You'll have to transfer it
Chris@4 2420 again.</p></li>
Chris@4 2421 </ul></div>
Chris@4 2422 <p>If you've incorporated
Chris@4 2423 <code class="computeroutput">libbzip2</code> into your own program
Chris@4 2424 and are getting problems, please, please, please, check that the
Chris@4 2425 parameters you are passing in calls to the library are correct
Chris@4 2426 and in accordance with what the documentation says is allowable.
Chris@4 2427 I have tried to make the library robust against such problems,
Chris@4 2428 but I'm sure I haven't succeeded.</p>
Chris@4 2429 <p>Finally, if the above comments don't help, you'll have to
Chris@4 2430 send me a bug report. Now, it's just amazing how many people
Chris@4 2431 will send me a bug report saying something like:</p>
Chris@4 2432 <pre class="programlisting">bzip2 crashed with segmentation fault on my machine</pre>
Chris@4 2433 <p>and absolutely nothing else. Needless to say, such a
Chris@4 2434 report is <span class="emphasis"><em>totally, utterly, completely and
Chris@4 2435 comprehensively 100% useless; a waste of your time, my time, and
Chris@4 2436 net bandwidth</em></span>. With no details at all, there's no way
Chris@4 2437 I can possibly begin to figure out what the problem is.</p>
Chris@4 2438 <p>The rules of the game are: facts, facts, facts. Don't omit
Chris@4 2439 them because "oh, they won't be relevant". At the bare
Chris@4 2440 minimum:</p>
Chris@4 2441 <pre class="programlisting">Machine type. Operating system version.
Chris@4 2442 Exact version of bzip2 (do bzip2 -V).
Chris@4 2443 Exact version of the compiler used.
Chris@4 2444 Flags passed to the compiler.</pre>
Chris@4 2445 <p>However, the most important single thing that will help me
Chris@4 2446 is the file that you were trying to compress or decompress at the
Chris@4 2447 time the problem happened. Without that, my ability to do
Chris@4 2448 anything more than speculate about the cause is limited.</p>
Chris@4 2449 </div>
Chris@4 2450 <div class="sect1" title="4.4. Did you get the right package?">
Chris@4 2451 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2452 <a name="package"></a>4.4. Did you get the right package?</h2></div></div></div>
Chris@4 2453 <p><code class="computeroutput">bzip2</code> is a resource hog.
Chris@4 2454 It soaks up large amounts of CPU cycles and memory. Also, it
Chris@4 2455 gives very large latencies. In the worst case, you can feed many
Chris@4 2456 megabytes of uncompressed data into the library before getting
Chris@4 2457 any compressed output, so this probably rules out applications
Chris@4 2458 requiring interactive behaviour.</p>
Chris@4 2459 <p>These aren't faults of my implementation, I hope, but more
Chris@4 2460 an intrinsic property of the Burrows-Wheeler transform
Chris@4 2461 (unfortunately). Maybe this isn't what you want.</p>
Chris@4 2462 <p>If you want a compressor and/or library which is faster,
Chris@4 2463 uses less memory but gets pretty good compression, and has
Chris@4 2464 minimal latency, consider Jean-loup Gailly's and Mark Adler's
Chris@4 2465 work, <code class="computeroutput">zlib-1.2.1</code> and
Chris@4 2466 <code class="computeroutput">gzip-1.2.4</code>. Look for them at
Chris@4 2467 <a class="ulink" href="http://www.zlib.org" target="_top">http://www.zlib.org</a> and
Chris@4 2468 <a class="ulink" href="http://www.gzip.org" target="_top">http://www.gzip.org</a>
Chris@4 2469 respectively.</p>
Chris@4 2470 <p>For something faster and lighter still, you might try Markus F
Chris@4 2471 X J Oberhumer's <code class="computeroutput">LZO</code> real-time
Chris@4 2472 compression/decompression library, at
Chris@4 2473 <a class="ulink" href="http://www.oberhumer.com/opensource" target="_top">http://www.oberhumer.com/opensource</a>.</p>
Chris@4 2474 </div>
Chris@4 2475 <div class="sect1" title="4.5. Further Reading">
Chris@4 2476 <div class="titlepage"><div><div><h2 class="title" style="clear: both">
Chris@4 2477 <a name="reading"></a>4.5. Further Reading</h2></div></div></div>
Chris@4 2478 <p><code class="computeroutput">bzip2</code> is not research
Chris@4 2479 work, in the sense that it doesn't present any new ideas.
Chris@4 2480 Rather, it's an engineering exercise based on existing
Chris@4 2481 ideas.</p>
Chris@4 2482 <p>Four documents describe essentially all the ideas behind
Chris@4 2483 <code class="computeroutput">bzip2</code>:</p>
Chris@4 2484 <div class="literallayout"><p>Michael Burrows and D. J. Wheeler:<br>
Chris@4 2485   "A block-sorting lossless data compression algorithm"<br>
Chris@4 2486    10th May 1994. <br>
Chris@4 2487    Digital SRC Research Report 124.<br>
Chris@4 2488    ftp://ftp.digital.com/pub/DEC/SRC/research-reports/SRC-124.ps.gz<br>
Chris@4 2489    If you have trouble finding it, try searching at the<br>
Chris@4 2490    New Zealand Digital Library, http://www.nzdl.org.<br>
Chris@4 2491 <br>
Chris@4 2492 Daniel S. Hirschberg and Debra A. Lelewer<br>
Chris@4 2493   "Efficient Decoding of Prefix Codes"<br>
Chris@4 2494    Communications of the ACM, April 1990, Vol 33, Number 4.<br>
Chris@4 2495    You might be able to get an electronic copy of this<br>
Chris@4 2496    from the ACM Digital Library.<br>
Chris@4 2497 <br>
Chris@4 2498 David J. Wheeler<br>
Chris@4 2499    Program bred3.c and accompanying document bred3.ps.<br>
Chris@4 2500    This contains the idea behind the multi-table Huffman coding scheme.<br>
Chris@4 2501    ftp://ftp.cl.cam.ac.uk/users/djw3/<br>
Chris@4 2502 <br>
Chris@4 2503 Jon L. Bentley and Robert Sedgewick<br>
Chris@4 2504   "Fast Algorithms for Sorting and Searching Strings"<br>
Chris@4 2505    Available from Sedgewick's web page,<br>
Chris@4 2506    www.cs.princeton.edu/~rs<br>
Chris@4 2507 </p></div>
Chris@4 2508 <p>The following paper gives valuable additional insights into
Chris@4 2509 the algorithm, but is not immediately the basis of any code used
Chris@4 2510 in bzip2.</p>
Chris@4 2511 <div class="literallayout"><p>Peter Fenwick:<br>
Chris@4 2512    Block Sorting Text Compression<br>
Chris@4 2513    Proceedings of the 19th Australasian Computer Science Conference,<br>
Chris@4 2514      Melbourne, Australia.  Jan 31 - Feb 2, 1996.<br>
Chris@4 2515    ftp://ftp.cs.auckland.ac.nz/pub/peter-f/ACSC96paper.ps</p></div>
Chris@4 2516 <p>Kunihiko Sadakane's sorting algorithm, mentioned above, is
Chris@4 2517 available from:</p>
Chris@4 2518 <div class="literallayout"><p>http://naomi.is.s.u-tokyo.ac.jp/~sada/papers/Sada98b.ps.gz<br>
Chris@4 2519 </p></div>
Chris@4 2520 <p>The Manber-Myers suffix array construction algorithm is
Chris@4 2521 described in a paper available from:</p>
Chris@4 2522 <div class="literallayout"><p>http://www.cs.arizona.edu/people/gene/PAPERS/suffix.ps<br>
Chris@4 2523 </p></div>
Chris@4 2524 <p>Finally, the following papers document some
Chris@4 2525 investigations I made into the performance of sorting
Chris@4 2526 and decompression algorithms:</p>
Chris@4 2527 <div class="literallayout"><p>Julian Seward<br>
Chris@4 2528    On the Performance of BWT Sorting Algorithms<br>
Chris@4 2529    Proceedings of the IEEE Data Compression Conference 2000<br>
Chris@4 2530      Snowbird, Utah.  28-30 March 2000.<br>
Chris@4 2531 <br>
Chris@4 2532 Julian Seward<br>
Chris@4 2533    Space-time Tradeoffs in the Inverse B-W Transform<br>
Chris@4 2534    Proceedings of the IEEE Data Compression Conference 2001<br>
Chris@4 2535      Snowbird, Utah.  27-29 March 2001.<br>
Chris@4 2536 </p></div>
Chris@4 2537 </div>
Chris@4 2538 </div>
Chris@4 2539 </div></body>
Chris@4 2540 </html>