Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community.  This memo
   does not specify an Internet standard of any kind.  Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   .

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility.  The format includes a
   cyclic redundancy check value for detecting data corruption.  The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods.  The format can be
   implemented readily in a manner not covered by patents.

RFC 1952             GZIP File Format Specification             May 1996

Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
            2.3.1.1. Extra field
            2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes).  It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here.  The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.)  See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification.  In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do pre-
      and post-conditioning.  Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit.  However, a byte considered as
      an integer between 0 and 255 does have a most- and least-
      significant bit, and since we write numbers with the most-
      significant digit on the left, we also write bytes with the most-
      significant bit on the left.  In the diagrams below, we number the
      bits of a byte so that bit 0 is the least-significant bit, i.e.,
      the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes.  All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).  The format of each member is specified in the following
      section.  The members simply appear one after another in the file,
      with no additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.  CM
            = 0-7 are reserved.  CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text.  This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present.  In case of doubt, FTEXT
            is cleared, indicating binary data.  For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data.  The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case.  There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present.
            This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed.  The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.)  If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started.  MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.  The "deflate" method (CM = 8) sets these flags as
            follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place.  This may be useful in determining end-of-line
            convention for text files.
            The currently defined values are as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field.  See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42.  (See http://www.iso.ch for
            ordering ISO documents.  See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.
         Jean-Loup Gailly is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others).  The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present.  It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.  A
         compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII.  Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data.  Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct.  Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification.  Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail:

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly and
   Mark Adler

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch and
   Glenn Randers-Pehrson

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly.  Since this
   implementation is a de facto standard, we mention some more of its
   features here.  Again, the material in this section is not part of
   the specification per se, and implementations need not follow it to
   be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself.
   Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check).  (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language.  Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator.  When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
       the updated crc. The crc should be initialized to zero. Pre- and
       post-conditioning (one's complement) is performed within this
       function so it shouldn't be done by the caller. Usage example:

         unsigned long crc = 0L;

         while (read_buffer(buffer, length) != EOF) {
           crc = update_crc(crc, buffer, length);
         }
         if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
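
   As a further illustration, again not part of the specification per
   se, the header layout of section 2.3 and the decompressor checks of
   section 2.3.1.2 might be sketched as follows.  The function name
   gzip_header_size and the FLG mask macros are hypothetical names
   chosen for this sketch.  It assumes the whole header is already in
   memory, accepts only CM = 8, skips the optional fields without
   interpreting them, and does not verify the CRC16.

```c
#include <assert.h>
#include <stddef.h>

/* FLG bit masks, following the bit assignments in section 2.3.1. */
#define FHCRC     0x02
#define FEXTRA    0x04
#define FNAME     0x08
#define FCOMMENT  0x10
#define FRESERVED 0xe0   /* bits 5-7 */

/*
   Hypothetical helper (not defined by the specification): return the
   offset of the first byte of compressed data in the member starting
   at buf[0], or -1 if the header is invalid or truncated.
*/
long gzip_header_size(const unsigned char *buf, size_t len)
{
  size_t pos = 10;               /* fixed-length part of the header */
  unsigned flg;

  assert(buf != NULL);
  if (len < 10) return -1;
  if (buf[0] != 0x1f || buf[1] != 0x8b) return -1;   /* ID1, ID2 */
  if (buf[2] != 8) return -1;    /* assumption: only "deflate" accepted */
  flg = buf[3];
  if (flg & FRESERVED) return -1;   /* reserved bits must be zero */

  if (flg & FEXTRA) {
    size_t xlen;
    if (len - pos < 2) return -1;
    /* XLEN is stored least-significant byte first. */
    xlen = (size_t) buf[pos] | ((size_t) buf[pos + 1] << 8);
    pos += 2;
    if (len - pos < xlen) return -1;
    pos += xlen;
  }
  if (flg & FNAME) {             /* skip zero-terminated file name */
    while (pos < len && buf[pos] != 0) pos++;
    if (pos == len) return -1;   /* terminator missing */
    pos++;
  }
  if (flg & FCOMMENT) {          /* skip zero-terminated comment */
    while (pos < len && buf[pos] != 0) pos++;
    if (pos == len) return -1;   /* terminator missing */
    pos++;
  }
  if (flg & FHCRC) {             /* skip CRC16 without checking it */
    if (len - pos < 2) return -1;
    pos += 2;
  }
  return (long) pos;
}
```

   A caller would pass the start of a member and, on a non-negative
   result, begin inflating at that offset.  The optional fields are
   visited in the order in which section 2.3 lays them out: extra
   field, file name, comment, then CRC16.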