Network Working Group                                         P. Deutsch
Request for Comments: 1952                           Aladdin Enterprises
Category: Informational                                         May 1996


               GZIP file format specification version 4.3

Status of This Memo

   This memo provides information for the Internet community. This memo
   does not specify an Internet standard of any kind. Distribution of
   this memo is unlimited.

IESG Note:

   The IESG takes no position on the validity of any Intellectual
   Property Rights statements contained in this document.

Notices

   Copyright (c) 1996 L. Peter Deutsch

   Permission is granted to copy and distribute this document for any
   purpose and without charge, including translations into other
   languages and incorporation into compilations, provided that the
   copyright notice and this notice are preserved, and that any
   substantive changes or deletions from the original are clearly
   marked.

   A pointer to the latest version of this and related documentation in
   HTML format can be found at the URL
   .

Abstract

   This specification defines a lossless compressed data format that is
   compatible with the widely used GZIP utility. The format includes a
   cyclic redundancy check value for detecting data corruption. The
   format presently uses the DEFLATE method of compression but can be
   easily extended to use other compression methods. The format can be
   implemented readily in a manner not covered by patents.



Deutsch                      Informational                      [Page 1]

RFC 1952             GZIP File Format Specification             May 1996


Table of Contents

   1. Introduction
      1.1. Purpose
      1.2. Intended audience
      1.3. Scope
      1.4. Compliance
      1.5. Definitions of terms and conventions used
      1.6. Changes from previous versions
   2. Detailed specification
      2.1. Overall conventions
      2.2. File format
      2.3. Member format
         2.3.1. Member header and trailer
             2.3.1.1. Extra field
             2.3.1.2. Compliance
   3. References
   4. Security Considerations
   5. Acknowledgements
   6. Author's Address
   7. Appendix: Jean-Loup Gailly's gzip utility
   8. Appendix: Sample CRC Code

1. Introduction

   1.1. Purpose

      The purpose of this specification is to define a lossless
      compressed data format that:

          * Is independent of CPU type, operating system, file system,
            and character set, and hence can be used for interchange;
          * Can compress or decompress a data stream (as opposed to a
            randomly accessible file) to produce another data stream,
            using only an a priori bounded amount of intermediate
            storage, and hence can be used in data communications or
            similar structures such as Unix filters;
          * Compresses data with efficiency comparable to the best
            currently available general-purpose compression methods,
            and in particular considerably better than the "compress"
            program;
          * Can be implemented readily in a manner not covered by
            patents, and hence can be practiced freely;
          * Is compatible with the file format produced by the current
            widely used gzip utility, in that conforming decompressors
            will be able to read data produced by the existing gzip
            compressor.

      The data format defined by this specification does not attempt to:

          * Provide random access to compressed data;
          * Compress specialized data (e.g., raster graphics) as well as
            the best currently available specialized algorithms.

   1.2. Intended audience

      This specification is intended for use by implementors of software
      to compress data into gzip format and/or decompress data from gzip
      format.

      The text of the specification assumes a basic background in
      programming at the level of bits and other primitive data
      representations.

   1.3. Scope

      The specification specifies a compression method and a file format
      (the latter assuming only that a file can store a sequence of
      arbitrary bytes). It does not specify any particular interface to
      a file system or anything about character sets or encodings
      (except for file names and comments, which are optional).

   1.4. Compliance

      Unless otherwise indicated below, a compliant decompressor must be
      able to accept and decompress any file that conforms to all the
      specifications presented here; a compliant compressor must produce
      files that conform to all the specifications presented here. The
      material in the appendices is not part of the specification per se
      and is not relevant to compliance.

   1.5. Definitions of terms and conventions used

      byte: 8 bits stored or transmitted as a unit (same as an octet).
      (For this specification, a byte is exactly 8 bits, even on
      machines which store a character on a number of bits different
      from 8.) See below for the numbering of bits within a byte.

   1.6. Changes from previous versions

      There have been no technical changes to the gzip format since
      version 4.1 of this specification. In version 4.2, some
      terminology was changed, and the sample CRC code was rewritten for
      clarity and to eliminate the requirement for the caller to do
      pre- and post-conditioning. Version 4.3 is a conversion of the
      specification to RFC style.

2. Detailed specification

   2.1. Overall conventions

      In the diagrams below, a box like this:

         +---+
         |   | <-- the vertical bars might be missing
         +---+

      represents one byte; a box like this:

         +==============+
         |              |
         +==============+

      represents a variable number of bytes.

      Bytes stored within a computer do not have a "bit order", since
      they are always treated as a unit. However, a byte considered as
      an integer between 0 and 255 does have a most- and
      least-significant bit, and since we write numbers with the
      most-significant digit on the left, we also write bytes with the
      most-significant bit on the left. In the diagrams below, we
      number the bits of a byte so that bit 0 is the least-significant
      bit, i.e., the bits are numbered:

         +--------+
         |76543210|
         +--------+

      This document does not address the issue of the order in which
      bits of a byte are transmitted on a bit-sequential medium, since
      the data format described here is byte- rather than bit-oriented.

      Within a computer, a number may occupy multiple bytes. All
      multi-byte numbers in the format described here are stored with
      the least-significant byte first (at the lower memory address).
      For example, the decimal number 520 is stored as:

             0        1
         +--------+--------+
         |00001000|00000010|
         +--------+--------+
          ^        ^
          |        |
          |        + more significant byte = 2 x 256
          + less significant byte = 8

   2.2. File format

      A gzip file consists of a series of "members" (compressed data
      sets).
      The format of each member is specified in the following section.
      The members simply appear one after another in the file, with no
      additional information before, between, or after them.

   2.3. Member format

      Each member has the following structure:

         +---+---+---+---+---+---+---+---+---+---+
         |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
         +---+---+---+---+---+---+---+---+---+---+

      (if FLG.FEXTRA set)

         +---+---+=================================+
         | XLEN  |...XLEN bytes of "extra field"...| (more-->)
         +---+---+=================================+

      (if FLG.FNAME set)

         +=========================================+
         |...original file name, zero-terminated...| (more-->)
         +=========================================+

      (if FLG.FCOMMENT set)

         +===================================+
         |...file comment, zero-terminated...| (more-->)
         +===================================+

      (if FLG.FHCRC set)

         +---+---+
         | CRC16 |
         +---+---+

         +=======================+
         |...compressed blocks...| (more-->)
         +=======================+

           0   1   2   3   4   5   6   7
         +---+---+---+---+---+---+---+---+
         |     CRC32     |     ISIZE     |
         +---+---+---+---+---+---+---+---+

      2.3.1. Member header and trailer

         ID1 (IDentification 1)
         ID2 (IDentification 2)
            These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
            (0x8b, \213), to identify the file as being in gzip format.

         CM (Compression Method)
            This identifies the compression method used in the file.
            CM = 0-7 are reserved. CM = 8 denotes the "deflate"
            compression method, which is the one customarily used by
            gzip and which is documented elsewhere.

         FLG (FLaGs)
            This flag byte is divided into individual bits as follows:

               bit 0   FTEXT
               bit 1   FHCRC
               bit 2   FEXTRA
               bit 3   FNAME
               bit 4   FCOMMENT
               bit 5   reserved
               bit 6   reserved
               bit 7   reserved

            If FTEXT is set, the file is probably ASCII text. This is
            an optional indication, which the compressor may set by
            checking a small amount of the input data to see whether any
            non-ASCII characters are present. In case of doubt, FTEXT
            is cleared, indicating binary data. For systems which have
            different file formats for ASCII text and binary data, the
            decompressor can use FTEXT to choose the appropriate format.
            We deliberately do not specify the algorithm used to set
            this bit, since a compressor always has the option of
            leaving it cleared and a decompressor always has the option
            of ignoring it and letting some other program handle issues
            of data conversion.

            If FHCRC is set, a CRC16 for the gzip header is present,
            immediately before the compressed data. The CRC16 consists
            of the two least significant bytes of the CRC32 for all
            bytes of the gzip header up to and not including the CRC16.
            [The FHCRC bit was never set by versions of gzip up to
            1.2.4, even though it was documented with a different
            meaning in gzip 1.2.4.]

            If FEXTRA is set, optional extra fields are present, as
            described in a following section.

            If FNAME is set, an original file name is present,
            terminated by a zero byte. The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set. This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.

            If FCOMMENT is set, a zero-terminated file comment is
            present. This comment is not interpreted; it is only
            intended for human consumption. The comment must consist of
            ISO 8859-1 (LATIN-1) characters. Line breaks should be
            denoted by a single line feed character (10 decimal).

            Reserved FLG bits must be zero.

         MTIME (Modification TIME)
            This gives the most recent modification time of the original
            file being compressed. The time is in Unix format, i.e.,
            seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
            may cause problems for MS-DOS and other systems that use
            local rather than Universal time.) If the compressed data
            did not come from a file, MTIME is set to the time at which
            compression started. MTIME = 0 means no time stamp is
            available.

         XFL (eXtra FLags)
            These flags are available for use by specific compression
            methods.
            The "deflate" method (CM = 8) sets these flags as follows:

               XFL = 2 - compressor used maximum compression,
                         slowest algorithm
               XFL = 4 - compressor used fastest algorithm

         OS (Operating System)
            This identifies the type of file system on which compression
            took place. This may be useful in determining end-of-line
            convention for text files. The currently defined values are
            as follows:

                 0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                 1 - Amiga
                 2 - VMS (or OpenVMS)
                 3 - Unix
                 4 - VM/CMS
                 5 - Atari TOS
                 6 - HPFS filesystem (OS/2, NT)
                 7 - Macintosh
                 8 - Z-System
                 9 - CP/M
                10 - TOPS-20
                11 - NTFS filesystem (NT)
                12 - QDOS
                13 - Acorn RISCOS
               255 - unknown

         XLEN (eXtra LENgth)
            If FLG.FEXTRA is set, this gives the length of the optional
            extra field. See below for details.

         CRC32 (CRC-32)
            This contains a Cyclic Redundancy Check value of the
            uncompressed data computed according to the CRC-32 algorithm
            used in the ISO 3309 standard and in section 8.1.1.6.2 of
            ITU-T recommendation V.42. (See http://www.iso.ch for
            ordering ISO documents. See gopher://info.itu.ch for an
            online version of ITU-T V.42.)

         ISIZE (Input SIZE)
            This contains the size of the original (uncompressed) input
            data modulo 2^32.

      2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.
         It consists of a series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value. Jean-Loup Gailly is maintaining a
         registry of subfield IDs; please send him any subfield ID you
         wish to use. Subfield IDs with SI2 = 0 are reserved for future
         use. The following IDs are currently defined:

            SI1         SI2         Data
            ----------  ----------  ----
            0x41 ('A')  0x70 ('P')  Apollo file type information

         LEN gives the length of the subfield data, excluding the 4
         initial bytes.

      2.3.1.2. Compliance

         A compliant compressor must produce files with correct ID1,
         ID2, CM, CRC32, and ISIZE, but may set all the other fields in
         the fixed-length part of the header to default values (255 for
         OS, 0 for all others). The compressor must set all reserved
         bits to zero.

         A compliant decompressor must check ID1, ID2, and CM, and
         provide an error indication if any of these have incorrect
         values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
         at least so it can skip over the optional fields if they are
         present. It need not examine any other part of the header or
         trailer; in particular, a decompressor may ignore FTEXT and OS
         and always produce binary output, and still be compliant.
         A compliant decompressor must give an error indication if any
         reserved bit is non-zero, since such a bit could indicate the
         presence of a new field that would cause subsequent data to be
         interpreted incorrectly.

3. References

   [1] "Information Processing - 8-bit single-byte coded graphic
       character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
       The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
       ASCII. Files defining this character set are available as
       iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

   [2] ISO 3309

   [3] ITU-T recommendation V.42

   [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
       available in ftp://ftp.uu.net/pub/archiving/zip/doc/

   [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
       ftp://prep.ai.mit.edu/pub/gnu/

   [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
       Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

   [7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
       pp.118-133.

   [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
       describing the CRC concept.

4. Security Considerations

   Any data compression method involves the reduction of redundancy in
   the data. Consequently, any corruption of the data is likely to have
   severe effects and be difficult to correct. Uncompressed text, on
   the other hand, will probably still be readable despite the presence
   of some corrupted bytes.

   It is recommended that systems using this data format provide some
   means of validating the integrity of the compressed data, such as by
   setting and checking the CRC-32 check value.

5. Acknowledgements

   Trademarks cited in this document are the property of their
   respective owners.

   Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
   the related software described in this specification. Glenn
   Randers-Pehrson converted this document to RFC and HTML format.

6. Author's Address

   L. Peter Deutsch
   Aladdin Enterprises
   203 Santa Margarita Ave.
   Menlo Park, CA 94025

   Phone: (415) 322-0103 (AM only)
   FAX:   (415) 322-1734
   EMail:

   Questions about the technical content of this specification can be
   sent by email to:

   Jean-Loup Gailly and
   Mark Adler

   Editorial comments on this specification can be sent by email to:

   L. Peter Deutsch and
   Glenn Randers-Pehrson

7. Appendix: Jean-Loup Gailly's gzip utility

   The most widely used implementation of gzip compression, and the
   original documentation on which this specification is based, were
   created by Jean-Loup Gailly. Since this implementation is a de facto
   standard, we mention some more of its features here. Again, the
   material in this section is not part of the specification per se,
   and implementations need not follow it to be compliant.

   When compressing or decompressing a file, gzip preserves the
   protection, ownership, and modification time attributes on the local
   file system, since there is no provision for representing protection
   attributes in the gzip file format itself. Since the file format
   includes a modification time, the gzip decompressor provides a
   command line switch that assigns the modification time from the file,
   rather than the local modification time of the compressed input, to
   the decompressed output.

8. Appendix: Sample CRC Code

   The following sample code represents a practical implementation of
   the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
   for a formal specification.)

   The sample code is in the ANSI C programming language. Non-C users
   may find it easier to read with these hints:

      &      Bitwise AND operator.
      ^      Bitwise exclusive-OR operator.
      >>     Bitwise right shift operator. When applied to an
             unsigned quantity, as here, right shift inserts zero
             bit(s) at the left.
      !      Logical NOT operator.
      ++     "n++" increments the variable n.
      0xNNN  0x introduces a hexadecimal (base 16) constant.
             Suffix L indicates a long value (at least 32 bits).

      /* Table of CRCs of all 8-bit messages. */
      unsigned long crc_table[256];

      /* Flag: has the table been computed? Initially false. */
      int crc_table_computed = 0;

      /* Make the table for a fast CRC. */
      void make_crc_table(void)
      {
        unsigned long c;
        int n, k;

        for (n = 0; n < 256; n++) {
          c = (unsigned long) n;
          for (k = 0; k < 8; k++) {
            if (c & 1) {
              c = 0xedb88320L ^ (c >> 1);
            } else {
              c = c >> 1;
            }
          }
          crc_table[n] = c;
        }
        crc_table_computed = 1;
      }

      /*
         Update a running crc with the bytes buf[0..len-1] and return
       the updated crc. The crc should be initialized to zero. Pre- and
       post-conditioning (one's complement) is performed within this
       function so it shouldn't be done by the caller. Usage example:

         unsigned long crc = 0L;

         while (read_buffer(buffer, length) != EOF) {
           crc = update_crc(crc, buffer, length);
         }
         if (crc != original_crc) error();
      */
      unsigned long update_crc(unsigned long crc,
                      unsigned char *buf, int len)
      {
        unsigned long c = crc ^ 0xffffffffL;
        int n;

        if (!crc_table_computed)
          make_crc_table();
        for (n = 0; n < len; n++) {
          c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
        }
        return c ^ 0xffffffffL;
      }

      /* Return the CRC of the bytes buf[0..len-1]. */
      unsigned long crc(unsigned char *buf, int len)
      {
        return update_crc(0L, buf, len);
      }
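
   As a further illustration, and like the rest of the appendices not
   part of the specification, the sketch below shows one way a
   decompressor might apply the byte order of section 2.1 and the
   header checks of section 2.3.1.2. It is a minimal sketch under
   stated assumptions: all names (gz_le16, gz_le32, gz_skip_string,
   gz_header_length, gz_trailer and the GZ_* masks) are hypothetical,
   the whole member is assumed to be held in one memory buffer, only
   CM = 8 is accepted, and the optional CRC16 is skipped rather than
   verified.

```c
#include <stddef.h>

/* Illustrative sketch only; every name below is hypothetical. */

#define GZ_FHCRC     0x02  /* FLG bit 1 */
#define GZ_FEXTRA    0x04  /* FLG bit 2 */
#define GZ_FNAME     0x08  /* FLG bit 3 */
#define GZ_FCOMMENT  0x10  /* FLG bit 4 */
#define GZ_RESERVED  0xe0  /* FLG bits 5-7, must be zero */

/* Read a 2-byte number stored least-significant byte first. */
unsigned int gz_le16(const unsigned char *p)
{
  return (unsigned int) p[0] | ((unsigned int) p[1] << 8);
}

/* Read a 4-byte number stored least-significant byte first. */
unsigned long gz_le32(const unsigned char *p)
{
  return (unsigned long) p[0] | ((unsigned long) p[1] << 8) |
         ((unsigned long) p[2] << 16) | ((unsigned long) p[3] << 24);
}

/* Skip a zero-terminated field starting at pos. Returns the offset
   just past the terminator, or 0 if the buffer ends first. */
static size_t gz_skip_string(const unsigned char *buf, size_t len,
                             size_t pos)
{
  while (pos < len && buf[pos] != 0)
    pos++;
  return (pos < len) ? pos + 1 : 0;
}

/* Return the offset of the compressed blocks in buf[0..len-1], or -1
   if the header is truncated or fails the checks of 2.3.1.2. */
long gz_header_length(const unsigned char *buf, size_t len)
{
  size_t pos = 10;               /* fixed-length part: ID1 .. OS */
  size_t xlen;
  unsigned int flg;

  if (len < 10) return -1;
  if (buf[0] != 0x1f || buf[1] != 0x8b) return -1;  /* ID1, ID2 */
  if (buf[2] != 8) return -1;                       /* CM: deflate */
  flg = buf[3];
  if (flg & GZ_RESERVED) return -1;                 /* reserved bits */

  if (flg & GZ_FEXTRA) {
    if (len - pos < 2) return -1;
    xlen = gz_le16(buf + pos);   /* XLEN */
    pos += 2;
    if (len - pos < xlen) return -1;
    pos += xlen;
  }
  if (flg & GZ_FNAME) {
    pos = gz_skip_string(buf, len, pos);
    if (pos == 0) return -1;
  }
  if (flg & GZ_FCOMMENT) {
    pos = gz_skip_string(buf, len, pos);
    if (pos == 0) return -1;
  }
  if (flg & GZ_FHCRC) {
    if (len - pos < 2) return -1;
    pos += 2;                    /* CRC16 is skipped, not verified */
  }
  return (long) pos;
}

/* Fetch CRC32 and ISIZE from the last 8 bytes of a buffer holding
   exactly one member. Returns 0 on success, -1 if it is too short
   to hold a 10-byte header and an 8-byte trailer. */
int gz_trailer(const unsigned char *buf, size_t len,
               unsigned long *crc32, unsigned long *isize)
{
  if (len < 18) return -1;
  *crc32 = gz_le32(buf + len - 8);
  *isize = gz_le32(buf + len - 4);
  return 0;
}
```

   A caller would typically pass the compressed blocks that begin at
   the returned offset to a "deflate" decompressor, then compare the
   CRC32 and ISIZE values from gz_trailer with the CRC and length
   (modulo 2^32) of the decompressed output. A file holding several
   members would need the end of each member located first, which
   this sketch does not attempt.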