This is a patched version of zlib, modified to use
Pentium-Pro-optimized assembly code in the deflation algorithm. The
files changed/added by this patch are:

README.686
match.S
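
For readers unfamiliar with what the assembly replaces: match.S
supplies deflate's longest_match() routine, the search for the longest
earlier string that matches the data at the current position. The C
sketch below is only a simplified illustration of that kind of search,
not zlib's code. The name longest_match_sketch, the NIL marker, and the
parameter layout are assumptions made for this example.

```c
#include <stddef.h>

#define MIN_MATCH 3     /* shortest match worth reporting */
#define MAX_MATCH 258   /* longest match deflate can encode */
#define NIL (-1)        /* end-of-chain marker (assumption of this sketch) */

/* Hypothetical, simplified stand-in for a longest-match search.
 * Starting at cur_match, follow prev[] through earlier positions that
 * are candidates for a match, compare each against the bytes at
 * window[pos], and return the best length found. *match_start is set
 * only when a match of at least MIN_MATCH bytes is found. */
static int longest_match_sketch(const unsigned char *window, size_t window_len,
                                size_t pos, int cur_match, const int *prev,
                                int max_chain, size_t *match_start)
{
    int best_len = MIN_MATCH - 1;
    size_t limit = window_len - pos;    /* bytes available at pos */
    if (limit > MAX_MATCH)
        limit = MAX_MATCH;

    while (cur_match != NIL && max_chain-- > 0) {
        const unsigned char *scan = window + pos;
        const unsigned char *match = window + cur_match;
        size_t len = 0;

        /* Count how many leading bytes agree. */
        while (len < limit && scan[len] == match[len])
            len++;

        if ((int)len > best_len) {
            best_len = (int)len;
            *match_start = (size_t)cur_match;
            if (len == limit)
                break;                  /* cannot do better than this */
        }
        cur_match = prev[cur_match];    /* next older candidate */
    }
    return best_len;
}
```

zlib's own routine works on a sliding window and has further early-exit
checks that this sketch leaves out. A larger max_chain finds better
matches at the cost of more comparisons.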

The speedup that this patch provides varies, depending on whether the
compiler used to build the original version of zlib falls afoul of the
PPro's speed traps. My own tests show a speedup of around 10-20% at
the default compression level, and 20-30% using -9, against a version
compiled using gcc 2.7.2.3. Your mileage may vary.

Note that this code has been tailored for the PPro/PII in particular,
and will not perform particularly well on a Pentium.

If you are using an assembler other than GNU as, you will have to
translate match.S to use your assembler's syntax. (Have fun.)

Brian Raiter
breadbox@muppetlabs.com
April, 1998


Added for zlib 1.1.3:

The patches come from
http://www.muppetlabs.com/~breadbox/software/assembly.html

To compile zlib with this asm file, copy match.S to the zlib directory
then do:

CFLAGS="-O3 -DASMV" ./configure
make OBJA=match.o


Update:

I've been ignoring these assembly routines for years, believing that
gcc's generated code had caught up with them sometime around gcc 2.95
and the major rearchitecting of the Pentium 4. However, I recently
learned that, despite what I believed, this code still has some life
in it. On the Pentium 4 and AMD64 chips, it continues to run about 8%
faster than the code produced by gcc 4.1.

In acknowledgement of its continuing usefulness, I've altered the
license to match that of the rest of zlib. Share and Enjoy!

Brian Raiter
breadbox@muppetlabs.com
April, 2007