# Delta compression

Scripts in dml-cliopatrial/cpack/dml/scripts/compression provide a common interface
to several delta compression programs. The interface is

	stdin --->  [ <script name>  (encode|decode) <name of reference file> ] ---> stdout

In encode mode the script reads the data on stdin and writes a delta against the
reference file to stdout; in decode mode it reads a delta and writes the reconstructed data.

The following scripts work this way:

	zbs - uses bsdiff
	zxd - uses xdelta3
	zvcd - uses open-vcdiff
	zvcz - uses vczip
	zdiff - converts binary to text and uses diff to produce an ed script
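
As an illustration of this interface, a script such as zxd could be little more than a thin
wrapper around the underlying tool. The sketch below is an assumption, not the actual zxd
source: the function name zxd_sketch is hypothetical, and it relies only on the xdelta3
options -e (encode), -d (decode), -c (write to stdout) and -s (source file).

```shell
# Hypothetical stand-in for zxd; the real script may differ.
# Usage: zxd_sketch (encode|decode) <name of reference file>
zxd_sketch() {
    mode=$1
    ref=$2
    case $mode in
        # stdin is the data to encode; stdout receives the delta
        encode) xdelta3 -e -c -s "$ref" ;;
        # stdin is a delta; stdout receives the reconstructed data
        decode) xdelta3 -d -c -s "$ref" ;;
        *) echo "usage: zxd_sketch (encode|decode) <reference file>" >&2
           return 2 ;;
    esac
}
```

The reference is passed as a filename rather than a stream because the delta tool may need
to read it more than once.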

# bufs

The bufs script runs an arbitrary command that expects a filename as its nth argument, so that

	$ bufs <n> <command> <arg1> ... <argn> ...

can be run with <argn> given as a bash process substitution, even if <command> reads that
source several times. bufs works by buffering the stream named by the nth argument to a
temporary file and passing that file to <command> instead.
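
The buffering idea can be sketched as a small shell function. This is a minimal illustration
under stated assumptions, not the bufs source: the name bufs_sketch is hypothetical, and the
function is written in portable sh, so its variables are global rather than local.

```shell
# Hypothetical stand-in for bufs; the real script may differ.
# Usage: bufs_sketch <n> <command> <arg1> ... <argn> ...
bufs_sketch() {
    n=$1; shift
    cmd=$1; shift
    tmp=$(mktemp) || return 1
    i=0
    # Rotate through the arguments once, rebuilding the list in order
    for arg in "$@"; do
        shift                       # drop the original copy of this argument
        i=$((i + 1))
        if [ "$i" -eq "$n" ]; then
            cat "$arg" > "$tmp"     # buffer the one-shot stream
            set -- "$@" "$tmp"      # substitute the re-readable temporary file
        else
            set -- "$@" "$arg"
        fi
    done
    status=0
    "$cmd" "$@" || status=$?
    rm -f "$tmp"
    return "$status"
}
```

Because the command only ever sees the temporary file, it can open and read its nth argument
as many times as it likes, which a pipe or process substitution would not allow.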

# findcat

findcat dumps the contents of every file under a given directory to stdout.
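
A rough equivalent can be written with find. This is only a sketch of the behaviour described
above; the name findcat_sketch is hypothetical and the real script may select or order files
differently.

```shell
# Hypothetical stand-in for findcat; the real script may differ.
# Usage: findcat_sketch <directory>
findcat_sketch() {
    # -type f: regular files only; -exec cat {} +: concatenate them in batches
    find "$1" -type f -exec cat {} +
}
```

Note that find does not guarantee any particular traversal order, so the concatenation order
in this sketch depends on the filesystem.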

# Examples

For example, to estimate the conditional Kolmogorov complexity (K.C.) of all the Humdrum
files in ~/lib/kern/ireland given those in ~/lib/kern/lorraine, using xdelta3, do

	$ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length

The encode and decode scripts call bufs themselves, so an alternative is

	$ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length

A better estimate, which first strips comments with the Humdrum command rid -G, is

	$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length

Sometimes the output can be compressed still further:

	$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length

# zzcd and zzd

The scripts zzd and zzcd implement more complex schemes, where the input and the reference
are concatenated and/or compressed before being delta compressed. For example,

	$ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd  <(findcat ~/lib/kern/lorraine | rid -G) | length

computes (using a more functional notation)

	length( zxd( lzma(lorraine), lzma(lorraine+ireland)))

that is, the amount of information needed to transform one LZMA-compressed corpus
into the LZMA-compressed concatenation of the two corpora.
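
The functional expression above suggests how such a script might be wired together. The
following is a sketch under assumptions, not the zzcd source: the name zzcd_sketch is
hypothetical, the compressor is assumed to filter stdin to stdout, and the delta script is
assumed to follow the encode|decode interface described earlier.

```shell
# Hypothetical stand-in for zzcd; the real script may differ.
# Usage: zzcd_sketch <compressor> <delta script> <reference file>   (input on stdin)
# Computes delta( compressor(ref), compressor(ref + input) ).
zzcd_sketch() {
    comp=$1
    delta=$2
    ref=$3
    work=$(mktemp -d) || return 1
    # compressor(ref): the compressed reference, used as the delta source
    "$comp" < "$ref" > "$work/ref.z"
    # compressor(ref + input): reference followed by stdin, compressed as one stream
    cat "$ref" - | "$comp" > "$work/both.z"
    status=0
    "$delta" encode "$work/ref.z" < "$work/both.z" || status=$?
    rm -rf "$work"
    return "$status"
}
```

The reference file is read twice in this sketch, which is consistent with the example above
wrapping the call in bufs 3 so that the third argument can be a process substitution.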

# dlzma

dlzma is a program written in C using liblzma (part of the XZ Utils package) to estimate the
conditional complexity of an object. It works by using the LZMA_SYNC_FLUSH feature of liblzma.
The compressed data is discarded and only the number of bits used is written to stdout.