diff cpack/dml/scripts/compression/README @ 0:718306e29690 tip
commiting public release
author | Daniel Wolff |
---|---|
date | Tue, 09 Feb 2016 21:05:06 +0100 |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/cpack/dml/scripts/compression/README Tue Feb 09 21:05:06 2016 +0100
@@ -0,0 +1,72 @@
+# Delta compression
+
+Scripts in dml-cliopatrial/cpack/dml/scripts/compression provide a common interface
+to several delta compression programs. The interface is
+
+    stdin ---> [ <script name> (encode|decode) <name of reference file> ] ---> stdout
+
+The following scripts work this way:
+
+    zbs - uses bsdiff
+    zxd - uses xdelta3
+    zvcd - uses open-vcdiff
+    zvcz - uses vczip
+    zdiff - converts binary to text and uses diff to produce an ed script
+
+# bufs
+
+The bufs script allows an arbitrary command to be run such that if the command expects a
+filename as its nth argument, then
+
+    $ bufs <n> <command> <arg1> ... <argn> ...
+
+can be run with <argn> as a bash process substitution, even if <command> reads that
+source several times. bufs works by buffering the stream on the nth argument to a temporary
+file.
+
+
+# findcat
+
+findcat dumps the contents of every file under a given directory to stdout.
+
+# Examples
+
+For example, to estimate the conditional Kolmogorov complexity of all the humdrum files in
+~/lib/kern/ireland given those in ~/lib/kern/lorraine, using xdelta3, do
+
+    $ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length
+
+Scripts encode/decode include bufs, so an alternative is
+
+    $ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length
+
+A better estimate is
+
+    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length
+
+Sometimes the output can be compressed still further:
+
+    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length
+
+rid -G is a humdrum command that removes comments.
+
+
+# zzcd and zzd
+
+The scripts zzd and zzcd implement more complex schemes, where the input and the reference
+are concatenated and/or compressed before being delta compressed. For example,
+
+    $ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd <(findcat ~/lib/kern/lorraine | rid -G) | length
+
+computes (using a more functional notation)
+
+    length( zxd( lzma(lorraine), lzma(lorraine+ireland)))
+
+that is, the amount of information needed to transform one LZMA compressed corpus
+into the LZMA compressed concatenation of two corpuses.
+
+# dlzma
+
+This is a program written in C using liblzma (part of the XZ Utils package) to estimate the
+conditional complexity of an object. It works by using the SYNC_FLUSH feature of liblzma. The
+compressed data is discarded and only the number of bits used is output on stdout.
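
The buffering described in the bufs section above can be illustrated with a small shell function. This is only a sketch under assumptions, not the bufs script shipped in this directory: the name `bufs_sketch` is hypothetical, and it assumes that n counts the arguments following the command, which matches the `bufs 2 zxd encode <(...)` example. It copies the stream named by the nth argument into a temporary file, so that the command receives a regular file it can reopen or seek in.

```shell
# bufs_sketch N COMMAND ARG1 ... ARGN ...
# Hypothetical stand-in for bufs, written for illustration only.
# The body runs in a subshell so that variables and the trap stay local.
bufs_sketch() (
    n=$1
    cmd=$2
    shift 2

    tmp=$(mktemp) || exit 1
    trap 'rm -f "$tmp"' EXIT           # remove the buffer when the subshell exits

    # Rebuild the argument list, swapping the nth argument for the temp file.
    # The list in "for ... in" is expanded once, so resetting "$@" inside is safe.
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        [ "$i" -eq 1 ] && set --       # start the new list on the first pass
        if [ "$i" -eq "$n" ]; then
            cat "$arg" > "$tmp"        # drain the stream into a regular file
            set -- "$@" "$tmp"
        else
            set -- "$@" "$arg"
        fi
    done

    # Run the command; its exit status becomes the status of the function.
    "$cmd" "$@"
)
```

Under bash, a call such as `findcat ~/lib/kern/ireland | bufs_sketch 2 zxd encode <(findcat ~/lib/kern/lorraine)` would mirror the first example above. The function body itself avoids bash-only features, so only the process substitution at the call site requires bash.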