author | Daniel Wolff |
---|---|
date | Tue, 09 Feb 2016 21:05:06 +0100 |
# Delta compression

The scripts in dml-cliopatria/cpack/dml/scripts/compression provide a common interface to several delta compression programs. Each script reads its input on stdin, takes a mode (encode or decode) and the name of a reference file, and writes its result to stdout:

```
stdin ---> [ <script name> (encode|decode) <name of reference file> ] ---> stdout
```

The following scripts work this way:

- zbs: uses bsdiff
- zxd: uses xdelta3
- zvcd: uses open-vcdiff
- zvcz: uses vczip
- zdiff: converts binary to text and uses diff to produce an ed script

# bufs

The bufs script allows an arbitrary command to be run such that, if the command expects a filename as its nth argument, then

```
$ bufs <n> <command> <arg1> ... <argn> ...
```

can be run with <argn> as a bash process substitution, even if <command> reads that source several times. This matters because a process substitution such as `<(...)` is a pipe, which can normally be read only once. bufs works by buffering the stream given as the nth argument to a temporary file.

# findcat

findcat dumps the contents of every file under a given directory to stdout.

# Examples

For example, to estimate the conditional Kolmogorov complexity (K.C.) of all the Humdrum files in ~/lib/kern/ireland given those in ~/lib/kern/lorraine, using xdelta3, run

```
$ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length
```

The encode and decode scripts include bufs, so an alternative is

```
$ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length
```

A better estimate is obtained by first removing comments from both corpora with rid -G, a Humdrum command that removes comments:

```
$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length
```

Sometimes the output can be compressed still further by piping the delta through lzma:

```
$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length
```

# zzcd and zzd

The scripts zzd and zzcd implement more complex schemes, in which the input and the reference are concatenated and/or compressed before being delta compressed.
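The zzcd composition can be sketched as a small bash function. This is a hypothetical illustration (the name `zzcd_sketch` is invented here), assuming zzcd takes a compressor, a delta script with the `(encode|decode) <reference>` interface, and a reference file, and that it delta-encodes compressor(reference + input) against compressor(reference), as in the worked zzcd example in this section. It is not the actual zzcd script, and zzd is not covered.

```shell
#!/usr/bin/env bash
# zzcd_sketch: HYPOTHETICAL illustration of the zzcd scheme, not the real script.
# Usage: zzcd_sketch <compressor> <delta script> <reference file> < input
#   <compressor>   : filter that compresses stdin to stdout (e.g. lzma)
#   <delta script> : script with the "(encode|decode) <reference>" interface (e.g. zxd)
zzcd_sketch() {
  local comp=$1 delta=$2 ref=$3
  # Target: compressor applied to the reference followed by stdin.
  # Delta reference: compressor applied to the reference alone, passed
  # to the delta script through a process substitution.
  # $ref is read twice, so it must be re-readable (a regular file, or wrap the call with bufs).
  cat -- "$ref" - | "$comp" | "$delta" encode <("$comp" < "$ref")
}
```

For instance, with the reference corpus saved to a regular file, `zzcd_sketch lzma zxd lorraine.krn < ireland.krn | length` would mirror the example that follows (file names hypothetical).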
For example,

```
$ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd <(findcat ~/lib/kern/lorraine | rid -G) | length
```

computes (in a more functional notation, with + denoting concatenation)

```
length(zxd(lzma(lorraine), lzma(lorraine + ireland)))
```

that is, the amount of information needed to transform the LZMA-compressed lorraine corpus into the LZMA-compressed concatenation of both corpora.

# dlzma

dlzma is a program written in C using liblzma (part of the XZ Utils package) to estimate the conditional complexity of an object. It works by using the sync-flush feature of liblzma (LZMA_SYNC_FLUSH). The compressed data is discarded, and only the number of bits used is written to stdout.
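A rough shell approximation of the same kind of estimate is sketched below. This is a swapped-in technique, not dlzma: instead of using liblzma's sync flush within a single stream, it compresses the reference and the concatenation reference + input separately with a whole-file compressor and subtracts the sizes, roughly C(x|y) ~ |Z(y x)| - |Z(y)|. The function name `cond_bits` and the default `xz -c` compressor are assumptions made for this illustration.

```shell
#!/usr/bin/env bash
# cond_bits: HYPOTHETICAL helper, NOT dlzma.
# Usage: cond_bits <reference file> [compressor command ...] < input
# Prints (|Z(reference + input)| - |Z(reference)|) * 8, i.e. an
# estimate in bits of the extra information in stdin given the reference.
cond_bits() {
  local ref=$1; shift
  # Default compressor (stdin -> stdout) if none is given.
  [ "$#" -gt 0 ] || set -- xz -c
  local joint alone
  # Size in bytes of the compressed concatenation: reference, then stdin.
  joint=$(cat -- "$ref" - | "$@" | wc -c)
  # Size in bytes of the compressed reference alone.
  alone=$("$@" < "$ref" | wc -c)
  echo $(( (joint - alone) * 8 ))
}
```

Because the two compressions are independent, fixed container overhead only partly cancels in the subtraction, so numbers from this approximation will differ from those produced by dlzma.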