diff cpack/dml/scripts/compression/README @ 0:718306e29690 tip

committing public release
author Daniel Wolff
date Tue, 09 Feb 2016 21:05:06 +0100
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/cpack/dml/scripts/compression/README	Tue Feb 09 21:05:06 2016 +0100
@@ -0,0 +1,72 @@
+# Delta compression
+
+Scripts in dml-cliopatrial/cpack/dml/scripts/compression provide a common interface
+to several delta compression programs.  The interface is:
+
+	stdin --->  [ <script name>  (encode|decode) <name of reference file> ] ---> stdout
+
+The following scripts work this way:
+
+	zbs - uses bsdiff
+	zxd - uses xdelta3
+	zvcd - uses open-vcdiff
+	zvcz - uses vczip
+	zdiff - converts binary to text and uses diff to produce an ed script
+
+# bufs
+
+The bufs script wraps an arbitrary command that expects a filename as its nth
+argument, so that
+
+	$ bufs <n> <command> <arg1> ... <argn> ...
+	
+can be run with <argn> given as a bash process substitution, even if <command> reads that
+source several times. bufs works by buffering the stream passed as the nth argument to a
+temporary file.
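
The buffering idea described above can be sketched as a small bash function. This is only an illustration under stated assumptions: `bufs_sketch` is a hypothetical name, and the actual bufs script may be implemented differently. It assumes, as in the README's examples, that `<n>` counts the arguments that follow `<command>`.

```shell
# Hypothetical sketch of the buffering idea; NOT the actual bufs script.
bufs_sketch() {
    local n=$1; shift              # 1-based position of the filename argument
    local cmd=$1; shift            # the command to run
    local args=("$@")              # the command's arguments
    local tmp status
    tmp=$(mktemp) || return 1
    cat "${args[n-1]}" > "$tmp"    # drain the stream once into a regular file
    args[n-1]=$tmp                 # hand the command a path it can reopen
    "$cmd" "${args[@]}"
    status=$?
    rm -f "$tmp"
    return $status
}
```

For instance, `bufs_sketch 3 grep -c a <(printf 'a\nb\na\n')` runs grep on a temporary copy of the stream, so the command could open that file more than once.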
+
+
+# findcat
+
+findcat dumps the contents of every file under a given directory to stdout.
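
A minimal sketch of the behaviour described above, assuming nothing about how findcat is actually written: `findcat_sketch` is a hypothetical name, and the files are emitted in whatever order `find` visits them, which may differ from the real script.

```shell
# Hypothetical sketch; NOT the actual findcat script.
findcat_sketch() {
    # Concatenate every regular file under directory "$1" to stdout.
    find "$1" -type f -exec cat {} +
}
```

Because delta compressors are sensitive to the order of their input, a real implementation might want to fix the traversal order; this sketch does not.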
+
+# Examples
+
+For example, to estimate the conditional Kolmogorov complexity (K.C.) of all the Humdrum
+files in ~/lib/kern/ireland given those in ~/lib/kern/lorraine, using xdelta3, run
+
+	$ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length
+
+The encode and decode scripts apply bufs themselves, so an alternative is
+
+	$ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length
+
+A better estimate is obtained by removing comments from both corpora first:
+
+	$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length
+
+Sometimes the output can be compressed still further:
+
+	$ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length
+
+Here rid -G is a Humdrum command that removes global comments.
+
+
+# zzcd and zzd
+
+The scripts zzd and zzcd implement more complex schemes, where the input and the reference
+are concatenated and/or compressed before being delta compressed. For example,
+
+	$ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd  <(findcat ~/lib/kern/lorraine | rid -G) | length
+
+computes (using a more functional notation)
+
+	length( zxd( lzma(lorraine), lzma(lorraine+ireland)))
+
+that is, the amount of information needed to transform the LZMA-compressed reference corpus
+into the LZMA-compressed concatenation of the two corpora.
+
+# dlzma
+
+dlzma is a C program that uses liblzma (part of the XZ Utils package) to estimate the
+conditional complexity of an object. It relies on the LZMA_SYNC_FLUSH action of liblzma. The
+compressed data is discarded and only the number of bits used is written to stdout.