comparison cpack/dml/scripts/compression/README @ 0:718306e29690 tip

committing public release
author Daniel Wolff
date Tue, 09 Feb 2016 21:05:06 +0100
# Delta compression

Scripts in dml-cliopatrial/cpack/dml/scripts/compression provide a common interface
to several delta compression programs. Each script is a filter: it reads its input on stdin,
takes a mode and a reference file as arguments, and writes the result to stdout:

    stdin ---> [ <script name> (encode|decode) <name of reference file> ] ---> stdout

The following scripts work this way:

    zbs   - uses bsdiff
    zxd   - uses xdelta3
    zvcd  - uses open-vcdiff
    zvcz  - uses vczip
    zdiff - converts binary to text and uses diff to produce an ed script
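
As an illustration of the interface, a filter of this shape could be sketched as follows. This is not the source of zxd or of any script in this directory; the function name `delta_filter` is hypothetical, and the sketch assumes only that xdelta3 is installed and that its `-e`, `-d`, `-c` and `-s` options behave as in its usual command-line usage.

```shell
#!/usr/bin/env bash
# Hypothetical filter obeying the encode/decode interface; NOT the real zxd.
# Assumes xdelta3: -e encodes, -d decodes, -c writes to stdout,
# and -s names the source (reference) file.
delta_filter() {
    local mode=$1 ref=$2
    case "$mode" in
        encode) xdelta3 -e -c -s "$ref" ;;  # stdin: target, stdout: delta
        decode) xdelta3 -d -c -s "$ref" ;;  # stdin: delta,  stdout: target
        *)
            echo "usage: delta_filter (encode|decode) <reference file>" >&2
            return 2
            ;;
    esac
}
```

With such a filter, `delta_filter encode ref < new > new.delta` followed by `delta_filter decode ref < new.delta` should reproduce `new`.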

# bufs

The bufs script runs an arbitrary command with one of its filename arguments buffered.
If the command expects a filename as its nth argument, then

    $ bufs <n> <command> <arg1> ... <argn> ...

can be run with <argn> as a bash process substitution, even if <command> reads that
source several times. bufs works by copying the stream given as the nth argument to a
temporary file and passing that file to the command instead.
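
The buffering idea can be sketched in a few lines of bash. This is a minimal sketch under stated assumptions, not the bufs script itself: the name `bufs_sketch` is hypothetical, and it assumes that the nth argument is counted after the command name, as in the synopsis above.

```shell
#!/usr/bin/env bash
# Minimal sketch of the buffering idea; the real bufs script may differ.
# Usage: bufs_sketch <n> <command> <arg1> ... <argn> ...
bufs_sketch() {
    local n=$1 cmd=$2
    shift 2
    local args=("$@") tmp status
    tmp=$(mktemp) || return 1
    cat -- "${args[n-1]}" > "$tmp"   # drain the stream once into a regular file
    args[n-1]=$tmp                   # hand the command the re-readable copy
    "$cmd" "${args[@]}"
    status=$?
    rm -f -- "$tmp"
    return "$status"
}
```

Because the command only ever sees a regular file, it can open, seek and re-read its nth argument as often as it likes, which a pipe or `/dev/fd` path would not allow.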

# findcat

findcat writes the contents of every file under a given directory to stdout.
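
A rough equivalent can be written with `find`. The name `findcat_sketch` is hypothetical, and the sketch assumes that only regular files matter and that their order is left to `find`; the real findcat may filter or order files differently.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for findcat: concatenate every regular file
# below the directory given as $1. File order is whatever find yields.
findcat_sketch() {
    find "$1" -type f -exec cat {} +
}
```

Since a delta encoder is sensitive to the order of its input, a fixed file order would be needed to make size estimates repeatable across machines.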

# Examples

To estimate the conditional Kolmogorov complexity (K.C.) of all the Humdrum files in ~/lib/kern/ireland
given those in ~/lib/kern/lorraine, using xdelta3, do

    $ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length

The encode and decode scripts include bufs, so an alternative is

    $ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length

A better estimate, which ignores comments, is

    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length

Sometimes the output can be compressed still further:

    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length

rid -G is a Humdrum command that removes comments.
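
The `length` step at the end of these pipelines is not described in this README. Assuming it simply reports the size of its input in bytes, a stand-in could look like the sketch below; the name `length_sketch` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `length`: print the number of bytes on stdin.
# Assumption: the real `length` reports a byte count; it may use other units.
length_sketch() {
    local bytes
    bytes=$(wc -c)
    echo $((bytes))   # arithmetic expansion strips any padding wc adds
}
```

The number printed at the end of a pipeline is then the size of the delta, which serves as the estimate of the conditional complexity.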

# zzcd and zzd

The scripts zzd and zzcd implement more complex schemes, in which the input and the reference
are concatenated and/or compressed before being delta compressed. For example,

    $ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd <(findcat ~/lib/kern/lorraine | rid -G) | length

computes (using a more functional notation)

    length( zxd( lzma(lorraine), lzma(lorraine+ireland) ) )

that is, the amount of information needed to transform one LZMA-compressed corpus
into the LZMA-compressed concatenation of the two corpora.
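
The functional notation can be unpacked into explicit steps. The sketch below is an assumption about what a zzcd-like scheme does, based only on the formula above; the name `zzcd_sketch` is hypothetical, and the real script may differ in details. It takes a compressor command, a delta script that follows the encode/decode interface, and a reference file, and reads the input on stdin.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: zzcd_sketch <compressor> <delta script> <reference file>
# Computes delta( comp(reference), comp(reference + stdin) ).
zzcd_sketch() {
    local comp=$1 delta=$2 ref=$3 dir status
    dir=$(mktemp -d) || return 1
    cat -- "$ref" > "$dir/ref"                    # buffer the reference once
    "$comp" < "$dir/ref" > "$dir/ref.z"           # comp(reference)
    cat "$dir/ref" - | "$comp" > "$dir/both.z"    # comp(reference + input)
    "$delta" encode "$dir/ref.z" < "$dir/both.z"  # delta from ref.z to both.z
    status=$?
    rm -rf -- "$dir"
    return "$status"
}
```

The reference is needed twice, once alone and once as the prefix of the concatenation, which is why the reference argument has to be re-readable.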

# dlzma

This is a program written in C using liblzma (part of the XZ Utils package) to estimate the conditional
complexity of an object. It works by using the SYNC_FLUSH feature of liblzma. The compressed data is
discarded and only the number of bits used is written to stdout.
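
The README does not spell out how the flush is used. One plausible reading is that the reference is compressed first, the stream is flushed, and only the output produced for the object after the flush is counted. The sketch below does not do that. It swaps in a cruder technique, the difference between two whole-stream compressed sizes, and reports bytes rather than bits. The name `cond_size_sketch` and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Crude approximation, NOT the dlzma algorithm:
#   size( comp(reference + target) ) - size( comp(reference) )
# Usage: cond_size_sketch <compressor> <reference file> <target file>
cond_size_sketch() {
    local comp=$1 ref=$2 target=$3 both alone
    both=$(cat -- "$ref" "$target" | "$comp" | wc -c)
    alone=$("$comp" < "$ref" | wc -c)
    echo $((both - alone))   # extra bytes needed once the reference is known
}
```

For example, `cond_size_sketch xz lorraine.txt ireland.txt` would print an estimate in bytes, assuming both arguments are regular files, since the reference is read twice.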