comparison cpack/dml/scripts/compression/README @ 0:718306e29690 tip
commiting public release
author | Daniel Wolff |
---|---|
date | Tue, 09 Feb 2016 21:05:06 +0100 |
# Delta compression

The scripts in dml-cliopatria/cpack/dml/scripts/compression provide a common interface
to several delta compression programs. Each script reads its input on stdin, takes a mode
and the name of a reference file as arguments, and writes its result to stdout:

    stdin ---> [ <script name> (encode|decode) <name of reference file> ] ---> stdout

The following scripts work this way:

    zbs   - uses bsdiff
    zxd   - uses xdelta3
    zvcd  - uses open-vcdiff
    zvcz  - uses vczip
    zdiff - converts binary to text and uses diff to produce an ed script
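
To illustrate the interface, a wrapper in the style of zxd might look like the sketch below. It is not the actual script: the name zxd_sketch is hypothetical, and the xdelta3 options (-e to encode, -d to decode, -c to write to stdout, -s to name the source file) are assumed to behave as in current xdelta3 releases.

```shell
# Hypothetical sketch of a script that follows the common interface:
#   stdin ---> [ zxd_sketch (encode|decode) <reference file> ] ---> stdout
zxd_sketch() {
  local mode=$1 ref=$2
  case "$mode" in
    encode) xdelta3 -e -c -s "$ref" ;;   # stdin = target, stdout = delta
    decode) xdelta3 -d -c -s "$ref" ;;   # stdin = delta,  stdout = target
    *) echo "usage: zxd_sketch (encode|decode) <reference file>" >&2
       return 2 ;;
  esac
}
```

Because every script takes the same two arguments, one coder can be swapped for another in a pipeline without changing anything else.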

# bufs

The bufs script runs an arbitrary command that expects a filename as its nth argument, so that

    $ bufs <n> <command> <arg1> ... <argn> ...

can be given <argn> as a bash process substitution, even if <command> reads that
source several times. A process substitution is backed by a pipe, which can be read only
once; bufs works around this by first buffering the stream named by the nth argument
to a temporary file.
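
The buffering step can be sketched as a bash function. This is a guess at the approach, not the real bufs script; the name bufs_sketch is hypothetical and the function relies on bash arrays.

```shell
# Hypothetical sketch of bufs (bash only).
# usage: bufs_sketch <n> <command> <arg1> ... <argn> ...
bufs_sketch() {
  local n=$1 cmd=$2 tmp status
  shift 2
  local args=("$@")                      # arguments passed on to <command>
  tmp=$(mktemp) || return 1
  # Drain the stream named by the nth argument into a regular file.
  cat -- "${args[n-1]}" > "$tmp" || { rm -f -- "$tmp"; return 1; }
  args[n-1]=$tmp                         # substitute the re-readable copy
  "$cmd" "${args[@]}"
  status=$?
  rm -f -- "$tmp"
  return $status
}
```

With a command such as `twice() { cat "$1"; cat "$1"; }`, calling `twice <(printf ab)` directly prints `ab`, because the pipe is exhausted after the first read, whereas `bufs_sketch 1 twice <(printf ab)` prints `abab`.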

# findcat

findcat dumps the contents of every file under a given directory to stdout.
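
A minimal stand-in for findcat, assuming it does nothing more than concatenate regular files, could be written with find. The name findcat_sketch is hypothetical, and the order of the files is whatever find yields, which may differ from the real script.

```shell
# Hypothetical stand-in for findcat: write the contents of every regular
# file below the directory given as $1 to stdout.
findcat_sketch() {
  find "$1" -type f -exec cat {} +
}
```

Since compressors are sensitive to the order of their input, a fixed traversal order would make size estimates reproducible between runs.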

# Examples

To estimate the conditional Kolmogorov complexity of all the Humdrum files in ~/lib/kern/ireland
given those in ~/lib/kern/lorraine, using xdelta3, do

    $ findcat ~/lib/kern/ireland | bufs 2 zxd encode <(findcat ~/lib/kern/lorraine) | length

The encode and decode scripts call bufs themselves, so an alternative is

    $ findcat ~/lib/kern/ireland | encode zxd <(findcat ~/lib/kern/lorraine) | length

A better estimate strips the comments from both corpora first:

    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | length

Sometimes the output can be compressed still further:

    $ findcat ~/lib/kern/ireland | rid -G | encode zxd <(findcat ~/lib/kern/lorraine | rid -G) | lzma | length

rid -G is a Humdrum command that removes comments.
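
The first two commands above are equivalent, which suggests that encode is a thin wrapper that inserts bufs. The sketch below spells that out; the function names are hypothetical, the decode form is assumed by symmetry, and bufs and the chosen coder script are assumed to be on the PATH.

```shell
# Hypothetical sketch of the encode/decode wrappers.
# usage: encode_sketch <script> <reference>   (input on stdin)
#        decode_sketch <script> <reference>   (delta on stdin)
encode_sketch() { bufs 2 "$1" encode "$2"; }
decode_sketch() { bufs 2 "$1" decode "$2"; }

# Round-trip check, assuming decode inverts encode:
#   findcat ~/lib/kern/ireland \
#     | encode_sketch zxd <(findcat ~/lib/kern/lorraine) \
#     | decode_sketch zxd <(findcat ~/lib/kern/lorraine) \
#     | cmp - <(findcat ~/lib/kern/ireland)
```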

# zzcd and zzd

The scripts zzd and zzcd implement more complex schemes, in which the input and the reference
are concatenated and/or compressed before being delta compressed. For example,

    $ findcat ~/lib/kern/ireland | rid -G | bufs 3 zzcd lzma zxd <(findcat ~/lib/kern/lorraine | rid -G) | length

computes (using a more functional notation)

    length( zxd( lzma(lorraine), lzma(lorraine+ireland)))

that is, the amount of information needed to transform one LZMA compressed corpus
into the LZMA compressed concatenation of the two corpora.
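
The scheme in the functional notation above can be sketched as follows. This is an illustration inferred from the example, not the real zzcd: the name zzcd_sketch is hypothetical, and the argument order (compressor, delta script, reference file) is read off the command line shown above.

```shell
# Hypothetical sketch of the scheme described for zzcd:
#   delta( compress(reference), compress(reference + input) )
# usage: zzcd_sketch <compressor> <delta script> <reference file>  (input on stdin)
zzcd_sketch() {
  local compress=$1 delta=$2 ref=$3 tmp status
  tmp=$(mktemp -d) || return 1
  "$compress" < "$ref" > "$tmp/ref.z"            # compress(reference)
  cat "$ref" - | "$compress" > "$tmp/both.z"     # compress(reference + input)
  "$delta" encode "$tmp/ref.z" < "$tmp/both.z"   # delta from the first to the second
  status=$?
  rm -rf "$tmp"
  return $status
}
```

The reference is read twice here, which is consistent with the example wrapping zzcd in bufs 3 when the reference comes from a process substitution.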

# dlzma

dlzma is a program written in C using liblzma (part of the xz utils package) to estimate the conditional
complexity of an object. It works by using the SYNC_FLUSH feature of liblzma (the LZMA_SYNC_FLUSH action).
The compressed data is discarded and only the number of bits used is written to stdout.
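
A similar quantity can be approximated from the shell without dlzma. The sketch below does not use SYNC_FLUSH: it compresses the reference alone and the reference followed by the object, and reports the difference in bits. The name cond_bits_sketch and the default compressor `xz -c` are assumptions, and container overhead means the figure will not match dlzma exactly.

```shell
# Hypothetical approximation of conditional complexity, in bits.
# usage: cond_bits_sketch <reference file> <object file> [compressor command...]
cond_bits_sketch() {
  local ref=$1 obj=$2 alone joint
  shift 2
  [ $# -gt 0 ] || set -- xz -c                # default compressor (assumption)
  alone=$("$@" < "$ref" | wc -c)              # bytes for the reference alone
  joint=$(cat "$ref" "$obj" | "$@" | wc -c)   # bytes for reference + object
  echo $(( (joint - alone) * 8 ))             # extra bits attributed to the object
}
```

Both arguments are read more than once overall, so they should be regular files rather than process substitutions. An object that closely resembles its reference should yield a much smaller figure than the same object given an empty reference.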