CARFAC Design Doc
by "Richard F. Lyon" <dicklyon@google.com>
updated 24 May 2012 (v.237)

The CAR-FAC (cascade of asymmetric resonators with fast-acting
compression) is a cochlear model implemented as an efficient sound
processor, for mono, stereo, or multi-channel sound inputs. This file
describes aspects of the software design.

The implementation will include equivalent Matlab, C++, Python, and
perhaps other versions. They should all be based on the same set of
classes (or structs) and functions, with roughly equivalent
functionality and names where possible. The examples here are Matlab,
in which the structs are typeless, but in other languages each will be
a class or type.

The top-level class is CARFAC. A CARFAC object knows how to design
its details from a modest set of parameters, and knows how to process
sound signals to produce "neural activity patterns" (NAPs). It
includes various sub-objects representing the parameters, designs, and
states of different parts of the CAR-FAC model.

The CARFAC class includes a vector of "ear" objects -- one for mono,
two for stereo, or more.

The three main subsystems of the ear are the cascade of asymmetric
resonators (CAR), the inner hair cell (IHC), and the automatic gain
control (AGC). These are not intended to work independently, but
each part has three groups of data associated with it. The CARFAC
stores instances of the "params" that drive the design; the
"coeffs" and "state" are stored per ear:

  CAR_params, CAR_coeffs, CAR_state
  IHC_params, IHC_coeffs, IHC_state
  AGC_params, AGC_coeffs, AGC_state

These names can be used for the classes and, slightly modified,
for the member variables, arguments, temporaries, etc.

The params are inputs that specify the system; the coeffs are values
such as filter coefficients, computed once and used at run time;
the state is whatever must be carried over between runs of the
model on successive samples or segments of a sound waveform, such as
the state variables of the digital filters.

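As a rough illustration of the params/coeffs/state split, here is a
minimal Python sketch for a single one-pole smoothing filter rather
than a real CARFAC stage. The names SmootherParams, SmootherCoeffs,
SmootherState, design_smoother, and smoother_step are hypothetical
and are not part of the CARFAC code.

```python
import math
from dataclasses import dataclass


@dataclass
class SmootherParams:
    """Inputs that specify the system (hypothetical example)."""
    fs: float = 22050.0          # sample rate in Hz
    time_constant: float = 0.01  # smoothing time constant in seconds


@dataclass
class SmootherCoeffs:
    """Values computed once at design time and used at run time."""
    alpha: float  # per-sample smoothing coefficient


@dataclass
class SmootherState:
    """Memory carried from one sample (or segment) to the next."""
    y: float = 0.0


def design_smoother(params: SmootherParams) -> SmootherCoeffs:
    # Design: turn the params into a run-time coefficient.
    alpha = 1.0 - math.exp(-1.0 / (params.fs * params.time_constant))
    return SmootherCoeffs(alpha=alpha)


def smoother_step(x: float, coeffs: SmootherCoeffs, state: SmootherState) -> float:
    # Run time: only coeffs and state are used; params are not needed.
    state.y += coeffs.alpha * (x - state.y)
    return state.y


params = SmootherParams()
coeffs = design_smoother(params)   # done once
state = SmootherState()            # created separately, after design
outputs = [smoother_step(1.0, coeffs, state) for _ in range(5)]
```

The design step runs once; the step function then needs only the
coeffs and the state, which is the separation the three groups are
meant to capture.
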
At construction ("design") time, params are provided, and ears with coeffs
are created. The created CARFAC stores the params that it was designed
from; there is no state yet:

  CF = CARFAC_Design(CAR_params, IHC_params, AGC_params)


On "channels" and "ears":

Stages of the cascade are called "channels", and there are n_ch of
them (n_ch is determined from other parameters, but we could also make
a function to pick params to get a desired n_ch). Since "channels" is
already used for that, sound input channels (one for monaural, two
for binaural, or more) are called "ears", and there are n_ears of them.
The parameter n_ears is set when state is initialized, not
earlier at design time; it is not in params since it doesn't
affect coefficient design.

Multi-ear designs usually have the same coeffs across several ears, but
always need separate state. The coeffs are kept separate per ear in
case someone wants to modify one or both to simulate asymmetric
hearing loss or the like.

The only place the ears interact is in the AGC. The function that
closes the AGC loop can "mix" their states, making an inter-ear
coupling. Other than that, the functions that update the CAR, IHC,
and AGC states are simple single-ear functions (methods of the ear
class?).

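One way to picture the inter-ear "mix" is to pull each ear's AGC state
part of the way toward the mean across ears. The Python sketch below
assumes that form; the function name mix_agc_states and the parameter
mix_coeff are hypothetical, and the actual coupling rule in the model
may differ.

```python
def mix_agc_states(agc_states, mix_coeff):
    """Move each ear's per-channel AGC state toward the across-ear mean.

    agc_states: list (one entry per ear) of lists (one value per channel).
    mix_coeff:  0.0 leaves the ears independent; 1.0 makes them identical.
    Hypothetical illustration, not the CARFAC implementation.
    """
    n_ears = len(agc_states)
    n_ch = len(agc_states[0])
    # Mean over ears, computed separately for each channel.
    mean = [sum(ear[ch] for ear in agc_states) / n_ears for ch in range(n_ch)]
    return [
        [s + mix_coeff * (m - s) for s, m in zip(ear, mean)]
        for ear in agc_states
    ]


left = [0.0, 2.0, 4.0]
right = [2.0, 2.0, 0.0]
mixed = mix_agc_states([left, right], mix_coeff=0.5)
print(mixed)  # [[0.5, 2.0, 3.0], [1.5, 2.0, 1.0]]
```

With a single ear the mean equals the state itself, so the mix leaves
a mono design unchanged.
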

Data size and performance:

The coeffs and states are several kilobytes each, since they store a
handful (10 or so) of floating-point values (at 4 or 8 bytes each) for
each channel (typically 60 to 100 channels); that is 240 to 800 bytes
per coefficient and per state variable. That's the entire data memory
footprint; most of it is accessed at every sample time (hopefully it
will all fit and have a good hit rate in a typical 32 KB L1 d-cache).
In Matlab we use doubles (8-byte), but in C++ we intend to use floats
(4-byte) and SSE (via Eigen) for higher performance. Alternate
implementations are OK.

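The arithmetic behind these figures can be checked with a few lines of
Python. The count of 10 values per channel is the rough figure quoted
above, not an exact count from the code, and footprint_bytes is a
hypothetical helper.

```python
def footprint_bytes(n_ch, bytes_per_value, values_per_channel=10):
    """Return (bytes per variable, total bytes) for one coeffs or state struct."""
    per_variable = n_ch * bytes_per_value   # one value per channel
    return per_variable, per_variable * values_per_channel


small = footprint_bytes(n_ch=60, bytes_per_value=4)    # floats, 60 channels
large = footprint_bytes(n_ch=100, bytes_per_value=8)   # doubles, 100 channels
print(small)  # (240, 2400)
print(large)  # (800, 8000)
```

That is roughly 2.4 to 8 KB per struct, in line with "several
kilobytes each".
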

Run-time strategy:

To support real-time applications, sound can be processed in short
segments, producing segments of NAPs:

  [NAPs, CF] = CARFAC_Run_Segment(CF, input_waves)

Here the plurals ("NAPs" and "input_waves") suggest multiple ears.
These can be vectors, with length 1 for mono, or Matlab
multi-D arrays with a singleton last dimension.

The CF's states are updated such that there will be no glitch when the
next segment is processed. Segments are of arbitrary positive-integer
length, not necessarily equal. It is not inefficient to use a segment
length of 1 sample if results are needed with very low latency.

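The no-glitch property follows from carrying the state across calls.
The Python sketch below uses a hypothetical one-pole filter (not a
CARFAC stage) and a hypothetical run_segment function to show that
splitting the input into unequal segments, including a segment of
length 1, gives the same output as a single run.

```python
def run_segment(state, segment, alpha=0.25):
    """Filter one segment; return (outputs, updated state).

    Mirrors the Matlab convention of returning a modified copy of the state.
    """
    outputs = []
    y = state
    for x in segment:
        y += alpha * (x - y)   # one-pole smoothing, one sample at a time
        outputs.append(y)
    return outputs, y


signal = [1.0, 0.0, 0.5, -1.0, 2.0, 0.0, 0.25]

# One long run.
whole, _ = run_segment(0.0, signal)

# The same input in segments of lengths 1, 4, and 2.
state = 0.0
pieces = []
for segment in (signal[:1], signal[1:5], signal[5:]):
    out, state = run_segment(state, segment)
    pieces.extend(out)

assert pieces == whole  # identical, sample for sample
```
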
Internally, CARFAC_Run updates each part of the state, one sample at a
time. First the CAR, IHC, and AGC are updated ("stepped") for all ears:

  [car_out, CAR_state] = CARFAC_CAR_Step(sample, CAR_coeffs, CAR_state)
  [ihc_out, IHC_state] = CARFAC_IHC_Step(car_out, IHC_coeffs, IHC_state)
  [AGC_state, updated] = CARFAC_AGC_Step(ihc_out, AGC_coeffs, AGC_state)

The AGC filter mostly runs at a lower sample rate (by a factor of 8 by
default). Usually it just accumulates its input and returns quickly. The
boolean "updated" indicates whether the AGC actually did some work and
has a new output that needs to be used to "close the loop" and modify
the CAR state. When it's true, there's one more step, involving both AGC
and CAR states, so it's a function on the CARFAC:

  CF = CARFAC_Close_AGC_Loop(CF)

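The accumulate-then-update pattern can be sketched in Python as
follows. This is a simplified, single-channel illustration: the class
AGCState, the function agc_step, and the smoothing rule are
hypothetical; only the pattern itself and the default factor of 8 come
from the text.

```python
from dataclasses import dataclass


@dataclass
class AGCState:
    accumulator: float = 0.0   # running sum of inputs since the last update
    phase: int = 0             # samples accumulated so far
    memory: float = 0.0        # slow AGC filter output


def agc_step(ihc_out, state, decimation=8, alpha=0.5):
    """Accumulate one input sample; do real work every `decimation` samples.

    Returns True when the AGC has a new output that should be used
    to close the loop.
    """
    state.accumulator += ihc_out
    state.phase += 1
    if state.phase < decimation:
        return False                      # usual case: return quickly
    # Every `decimation`-th sample: filter the averaged input (toy smoother).
    average = state.accumulator / decimation
    state.memory += alpha * (average - state.memory)
    state.accumulator = 0.0
    state.phase = 0
    return True


state = AGCState()
updated_at = [n for n in range(1, 17) if agc_step(1.0, state)]
print(updated_at)  # [8, 16]
```

In the full model, a true return would be followed by the loop-closing
step that modifies the CAR state.
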
In Matlab, these functions return a modified copy of the state or
CARFAC; that's why we have the multiple returned values. In languages
that do more than just call-by-value, a reference will be passed so
the state can be modified in place instead.


C++ Eigen strategy:

The mapping from Matlab's "parallel" operations on vectors of values
into C++ code will be done by using Eigen
(http://eigen.tuxfamily.org/) arrays, which support similar
source-level parallel arithmetic, so we don't have to write loops over
channels. Eigen also allows fairly efficient compilation to SSE or
ARM NEON instructions, which can do four float operations per cycle.
So we should be able to easily get to efficient code on various
platforms.

If there are similar strategies available in other languages, we should
use them. Python may have a route to using Eigen
(http://eigen.tuxfamily.org/index.php?title=PythonInterop).