silvet: notes/em.txt annotate

annotate notes/em.txt @ 372:af71cbdab621 tip

Update bqvec code

author	Chris Cannam
date	Tue, 19 Nov 2019 10:13:32 +0000
parents	f1f8c84339d0
children

rev	line source
Chris@19	1
Chris@19	2 I agree with you - having a look at a model that does not support
Chris@19	3 convolution would help. You'll find attached 'hnmf.m', which is
Chris@19	4 essentially the same model without convolution. So the simplified model
Chris@19	5 is: P(w,t) = P(t) \sum_{s,p} P(w|p,s) P(p|t) P(s|p,t)
Chris@19	6
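As a rough illustration of the simplified model above, the following sketch evaluates P(w,t) from its factors with explicit loops. The toy sizes, the random parameters, and the names (`reconstruct`, `Pw_ps`, `Pp_t`, `Ps_pt`) are assumptions made for this example; they are not taken from 'hnmf.m'.

```python
import random

random.seed(0)
P, S, W, T = 3, 2, 5, 4  # toy sizes (assumed): pitches, sources, freq bins, frames


def rand_dist(n):
    """Return a random probability distribution of length n."""
    xs = [random.random() + 0.1 for _ in range(n)]
    total = sum(xs)
    return [x / total for x in xs]


# Hypothetical parameters, indexed as noted:
Pt = rand_dist(T)                                              # Pt[t]          = P(t)
Pw_ps = [[rand_dist(W) for _ in range(S)] for _ in range(P)]   # Pw_ps[p][s][w] = P(w|p,s)
Pp_t = [rand_dist(P) for _ in range(T)]                        # Pp_t[t][p]     = P(p|t)
Ps_pt = [[rand_dist(S) for _ in range(P)] for _ in range(T)]   # Ps_pt[t][p][s] = P(s|p,t)


def reconstruct(Pt, Pw_ps, Pp_t, Ps_pt):
    """P(w,t) = P(t) * sum_{s,p} P(w|p,s) P(p|t) P(s|p,t), as Pwt[w][t]."""
    Pwt = [[0.0] * T for _ in range(W)]
    for t in range(T):
        for p in range(P):
            for s in range(S):
                weight = Pt[t] * Pp_t[t][p] * Ps_pt[t][p][s]
                for w in range(W):
                    Pwt[w][t] += weight * Pw_ps[p][s][w]
    return Pwt


Pwt = reconstruct(Pt, Pw_ps, Pp_t, Ps_pt)
total = sum(sum(row) for row in Pwt)  # a joint distribution over (w,t) sums to one
```

Because every factor is a normalised distribution, the reconstructed P(w,t) sums to one over all w and t, which is a quick sanity check on the indexing.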
Chris@19	7 Also, a more recent (and much more efficient) version of the CMJ system
Chris@19	8 converts the model from a convolutive to a linear one, but still keeping
Chris@19	9 the shift-invariance support. That is achieved by having a pre-extracted
Chris@19	10 4-D dictionary that also supports templates that are pre-shifted across
Chris@19	11 log-frequency (so that the system would not need to compute the
Chris@19	12 convolutions during the EM step). I have uploaded the source code on
Chris@19	13 SoundSoftware [i], and you can find the related paper in [ii]. This
Chris@19	14 system has exactly the same performance as the CMJ one, but is much
Chris@19	15 easier to understand/implement, and is over 50 times faster.
Chris@19	16
Chris@19	17 [i] https://code.soundsoftware.ac.uk/projects/amt_mssiplca_fast
Chris@19	18 [ii] http://www.ecmlpkdd2013.org/wp-content/uploads/2013/09/MLMU_benetos.pdf
Chris@19	19
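The idea of pre-shifting templates described above can be sketched as follows. This is only an illustration under assumptions: one short template, zero-padded integer shifts, and hypothetical names (`shift`, `pre_shift`); the actual dictionary layout in the linked code may differ. It shows that a weighted sum of pre-shifted templates gives the same result as convolving the template with the shift weights, so no convolution is needed at run time.

```python
def shift(template, f):
    """Shift a template up by f bins along log-frequency, zero-padding."""
    n = len(template)
    return [template[w - f] if 0 <= w - f < n else 0.0 for w in range(n)]


def pre_shift(template, shifts):
    """Build a small dictionary holding one shifted copy per shift value."""
    return [shift(template, f) for f in shifts]


template = [0.5, 0.3, 0.2, 0.0, 0.0]   # toy spectral template (assumed)
shifts = [0, 1, 2]
dictionary = pre_shift(template, shifts)

h = [0.2, 0.5, 0.3]                    # toy weights over the shifts (assumed)
n = len(template)

# Linear model: weighted sum of the pre-shifted templates.
linear = [sum(h[i] * dictionary[i][w] for i in range(len(shifts)))
          for w in range(n)]

# Convolutive model: convolve the template with the shift weights directly.
convolved = [sum(template[w - f] * h[f] for f in shifts if 0 <= w - f < n)
             for w in range(n)]
```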
Chris@19	20 > In eqn 12,
Chris@19	21 > Pt(p) =
Chris@19	22 > sum[w,f,s] ( P(p,f,s|w,t) Vw,t ) /
Chris@19	23 > sum[p,w,f,s] ( P(p,f,s|w,t) Vw,t )
Chris@19	24 >
Chris@19	25 > P(p,f,s|w,t) is the result of the E-step (and a time-frequency
Chris@19	26 > distribution), and Vw,t is the input spectrogram (also a
Chris@19	27 > time-frequency distribution), right?
Chris@19	28
Chris@19	29 Right! Basically, P(p,f,s|w,t) is a 5-dimensional matrix, essentially
Chris@19	30 the model without the sums (the sums convert P(p,f,s|w,t) into a 2-D
Chris@19	31 matrix P(w,t)).
Chris@19	32
Chris@19	33 > So I read this as something like: update the pitch probability
Chris@19	34 > distribution for time t so that its value for a pitch p is the ratio
Chris@19	35 > of the sum of the expression P(p,f,s|w,t) Vw,t for that pitch
Chris@19	36 > variable to the sum of the same expression across all pitch
Chris@19	37 > variables.
Chris@19	38
Chris@19	39 The equation you put essentially takes the 5-dimensional quantity
Chris@19	40 P(p,f,s|w,t) Vw,t and marginalises it to P(p,t), i.e. it sums over all
Chris@19	41 other dimensions. All these 'unknown' parameters, e.g. P(s|p,t), are
Chris@19	42 generated from this 5-dimensional posterior distribution.
Chris@19	43
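The marginalisation in the quoted eqn 12 can be sketched with explicit loops. The toy sizes, the random posterior, and the name `update_pitch_activation` are assumptions for illustration only; in the real system the posterior comes from the E-step rather than from random numbers.

```python
import random

random.seed(0)
P, F, S, W, T = 3, 2, 2, 4, 3  # toy sizes (assumed): pitch, shift, source, freq, time

# Toy posterior post[p][f][s][w][t] standing in for P(p,f,s|w,t).
post = [[[[[random.random() + 0.1 for _ in range(T)] for _ in range(W)]
          for _ in range(S)] for _ in range(F)] for _ in range(P)]

# Normalise so that, for every (w,t), the posterior sums to one over (p,f,s).
for w in range(W):
    for t in range(T):
        z = sum(post[p][f][s][w][t]
                for p in range(P) for f in range(F) for s in range(S))
        for p in range(P):
            for f in range(F):
                for s in range(S):
                    post[p][f][s][w][t] /= z

# Toy non-negative input spectrogram V[w][t].
V = [[random.random() for _ in range(T)] for _ in range(W)]


def update_pitch_activation(post, V):
    """Pt_p[t][p] = sum_{w,f,s} post*V / sum_{p,w,f,s} post*V."""
    result = []
    for t in range(T):
        # Numerator: keep p (and t), sum over every other dimension.
        numer = [sum(post[p][f][s][w][t] * V[w][t]
                     for f in range(F) for s in range(S) for w in range(W))
                 for p in range(P)]
        # Denominator: the same quantity summed over p as well.
        denom = sum(numer)
        result.append([x / denom for x in numer])
    return result


Pt_p = update_pitch_activation(post, V)
```

The posterior array carries an explicit p index, so picking out "a single pitch" simply means fixing that index while summing over f, s and w.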
Chris@19	44 > But what does it mean to refer to P(p,f,s|w,t) for a single pitch
Chris@19	45 > variable, given that P(p,f,s|w,t) is just a time-frequency
Chris@19	46 > distribution? There doesn't seem to be any dependence on p in it. I
Chris@19	47 > think this is where I'm missing the (hopefully obvious) fundamental
Chris@19	48 > thing.
Chris@19	49
Chris@19	50 P(p,f,s|w,t) is not a time-frequency distribution; it is a 5-dimensional
Chris@19	51 posterior distribution of the 3 unknown model parameters given time and
Chris@19	52 frequency.
Chris@19	53
Chris@19	54 The basic concept of EM is that you have a latent variable in your
Chris@19	55 model, e.g. p; in the E-step, you compute the posterior given the
Chris@19	56 known/input data (e.g. P(p|w,t)). For the M-step, you compute the
Chris@19	57 complete likelihood given the original input, P(p|w,t) V_wt; and you
Chris@19	58 marginalise over the variables that you don't care about, e.g. if you
Chris@19	59 want to find P(p|t), you compute \sum_w P(p|w,t) V_wt; finally, you
Chris@19	60 normalise that result according to your model, so if your model has a
Chris@19	61 P(p|t) component, you normalise so that P(p|t) for a given time frame
Chris@19	62 sums to one over p (this is the denominator in the equation you showed).
Chris@19	63
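The E-step and M-step described above can be put together in a small sketch for the simplified (non-convolutive) model, with a fixed dictionary P(w|p,s) and unknown P(p|t) and P(s|p,t). Everything here (toy sizes, random data, the names `joint`, `em_step`, `log_likelihood`) is an assumption made for illustration and not the code of either system mentioned.

```python
import math
import random

random.seed(1)
P, S, W, T = 3, 2, 6, 4  # toy sizes (assumed): pitches, sources, freq bins, frames


def normalise(xs):
    total = sum(xs)
    return [x / total for x in xs]


def rand_dist(n):
    return normalise([random.random() + 0.1 for _ in range(n)])


Pw_ps = [[rand_dist(W) for _ in range(S)] for _ in range(P)]  # fixed P(w|p,s)
V = [[random.random() for _ in range(T)] for _ in range(W)]   # toy input V[w][t]
Pp_t = [rand_dist(P) for _ in range(T)]                       # unknown P(p|t)
Ps_pt = [[rand_dist(S) for _ in range(P)] for _ in range(T)]  # unknown P(s|p,t)


def joint(w, t, p, s):
    """Model term P(w|p,s) P(p|t) P(s|p,t) for the current parameters."""
    return Pw_ps[p][s][w] * Pp_t[t][p] * Ps_pt[t][p][s]


def log_likelihood():
    """sum_{w,t} V[w][t] * log sum_{p,s} joint(w,t,p,s)."""
    return sum(V[w][t] * math.log(sum(joint(w, t, p, s)
                                      for p in range(P) for s in range(S)))
               for t in range(T) for w in range(W))


def em_step():
    global Pp_t, Ps_pt
    new_Pp_t, new_Ps_pt = [], []
    for t in range(T):
        acc = [[0.0] * S for _ in range(P)]  # sum_w P(p,s|w,t) V[w][t]
        for w in range(W):
            # E-step: posterior P(p,s|w,t) is the joint normalised over (p,s).
            j = [[joint(w, t, p, s) for s in range(S)] for p in range(P)]
            z = sum(sum(row) for row in j)
            for p in range(P):
                for s in range(S):
                    acc[p][s] += (j[p][s] / z) * V[w][t]
        # M-step: marginalise the weighted posterior, then normalise.
        new_Pp_t.append(normalise([sum(acc[p]) for p in range(P)]))
        new_Ps_pt.append([normalise(acc[p]) for p in range(P)])
    Pp_t, Ps_pt = new_Pp_t, new_Ps_pt


history = [log_likelihood()]
for _ in range(20):
    em_step()
    history.append(log_likelihood())
```

In this sketch the values in `history` should never decrease from one iteration to the next, which is the usual check that the E-step and M-step are consistent with each other.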
