Dawn@4: function [f0_time,f0_value,SHR,f0_candidates]=shrp(Y,Fs,F0MinMax,frame_length,timestep,SHR_Threshold,ceiling,med_smooth,CHECK_VOICING) Dawn@4: % SHRP - a pitch determination algorithm based on Subharmonic-to-Harmonic Ratio (SHR) Dawn@4: % [f0_time,f0_value,SHR,f0_candidates]=shrp(Y,Fs[,F0MinMax,frame_length,TimeStep,SHR_Threshold,Ceiling,med_smooth,CHECK_VOICING]) Dawn@4: % Dawn@4: % Input parameters (There are 9): Dawn@4: % Dawn@4: % Y: Input data Dawn@4: % Fs: Sampling frequency (e.g., 16000 Hz) Dawn@4: % F0MinMax: 2-d array specifies the F0 range. [minf0 maxf0], default: [50 550] Dawn@4: % Quick solutions: Dawn@4: % For male speech: [50 250] Dawn@4: % For female speech: [120 400] Dawn@4: % frame_length: length of each frame in millisecond (default: 40 ms) Dawn@4: % TimeStep: Interval for updating short-term analysis in millisecond (default: 10 ms) Dawn@4: % SHR_Threshold: Subharmonic-to-harmonic ratio threshold in the range of [0,1] (default: 0.4). Dawn@4: % If the estimated SHR is greater than the threshold, the subharmonic is regarded as F0 candidate, Dawn@4: % Otherwise, the harmonic is favored. Dawn@4: % Ceiling: Upper bound of the frequencies that are used for estimating pitch. (default: 1250 Hz) Dawn@4: % med_smooth: the order of the median smoothing (default: 0 - no smoothing); Dawn@4: % CHECK_VOICING: check voicing. Current voicing determination algorithm is kind of crude. Dawn@4: % 0: no voicing checking (default) Dawn@4: % 1: voicing checking Dawn@4: % Output parameters: Dawn@4: % Dawn@4: % f0_time: an array stores the times for the F0 points Dawn@4: % f0_value: an array stores F0 values Dawn@4: % SHR: an array stores subharmonic-to-harmonic ratio for each frame Dawn@4: % f0_candidates: a matrix stores the f0 candidates for each frames, currently two f0 values generated for each frame. Dawn@4: % Each row (a frame) contains two values in increasing order, i.e., [low_f0 higher_f0]. Dawn@4: % For SHR=0, the first f0 is 0. The purpose of this is that when you want to test different SHR Dawn@4: % thresholds, you don't need to re-run the whole algorithm. You can choose to select the lower or higher Dawn@4: % value based on the shr value of this frame. Dawn@4: % Dawn@4: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Dawn@4: % Permission to use, copy, modify, and distribute this software without fee is hereby granted Dawn@4: % FOR RESEARCH PURPOSES only, provided that this copyright notice appears in all copies Dawn@4: % and in all supporting documentation. Dawn@4: % Dawn@4: % This program is distributed in the hope that it will be useful, Dawn@4: % but WITHOUT ANY WARRANTY; without even the implied warranty of Dawn@4: % MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Dawn@4: % Dawn@4: % For details of the algorithm, please see Dawn@4: % Sun, X.,"Pitch determination and voice quality analysis using subharmonic-to-harmonic ratio" To appear in the Proc. of ICASSP2002, Orlando, Florida, May 13 -17, 2002. Dawn@4: % For update information, please check http://mel.speech.nwu.edu/sunxj/pda.htm. Dawn@4: % Dawn@4: % Copyright (c) 2001 Xuejing Sun Dawn@4: % Department of Communication Sciences and Disorders Dawn@4: % Northwestern University, USA Dawn@4: % sunxj@northwestern.edu Dawn@4: % Dawn@4: % Update history: Dawn@4: % Added "f0_candidates" as a return value, Dec. 21, 2001 Dawn@4: % Changed default median smoothing order from 5 to 0, Jan. 9, 2002 Dawn@4: % Modified the GetLogSpectrum function, bug fixed due to Herbert Griebel. Jan. 15, 2002 Dawn@4: % Several minor changes. Jan. 15,2002. Dawn@4: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% Dawn@4: Dawn@4: %t0 = clock; Dawn@4: %------------------ Get input arguments values and set default values ------------------------- Dawn@4: if nargin<9 Dawn@4: CHECK_VOICING=0; Dawn@4: end Dawn@4: if nargin<8 Dawn@4: med_smooth=0; Dawn@4: end Dawn@4: if nargin<7 Dawn@4: ceiling=1250; Dawn@4: end Dawn@4: if nargin<6 Dawn@4: SHR_Threshold=0.4; % subharmonic to harmonic ratio threshold Dawn@4: end Dawn@4: if nargin<5 Dawn@4: timestep=10; Dawn@4: %timestep=6.4; Dawn@4: end Dawn@4: if nargin<4 Dawn@4: frame_length=40; % default 40 ms Dawn@4: end Dawn@4: if nargin<3 Dawn@4: minf0=50; Dawn@4: maxf0=500; Dawn@4: else Dawn@4: minf0=F0MinMax(1); Dawn@4: maxf0=F0MinMax(2); Dawn@4: end Dawn@4: if nargin<2 Dawn@4: error('Sampling rate must be supplied!') Dawn@4: end Dawn@4: segmentduration=frame_length; Dawn@4: Dawn@4: %------------------- pre-processing input signal ------------------------- Dawn@4: Y=Y-mean(Y); % remove DC component Dawn@4: Y=Y/max(abs(Y)); %normalization Dawn@4: total_len=length(Y); Dawn@4: %------------------ specify some algorithm-specific thresholds ------------------------- Dawn@4: interpolation_depth=0.5; % for FFT length Dawn@4: %--------------- derived thresholds specific to the algorithm ------------------------------- Dawn@4: maxlogf=log2(maxf0/2); Dawn@4: minlogf=log2(minf0/2); % the search region to compute SHR is as low as 0.5 minf0. Dawn@4: N=floor(ceiling/minf0); % maximum number harmonics Dawn@4: m=mod(N,2); Dawn@4: N=N-m; Dawn@4: N=N*4; %In fact, in most cases we don't need to multiply N by 4 and get equally good results yet much faster. Dawn@4: % derive how many frames we have based on segment length and timestep. Dawn@4: segmentlen=round(segmentduration*(Fs/1000)); Dawn@4: inc=round(timestep*(Fs/1000)); Dawn@4: nf = fix((total_len-segmentlen+inc)/inc); Dawn@4: n=(1:nf); Dawn@4: f0_time=((n-1)*timestep+segmentduration/2)'; % anchor time for each frame, the middle point Dawn@4: %f0_time=((n-1)*timestep)'; % anchor time for each frame, starting from zero Dawn@4: %------------------ determine FFT length --------------------- Dawn@4: fftlen=1; Dawn@4: while (fftlen < segmentlen * (1 +interpolation_depth)) Dawn@4: fftlen =fftlen* 2; Dawn@4: end Dawn@4: %----------------- derive linear and log frequency scale ---------------- Dawn@4: frequency=Fs*(1:fftlen/2)/fftlen; % we ignore frequency 0 here since we need to do log transformation later and won't use it anyway. Dawn@4: limit=find(frequency>=ceiling); Dawn@4: limit=limit(1); % only the first is useful Dawn@4: frequency=frequency(1:limit); Dawn@4: logf=log2(frequency); Dawn@4: %% clear some variables to save memory Dawn@4: clear frequency; Dawn@4: min_bin=logf(end)-logf(end-1); % the minimum distance between two points after interpolation Dawn@4: shift=log2(N); % shift distance Dawn@4: shift_units=round(shift/min_bin); %the number of unit on the log x-axis Dawn@4: i=(2:N); Dawn@4: % ------------- the followings are universal for all the frames ---------------%% Dawn@4: startpos=shift_units+1-round(log2(i)/min_bin); % find out all the start position of each shift Dawn@4: index=find(startpos<1); % find out those positions that are less than 1 Dawn@4: startpos(index)=1; % set them to 1 since the array index starts from 1 in matlab Dawn@4: interp_logf=logf(1):min_bin:logf(end); Dawn@4: interp_len=length(interp_logf);% new length of the amplitude spectrum after interpolation Dawn@4: totallen=shift_units+interp_len; Dawn@4: endpos=startpos+interp_len-1; %% note that : totallen=shift_units+interp_len; Dawn@4: index=find(endpos>totallen); Dawn@4: endpos(index)=totallen; % make sure all the end positions not greater than the totoal length of the shift spectrum Dawn@4: Dawn@4: newfre=2.^(interp_logf); % the linear Hz scale derived from the interpolated log scale Dawn@4: upperbound=find(interp_logf>=maxlogf); % find out the index of upper bound of search region on the log frequency scale. Dawn@4: upperbound=upperbound(1);% only the first element is useful Dawn@4: lowerbound=find(interp_logf>=minlogf); % find out the index of lower bound of search region on the log frequency scale. Dawn@4: lowerbound=lowerbound(1); Dawn@4: Dawn@4: %----------------- segmentation of speech ------------------------------ Dawn@4: curpos=round(f0_time/1000*Fs); % position for each frame in terms of index, not time Dawn@4: frames=toframes(Y,curpos,segmentlen,'hamm'); Dawn@4: [nf framelen]=size(frames); Dawn@4: clear Y; Dawn@4: %----------------- initialize vectors for f0 time, f0 values, and SHR Dawn@4: f0_value=zeros(nf,1); Dawn@4: SHR=zeros(nf,1); Dawn@4: f0_time=f0_time(1:nf); Dawn@4: f0_candidates=zeros(nf,2); Dawn@4: %----------------- voicing determination ---------------------------- Dawn@4: if (CHECK_VOICING) Dawn@4: NoiseFloor=sum(frames(1,:).^2); Dawn@4: voicing=vda(frames,segmentduration/1000,NoiseFloor); Dawn@4: else Dawn@4: voicing=ones(nf,1); Dawn@4: end Dawn@4: %------------------- the main loop ----------------------- Dawn@4: curf0=0; Dawn@4: cur_SHR=0; Dawn@4: cur_cand1=0; Dawn@4: cur_cand2=0; Dawn@4: for n=1:nf Dawn@4: segment=frames(n,:); Dawn@4: curtime=f0_time(n); Dawn@4: if voicing(n)==0 Dawn@4: curf0=0; Dawn@4: cur_SHR=0; Dawn@4: else Dawn@4: [log_spectrum]=GetLogSpectrum(segment,fftlen,limit,logf,interp_logf); Dawn@4: [peak_index,cur_SHR,shshift,all_peak_indices]=ComputeSHR(log_spectrum,min_bin,startpos,endpos,lowerbound,upperbound,N,shift_units,SHR_Threshold); Dawn@4: if (peak_index==-1) % -1 indicates a possibly unvoiced frame, if CHECK_VOICING, set f0 to 0, otherwise uses previous value Dawn@4: if (CHECK_VOICING) Dawn@4: curf0=0; Dawn@4: cur_cand1=0; Dawn@4: cur_cand2=0; Dawn@4: end Dawn@4: Dawn@4: else Dawn@4: curf0=newfre(peak_index)*2; Dawn@4: if (curf0>maxf0) Dawn@4: curf0=curf0/2; Dawn@4: end Dawn@4: if (length(all_peak_indices)==1) Dawn@4: cur_cand1=0; Dawn@4: cur_cand2=newfre(all_peak_indices(1))*2; Dawn@4: else Dawn@4: cur_cand1=newfre(all_peak_indices(1))*2; Dawn@4: cur_cand2=newfre(all_peak_indices(2))*2; Dawn@4: end Dawn@4: if (cur_cand1>maxf0) Dawn@4: cur_cand1=cur_cand1/2; Dawn@4: end Dawn@4: if (cur_cand2>maxf0) Dawn@4: cur_cand2=cur_cand2/2; Dawn@4: end Dawn@4: if (CHECK_VOICING) Dawn@4: voicing(n)=postvda(segment,curf0,Fs); Dawn@4: if (voicing(n)==0) Dawn@4: curf0=0; Dawn@4: end Dawn@4: end Dawn@4: end Dawn@4: end Dawn@4: f0_value(n)=curf0; Dawn@4: SHR(n)=cur_SHR; Dawn@4: f0_candidates(n,1)=cur_cand1; Dawn@4: f0_candidates(n,2)=cur_cand2; Dawn@4: DEBUG=0; Dawn@4: if DEBUG Dawn@4: figure(9) Dawn@4: %subplot(5,1,1),plot(segment,'*') Dawn@4: %title('windowed waveform segment') Dawn@4: subplot(2,2,1),plot(interp_logf,log_spectrum,'k*') Dawn@4: title('(a)') Dawn@4: grid Dawn@4: %('spectrum on log frequency scale') Dawn@4: %grid Dawn@4: shsodd=sum(shshift(1:2:N-1,:),1); Dawn@4: shseven=sum(shshift(2:2:N,:),1); Dawn@4: difference=shseven-shsodd; Dawn@4: subplot(2,2,2),plot(interp_logf,shseven,'k*') Dawn@4: title('(b)') Dawn@4: %title('even') Dawn@4: grid Dawn@4: subplot(2,2,3),plot(interp_logf,shsodd,'k*') Dawn@4: title('(c)') Dawn@4: %title('odd') Dawn@4: grid Dawn@4: subplot(2,2,4), plot(interp_logf,difference,'k*') Dawn@4: title('(d)') Dawn@4: %title('difference (even-odd)') Dawn@4: grid Dawn@4: curtime Dawn@4: curf0 Dawn@4: cur_SHR Dawn@4: pause Dawn@4: end Dawn@4: end Dawn@4: %-------------- post-processing ------------------------------- Dawn@4: if (med_smooth > 0) Dawn@4: f0_value=medsmooth(f0_value,med_smooth); Dawn@4: end Dawn@4: %f0=linsmooth(f0,5); % this is really optional. Dawn@4: Dawn@4: %***************************************************************************************** Dawn@4: %-------------- do FFT and get log spectrum --------------------------------- Dawn@4: %***************************************************************************************** Dawn@4: function [interp_amplitude]=GetLogSpectrum(segment,fftlen,limit,logf,interp_logf) Dawn@4: Spectra=fft(segment,fftlen); Dawn@4: amplitude = abs(Spectra(1:fftlen/2+1)); % fftlen is always even here. Note: change fftlen/2 to fftlen/2+1. bug fixed due to Herbert Griebel Dawn@4: amplitude=amplitude(2:limit+1); % ignore the zero frequency component Dawn@4: %amplitude=log10(amplitude+1); Dawn@4: interp_amplitude=interp1(logf,amplitude,interp_logf,'linear'); Dawn@4: interp_amplitude=interp_amplitude-min(interp_amplitude); Dawn@4: %***************************************************************************************** Dawn@4: %-------------- compute subharmonic-to-harmonic ratio --------------------------------- Dawn@4: %***************************************************************************************** Dawn@4: function [peak_index,SHR,shshift,index]=ComputeSHR(log_spectrum,min_bin,startpos,endpos,lowerbound,upperbound,N,shift_units,SHR_Threshold) Dawn@4: % computeshr: compute subharmonic-to-harmonic ratio for a short-term signal Dawn@4: len_spectrum=length(log_spectrum); Dawn@4: totallen=shift_units+len_spectrum; Dawn@4: shshift=zeros(N,totallen); %initialize the subharmonic shift matrix; each row corresponds to a shift version Dawn@4: shshift(1,(totallen-len_spectrum+1):totallen)=log_spectrum; % place the spectrum at the right end of the first row Dawn@4: % note that here startpos and endpos has N-1 rows, so we start from 2 Dawn@4: % the first row in shshift is the original log spectrum Dawn@4: for i=2:N Dawn@4: shshift(i,startpos(i-1):endpos(i-1))=log_spectrum(1:endpos(i-1)-startpos(i-1)+1); % store each shifted sequence Dawn@4: end Dawn@4: shshift=shshift(:,shift_units+1:totallen); % we don't need the stuff smaller than shift_units Dawn@4: shsodd=sum(shshift(1:2:N-1,:),1); Dawn@4: shseven=sum(shshift(2:2:N,:),1); Dawn@4: difference=shseven-shsodd; Dawn@4: % peak picking process Dawn@4: SHR=0; Dawn@4: [mag,index]=twomax(difference,lowerbound,upperbound,min_bin); % only find two maxima Dawn@4: % first mag is always the maximum, the second, if there is, is the second max Dawn@4: NumPitchCandidates=length(mag); Dawn@4: if (NumPitchCandidates == 1) % this is possible, mainly due to we put a constraint on search region, i.e., f0 range Dawn@4: if (mag <=0) % this must be an unvoiced frame Dawn@4: peak_index=-1; Dawn@4: return Dawn@4: end Dawn@4: peak_index=index; Dawn@4: SHR=0; Dawn@4: else Dawn@4: SHR=(mag(1)-mag(2))/(mag(1)+mag(2)); Dawn@4: if (SHR<=SHR_Threshold) Dawn@4: peak_index=index(2); % subharmonic is weak, so favor the harmonic Dawn@4: else Dawn@4: peak_index=index(1); % subharmonic is strong, so favor the subharmonic as F0 Dawn@4: end Dawn@4: end Dawn@4: %%***************************************************************************************** Dawn@4: %****************** this function only finds two maximum peaks ************************ Dawn@4: function [mag,index]=twomax(x,lowerbound,upperbound,unitlen) Dawn@4: %In descending order, the magnitude and index are returned in [mag,index], respectively Dawn@4: lenx=length(x); Dawn@4: halfoct=round(1/unitlen/2); % compute the number of units of half octave. log2(2)=1; 1/unitlen Dawn@4: [mag,index]=max(x(lowerbound:upperbound));%find the maximum value Dawn@4: if (mag<=0) Dawn@4: % error('max is smaller than zero!') % return it! Dawn@4: return Dawn@4: end Dawn@4: index=index+lowerbound-1; Dawn@4: harmonics=2; Dawn@4: LIMIT=0.0625; % 1/8 octave Dawn@4: startpos=index+round(log2(harmonics-LIMIT)/unitlen); Dawn@4: if (startpos<=min(lenx,upperbound)) Dawn@4: endpos=index+round(log2(harmonics+LIMIT)/unitlen); % for example, 100hz-200hz is one octave, 200hz-250hz is 1/4octave Dawn@4: if (endpos> min(lenx,upperbound)) Dawn@4: endpos=min(lenx,upperbound); Dawn@4: end Dawn@4: [mag1,index1]=max(x(startpos:endpos));%find the maximum value at right side of last maximum Dawn@4: if (mag1>0) Dawn@4: index1=index1+startpos-1; Dawn@4: mag=[mag;mag1]; Dawn@4: index=[index;index1]; Dawn@4: end Dawn@4: end Dawn@4: %***************************************************************************************** Dawn@4: %%---------------------------------------------------------------------------------------- Dawn@4: %%-----------------------------------voicing determination ------------------------------- Dawn@4: function voice=vda(x,segmentdur,noisefloor,minzcr) Dawn@4: %voice=vda(x) determine whether the segment is voiced, unvoiced or silence Dawn@4: %this VDA is independent from the PDA process, and does not take advantage of the info derived from PDA Dawn@4: %thus, it requires more computation load. Dawn@4: if nargin<4 Dawn@4: %minzcr=2500; %unit: hertz Dawn@4: minzcr=3000; Dawn@4: end Dawn@4: if nargin<3 Dawn@4: noisefloor=0.01; Dawn@4: end Dawn@4: [nf, len]=size(x); Dawn@4: voice=ones(nf,1); Dawn@4: engergy=sum(x.^2,2); Dawn@4: index=find(engergy<=noisefloor*3); Dawn@4: voice(index)=0; Dawn@4: Dawn@4: %***************************************************************************************** Dawn@4: %% --------------------------------- determine the energy threshold for silence------------------------- Dawn@4: function thr=ethreshold(frames) Dawn@4: %%%%% use Rabiner and Sambur (1975) method Dawn@4: [nf,len]=size(frames); Dawn@4: lastpoint=1; Dawn@4: emax=0; Dawn@4: emin=0; Dawn@4: e=sum(frames.^2,2); Dawn@4: emax=max(e); Dawn@4: emin=min(e); Dawn@4: I1=0.03*(emax-emin)+emin; Dawn@4: I2=4*emin; Dawn@4: thr=25*min(I1,I2); Dawn@4: Dawn@4: %***************************************************************************************** Dawn@4: %% ------------------- split signal into frames --------------- Dawn@4: function frames=toframes(input,curpos,segmentlen,wintype) Dawn@4: len=length(input); Dawn@4: numFrames=length(curpos); Dawn@4: frames=zeros(numFrames,segmentlen); Dawn@4: start=curpos-round(segmentlen/2); Dawn@4: offset=(0:segmentlen-1); Dawn@4: index_start=find(start<1); % find out those frames beyond the first point Dawn@4: start(index_start)=1; % for those, just use the first frame Dawn@4: endpos=start+segmentlen-1; Dawn@4: index=find(endpos>len); Dawn@4: endpos(index)=len; % duplicate the last several frames if window is over the limit. Dawn@4: start(index)=len+1-segmentlen; Dawn@4: frames(:)=input(start(:,ones(1,segmentlen))+offset(ones(numFrames,1),:)); Dawn@4: [nf, len]=size(frames); Dawn@4: win=window(segmentlen,wintype); Dawn@4: frames = frames .* win(ones(nf,1),:); Dawn@4: %***************************************************************************************** Dawn@4: %-------------- post voicing checking --------------------------------------------- Dawn@4: function voicing=postvda(segment, curf0,Fs,r_threshold) Dawn@4: %%% check voicing again using estimated F0, which follows Hermes, SHS algorithm, JASA, 1988 Dawn@4: if nargin<4 Dawn@4: r_threshold=0.2; Dawn@4: end Dawn@4: estimated_period=1/curf0; Dawn@4: mid_point=round(length(segment)/2); Dawn@4: num_points=round(estimated_period*Fs); % number of points in each period Dawn@4: start_point=mid_point-num_points; Dawn@4: end_point=mid_point+num_points; Dawn@4: if (start_point <1) Dawn@4: start_point=1; Dawn@4: mid_point=start_point+num_points; Dawn@4: if (mid_point>length(segment)) % this is unreasonable, set f0 to zero Dawn@4: voicing=0; Dawn@4: return; Dawn@4: end Dawn@4: end Dawn@4: segment1=segment(start_point:mid_point); Dawn@4: if (end_point>length(segment)) Dawn@4: end_point=length(segment); Dawn@4: mid_point=end_point-num_points; Dawn@4: if (mid_point<1) % this is unreasonable, set f0 to zero Dawn@4: voicing=0; Dawn@4: return; Dawn@4: end Dawn@4: end Dawn@4: segment2=segment(mid_point:end_point); Dawn@4: len=min(length(segment1),length(segment2)); Dawn@4: r=corrcoef(segment1(1:len),segment2(1:len)); Dawn@4: r1=r(1,2); Dawn@4: if (r1