This documentation page applies to an outdated major AMT version. We show it for archival purposes only.
d = taal2011(sigclean, sigproc, fs);
d = taal2011(sigclean, sigproc, fs) returns the output of the Short-Time Objective Intelligibility (STOI) measure described in Taal et al. (2010, 2011), where sigclean and sigproc denote the clean and processed speech, respectively, and fs is the sample rate in Hz. The output d is expected to have a monotonic relation with subjective speech intelligibility: a higher d indicates more intelligible speech. See Taal et al. (2010, 2011) for details.
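To give a feel for why a correlation-style score drops as a signal is degraded, the following is a minimal toy sketch and not the taal2011 implementation. It assumes a single broadband energy envelope and one correlation coefficient over the whole signal, whereas the measure in Taal et al. (2011) compares short-time temporal envelopes in one-third octave bands. The test signals, the frame length framelen, the noise gains and the name d_toy are all hypothetical choices made for illustration; no toolbox functions are needed.

```matlab
% Illustrative sketch only; this is NOT the taal2011 implementation.
fs       = 10000;                 % sample rate in Hz (arbitrary choice)
t        = (0:fs-1).' / fs;       % one second of samples
framelen = 250;                   % 25 ms frames (hypothetical choice)

% Hypothetical test signals: tones with slow amplitude modulation
sigclean = (1 + 0.8*sin(2*pi*4*t)) .* sin(2*pi*500*t);
noise    = (1 + 0.8*sin(2*pi*7*t)) .* sin(2*pi*1300*t);

% Short-time energy envelope of the clean signal (one value per frame),
% with its mean removed
env_clean = mean(reshape(sigclean.^2, framelen, []), 1);
env_clean = env_clean - mean(env_clean);

gains = [0 0.5 1 2];              % increasing amounts of added noise
d_toy = zeros(size(gains));
for k = 1:numel(gains)
    sigproc  = sigclean + gains(k)*noise;
    env_proc = mean(reshape(sigproc.^2, framelen, []), 1);
    env_proc = env_proc - mean(env_proc);
    % Correlation coefficient between the clean and processed envelopes
    d_toy(k) = sum(env_clean .* env_proc) / ...
               sqrt(sum(env_clean.^2) * sum(env_proc.^2));
end
disp(d_toy)
```

With no added noise the toy score equals 1, and it decreases as the noise gain grows, which mirrors the intended reading of d: higher values correspond to a processed signal that is closer to the clean one.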
The model consists of the following stages:
The following example shows a simple comparison between the intelligibility of a noisy speech signal and the same signal after noise reduction using a simple soft thresholding (spectral subtraction):
    % Get a clean and noisy test signal
    [f,fs] = cocktailparty;
    Ls = length(f);
    f_noisy = f + 0.05*pinknoise(Ls,1);

    % Simple spectral subtraction to remove the noise
    a = 128; M = 256;
    g = gabtight('hann',a,M);
    c_noise = dgtreal(f,g,a,M);
    c_removed = thresh(c_noise,0.01);
    f_removed = idgtreal(c_removed,g,a,M);
    f_removed = f_removed(1:Ls);

    % Compute the STOI of noisy vs. removed
    d_noisy = taal2011(f, f_noisy, fs)
    d_removed = taal2011(f, f_removed, fs)
This code produces the following output:
    d_noisy = 1.0000
    d_removed = 0.9915
The original STOI model can be downloaded from http://msp.ewi.tudelft.nl/content/short-time-objective-intelligibility-measure. It is a standalone version that does not depend on LTFAT or AMToolbox and is released under a different license, but the two models are functionally equivalent.
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech. In Acoustics Speech and Signal Processing (ICASSP), pages 4214-4217. IEEE, 2010.
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech and Language Processing, 19(7):2125-2136, 2011.