This documentation page applies to an outdated AMT version (1.3.0).
Usage: d = taal2011(sigclean, sigproc, fs);

Input parameters:
sigclean | clean (reference) speech signal
sigproc | processed speech signal
fs | sampling frequency in Hz

Output parameters:
d | short-time objective intelligibility (STOI) value
d = taal2011(sigclean, sigproc, fs) returns the output of the Short-Time Objective Intelligibility (STOI) measure described in Taal et al. (2010, 2011). Here sigclean and sigproc denote the clean and the processed speech, respectively, and fs is their sampling rate in Hz. The output d is expected to have a monotonic relation with subjective speech intelligibility: a higher d denotes more intelligible speech. See Taal et al. (2010, 2011) for details.
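At its core, STOI compares short segments of clean and processed band envelopes. The following is a minimal sketch of that comparison for one one-third octave band and one short-time segment, following the description in Taal et al. (2011). It assumes that the clean and processed envelopes x and y of the segment (e.g. 30 frames) are already available as column vectors. The sketch scales y to the energy of x, clips it using a lower signal-to-distortion bound of beta = -15 dB, and returns the correlation coefficient. All names (normalizeY, clipY, corrCoef, stoiSegment) are hypothetical and are not part of the AMT. The actual taal2011 code may be organized differently.

```matlab
% Sketch of STOI's intermediate intelligibility for ONE band and ONE segment.
% x, y: clean and processed band envelopes as column vectors (e.g. 30 frames).
% All names are hypothetical illustrations, not AMT functions.
beta = -15;                          % lower SDR bound in dB (Taal et al., 2011)
clipFactor = 1 + 10^(-beta/20);      % y may exceed x by at most this factor

% 1) Scale y so that it has the same energy as x
normalizeY = @(x, y) y * norm(x) / norm(y);
% 2) Clip so that severely degraded time-frequency units cannot dominate
clipY = @(x, yn) min(yn, clipFactor * x);
% 3) Sample correlation coefficient between two column vectors
corrCoef = @(u, v) ((u - mean(u))' * (v - mean(v))) / ...
    (norm(u - mean(u)) * norm(v - mean(v)));
% Intermediate intelligibility d_{j,m} for band j and segment m
stoiSegment = @(x, y) corrCoef(x, clipY(x, normalizeY(x, y)));

% Usage with made-up, non-negative envelopes of 30 frames
x = 1 + abs(sin((1:30)' / 3));
y = x + 0.2 * cos((1:30)' / 2);
d_jm = stoiSegment(x, y)
```

In the full measure, these intermediate values are averaged over all bands and all segments to obtain d. Because of the normalization step, a processed envelope that differs from the clean one only by a scale factor yields a correlation of 1.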
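The model consists of the following stages, as described in Taal et al. (2011):
1. Both signals are resampled to 10 kHz.
2. Silent frames are removed. The signals are split into 256-sample Hann-windowed frames with 50% overlap, and frames in which the clean-speech energy lies more than 40 dB below that of the most energetic frame are discarded from both signals.
3. A short-time Fourier transform is computed (256-sample Hann frames, 50% overlap, zero-padded to 512 points).
4. The DFT bins are grouped into 15 one-third octave bands, the lowest centered at 150 Hz, giving one envelope per band.
5. The band envelopes are divided into short-time segments of 30 frames (384 ms).
6. Each processed segment is normalized to the energy of the corresponding clean segment and clipped to a lower signal-to-distortion ratio of -15 dB.
7. For each band and segment, the correlation coefficient between the clean and the normalized, clipped processed envelope is computed.
8. These intermediate values are averaged over all bands and segments to yield d.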
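The one-third octave band decomposition can be sketched as follows. This sketch assumes the parameters reported in Taal et al. (2011): 15 bands, a lowest center frequency of 150 Hz, signals at 10 kHz and a 512-point FFT. Band edges are taken as cf*2^(-1/6) and cf*2^(1/6), so adjacent bands meet exactly. The assignment of FFT bins to bands is a simplification and may differ in detail from the reference implementation. The variable names are hypothetical.

```matlab
% Sketch: one-third octave bands as assumed from Taal et al. (2011)
numBands = 15;
fLowest  = 150;                    % lowest center frequency in Hz
k  = 0:numBands-1;
cf = fLowest * 2.^(k/3);           % center frequencies in Hz
fl = cf * 2^(-1/6);                % lower band edges in Hz
fh = cf * 2^(1/6);                 % upper band edges in Hz

% Assign the bins of a one-sided 512-point spectrum at 10 kHz to bands
fs   = 10000;
nfft = 512;
f = (0:nfft/2) * fs / nfft;        % bin center frequencies in Hz
bandOfBin = zeros(size(f));        % 0 marks bins outside all bands
for j = 1:numBands
    bandOfBin(f >= fl(j) & f < fh(j)) = j;
end

% A band envelope per frame would then be the norm of its bins, e.g.
% env(j) = norm(X(bandOfBin == j)) for one STFT frame X.
disp([cf; fl; fh]')                % table: center, lower edge, upper edge
```

With these assumptions the highest band is centered near 3.8 kHz and its upper edge lies near 4.3 kHz, so the analysis covers roughly 134 Hz to 4.3 kHz.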
The following example compares the predicted intelligibility of a noisy speech signal with that of the same signal after a simple noise reduction. The noise reduction applies hard thresholding to the signal's Gabor coefficients, a crude form of spectral subtraction:
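% Get a clean test signal and add pink noise
[f,fs] = cocktailparty;
Ls = length(f);
f_noisy = f + 0.05*pinknoise(Ls,1);

% Simple noise reduction: threshold the Gabor coefficients of the noisy signal
a = 128;                         % time shift in samples
M = 256;                         % number of frequency channels
g = gabtight('hann',a,M);        % tight Hann-based Gabor window
c_noise = dgtreal(f_noisy,g,a,M);
c_removed = thresh(c_noise,0.01);
f_removed = idgtreal(c_removed,g,a,M);
f_removed = f_removed(1:Ls);     % remove padding added by the transform

% Compute the STOI of the noisy and the noise-reduced signal
d_noisy = taal2011(f, f_noisy, fs)
d_removed = taal2011(f, f_removed, fs)

Running this code prints the two values d_noisy and d_removed. Because a higher d predicts better intelligibility, the noise reduction is judged helpful only if d_removed exceeds d_noisy.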
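LTFAT's thresh performs hard thresholding by default and soft thresholding when called with the 'soft' flag. The difference between the two can be sketched without LTFAT using two hypothetical anonymous functions, hardThresh and softThresh. For complex coefficients, this soft-thresholding sketch shrinks the magnitude by lambda and keeps the phase. That behavior is an assumption made for illustration, not a description of thresh's internals.

```matlab
% Hard vs. soft thresholding of (possibly complex) coefficients c.
% hardThresh/softThresh are hypothetical helpers, not LTFAT functions.
hardThresh = @(c, lambda) c .* (abs(c) >= lambda);   % zero entries below lambda, keep the rest
softThresh = @(c, lambda) c .* max(1 - lambda ./ max(abs(c), eps), 0);  % also shrink magnitudes by lambda

c = [0.5, -0.5, 0.05, 0, 0.3+0.4i];
hardThresh(c, 0.1)   % values: 0.5, -0.5, 0, 0, 0.3+0.4i
softThresh(c, 0.1)   % values: approx. 0.4, -0.4, 0, 0, 0.24+0.32i
```

Hard thresholding leaves the surviving coefficients untouched. Soft thresholding also reduces their magnitude, which typically suppresses more residual noise at the cost of attenuating the speech as well.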
A standalone version of this model also exists. It depends on neither LTFAT nor the AMToolbox and is released under a different license, but it is functionally equivalent.
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. A Short-Time Objective Intelligibility Measure for Time-Frequency Weighted Noisy Speech. In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4214–4217, 2010.
C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen. An Algorithm for Intelligibility Prediction of Time-Frequency Weighted Noisy Speech. IEEE Transactions on Audio, Speech, and Language Processing, 19(7):2125–2136, 2011.