THE AUDITORY MODELING TOOLBOX

This documentation page applies to an outdated AMT version (1.2.0).

JOERGENSEN2013 - Speech-based envelope power spectrum (multi-resolution EPSM)

Usage

output = joergensen2013(x, y, fs, IO_param)

Input parameters

x noisy speech mixture
y noise alone
fs sample rate in Hz
IO_param (optional) vector with parameters for the ideal observer that converts the SNRenv to probability of correct, assuming a given speech material. It contains four parameters of the ideal observer formatted as [k q m sigma_s].
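A minimal calling sketch, assuming the AMT is on the MATLAB/Octave path and has been initialized (for example with amt_start). The signals are synthetic stand-ins: amplitude-modulated noise plays the role of speech. The sample rate, the duration, and the use of column vectors are illustrative assumptions, not requirements stated on this page.

```matlab
% Synthetic stand-ins for real recordings (illustrative assumption)
fs = 22050;                                   % sample rate in Hz (arbitrary choice)
t  = (0:2*fs-1)' / fs;                        % 2 s time axis
noise  = randn(size(t));                      % noise alone
target = randn(size(t)) .* (1 + sin(2*pi*4*t));  % 4-Hz modulated noise as a crude speech surrogate

x = target + noise;                           % noisy speech mixture
y = noise;                                    % noise alone

% IO_param is omitted here, so only the .SNRenv field is expected
output = joergensen2013(x, y, fs);
disp(output.SNRenv);
```

Passing a fourth argument of the form [k q m sigma_s] should additionally yield output.P_correct, provided the Statistics Toolbox is available.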

Output parameters

output structure with the following fields:

  • .SNRenv : The signal-to-noise envelope-power ratio (SNRenv).
  • .P_correct : The probability of correct given the SNRenv. This field is only included if IO_param is specified. Its calculation requires the Statistics Toolbox.
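The conversion from SNRenv to probability of correct can be sketched as follows. This is a sketch under stated assumptions, not the toolbox code: it follows the ideal-observer formulation given in Joergensen and Dau (2011), the parameter values are placeholders rather than values fitted to any speech material, and the normal distribution functions are written with erfc and erfcinv so that the sketch itself runs without the Statistics Toolbox.

```matlab
% Placeholder ideal-observer parameters [k q m sigma_s] (illustrative only)
k = 0.6;  q = 0.5;  m = 8000;  sigma_s = 0.6;

% Example SNRenv values, converted from dB to linear units
SNRenv_lin = 10.^([-5 0 5 10 15 20] / 10);

% Standard normal inverse CDF and CDF written with base functions
std_norminv = @(p) -sqrt(2) * erfcinv(2*p);
std_normcdf = @(z) 0.5 * erfc(-z / sqrt(2));

d_prime = k * SNRenv_lin.^q;        % sensitivity index from SNRenv

% Approximate mean and standard deviation of the largest of m noise alternatives
Un      = std_norminv(1 - 1/m);
mu_n    = Un + 0.577 / Un;
sigma_n = 1.28255 / Un;

% Probability of correct, between 0 and 1
P_correct = std_normcdf((d_prime - mu_n) ./ sqrt(sigma_s^2 + sigma_n^2));
```

In this formulation k and q shape the mapping from SNRenv to d', m is the number of response alternatives, and sigma_s reflects the redundancy of the speech material.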

Description

output = joergensen2013(x, y, fs, IO_param) calculates the signal-to-noise envelope-power (SNRenv) ratio using the multi-resolution speech-based envelope power spectrum model (mr-sEPSM) described in Joergensen et al. (2013). The main difference from the Joergensen and Dau (2011) model is that the present model estimates the envelope power using a multi-resolution segmentation of the envelope, in which the segment duration depends on the modulation filter center frequency. In addition, the modulation filterbank includes filters up to modulation frequencies of 256 Hz, in contrast to the 64 Hz considered in the model of Joergensen and Dau (2011).
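The multi-resolution segmentation can be illustrated with a short sketch. It assumes that the segment duration is the inverse of the modulation center frequency (for example 250 ms at 4 Hz) and that the center frequencies are octave-spaced from 1 to 256 Hz. Both are assumptions taken from the description in Joergensen et al. (2013), not from this page. The modulation filtering itself is skipped, so the same toy envelope is segmented at every resolution.

```matlab
fs_env = 1000;                               % envelope sample rate in Hz (arbitrary choice)
t      = (0:2*fs_env-1)' / fs_env;           % 2 s time axis
env    = 1 + 0.5 * sin(2*pi*4*t);            % toy envelope with a 4-Hz modulation

fc_mod  = [1 2 4 8 16 32 64 128 256];        % assumed modulation center frequencies in Hz
seg_pow = cell(size(fc_mod));                % per-segment envelope power for each resolution

for i = 1:numel(fc_mod)
    seg_len = round(fs_env / fc_mod(i));     % segment length in samples, 1/fc seconds
    n_seg   = floor(numel(env) / seg_len);   % keep whole segments only
    segs    = reshape(env(1:n_seg*seg_len), seg_len, n_seg);
    seg_pow{i} = var(segs, 1, 1);            % AC power of each segment (normalized by N)
    fprintf('%3d Hz: %4d segments of %6.1f ms\n', ...
            fc_mod(i), n_seg, 1000 * seg_len / fs_env);
end
```

Short segments at high modulation frequencies track fast changes in envelope power, while long segments at low modulation frequencies average over slow fluctuations.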

The model builds on that of Joergensen and Dau (2011), which consists of the following stages:

1 A gammatone bandpass filterbank to simulate the auditory filters

2 An envelope extraction stage via the Hilbert Transform

3 A modulation filterbank

4 Computation of the long-term envelope power (output.SNRenv)

5 A decision mechanism based on a statistically ideal observer (output.P_correct)
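Stages 2 to 4 can be sketched for a single channel as shown below. This is a heavily simplified illustration, not the toolbox implementation: the gammatone filterbank is replaced by a 1-kHz tone carrier, the modulation filterbank by one rectangular band around 4 Hz, and the test signals are deterministic toy signals. The SNRenv is taken as the envelope power of the mixture minus that of the noise, divided by that of the noise. The small floor on the numerator is an assumption made here to keep the ratio positive.

```matlab
fs = 8000;  N = fs;  t = (0:N-1)' / fs;      % 1 s at 8 kHz (arbitrary choice, N even)
carrier = sin(2*pi*1000*t);                  % stand-in for one auditory-filter output

mix   = (1 + 0.8 * sin(2*pi*4*t)) .* carrier;   % "speech plus noise": strong 4-Hz modulation
noise = (1 + 0.2 * sin(2*pi*4*t)) .* carrier;   % "noise alone": weak 4-Hz modulation

% Stage 2: Hilbert envelope via the FFT-based analytic signal
h = zeros(N, 1);  h(1) = 1;  h(N/2+1) = 1;  h(2:N/2) = 2;
env_of = @(s) abs(ifft(fft(s) .* h));

% Stage 3 (crude): one rectangular modulation band around 4 Hz
f    = (0:N-1)' * fs / N;                    % FFT bin frequencies in Hz
band = (f >= 2.8) & (f <= 5.7);

% Stage 4: AC envelope power in the band, normalized by the squared envelope mean
env_power = @(e) 2 * sum(abs(fft(e) / N).^2 .* band) / mean(e)^2;

P_mix   = env_power(env_of(mix));
P_noise = env_power(env_of(noise));
SNRenv  = max(P_mix - P_noise, eps) / P_noise;   % floor on the numerator is an assumption

fprintf('SNRenv = %.2f (%.2f dB)\n', SNRenv, 10*log10(SNRenv));
```

For a sinusoidal modulation of depth d the normalized envelope power is d^2/2, so the toy signals give 0.32 for the mixture and 0.02 for the noise.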

References:

S. Joergensen and T. Dau. Predicting speech intelligibility based on the signal-to-noise envelope power ratio after modulation-frequency selective processing. J. Acoust. Soc. Am., 130(3):1475-1487, 2011.

S. Jørgensen, S. D. Ewert, and T. Dau. A multi-resolution envelope power based model for speech intelligibility. J. Acoust. Soc. Am., 134(1):436-446, 2013.