THE AUDITORY MODELING TOOLBOX

Applies to version: 1.6.0

may2011
Azimuthal localization of concurrent talkers

Usage:

out = may2011(input,fs);

Input parameters:

input Binaural audio signal. Size: (time x ear).
fs Sampling frequency (in Hz).

Output parameters:

out

Structure containing the results:

  • param: Processing parameters.
  • azFrames: Vector with the frame-based azimuth estimates. Size: nFrames.
  • azimuth: Matrix with the time-frequency azimuth estimates. Size: (nFilter x nFrames).
  • rangeAZ: Vector with the azimuth grid. Size: nAzDir.
  • prob: Matrix with the 3D probability map. Size: (nFilter x nFrames x nAzDir).
  • loglik: Matrix with the 3D log-likelihood map. Size: (nFilter x nFrames x nAzDir).
  • itd: Matrix with the interaural time differences. Size: (nFilter x nFrames).
  • ild: Matrix with the interaural level differences. Size: (nFilter x nFrames).
  • ic: Matrix with the interaural coherences. Size: (nFilter x nFrames).

Description:

may2011 is a probabilistic model for estimating the azimuthal direction of concurrent talkers. Note that in the current implementation, all model parameters are hard-coded in the auxiliary data file (auxdata) modeldata.mat.

nFrames is the approximate number of 10-ms segments in input.

nFilter is 32 unless otherwise specified in the code.

nAzDir is 37 unless otherwise specified in the code.
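A minimal usage sketch in MATLAB (assuming the AMT has been initialized, e.g. via amt_start; the white-noise input below is only a placeholder for a real binaural recording):

```matlab
% Placeholder two-channel signal: 1 s of binaural white noise (time x ear).
fs = 16000;               % sampling frequency in Hz
input = randn(fs, 2);

% Run the localization model.
out = may2011(input, fs);

% out.azFrames holds one azimuth estimate per ~10-ms frame;
% out.rangeAZ holds the azimuth grid the estimates are drawn from.
plot(out.azFrames);
xlabel('Frame index');
ylabel('Estimated azimuth (deg)');
```

The remaining fields (prob, loglik, itd, ild, ic) can be inspected the same way, e.g. as images over the (nFilter x nFrames) time-frequency grid.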

References:

T. May, S. van de Par, and A. Kohlrausch. A probabilistic model for robust localization based on a binaural auditory front-end. IEEE Transactions on Audio, Speech, and Language Processing, 19(1):1--13, 2011.

N. Roman, D. L. Wang, and G. J. Brown. Speech segregation based on sound localization. The Journal of the Acoustical Society of America, 114(4):2236--2252, 2003.