THE AUDITORY MODELING TOOLBOX

Applies to version: 0.9.5

DIETZ2011 - Dietz 2011 binaural model

Usage

[...] = dietz2011(insig,fs);

Input parameters

insig binaural signal for which values should be calculated
fs sampling rate (Hz)

Output parameters

fine Information about the fine structure (see below)
env Information about the envelope (see below)

Description

dietz2011(insig,fs) calculates interaural phase, time and level differences of the fine structure and the envelope of the signal, as well as the interaural coherence, which can be used as a weighting function.

The output structures fine and env have the following fields:

.s1 Left signal as put in the binaural processor
.s2 Right signal as put in the binaural processor
.fc Center frequencies of the channels (f_carrier or f_mod)
.itf Transfer function
.itf_equal Transfer function without amplitude
.ipd Phase difference in rad
.ipd_lp Based on lowpass-filtered itf, phase difference in rad
.ild Level difference in dB
.itd Time difference based on instantaneous frequencies
.itd_C Time difference based on central frequencies
.itd_lp As .itd, with low-passed itf
.itd_C_lp As .itd_C, with low-passed itf
.f_inst_1 Instantaneous frequencies in the channels of the filtered s1
.f_inst_2 Instantaneous frequencies in the channels of the filtered s2
.f_inst Instantaneous frequencies (average of f_inst_1 and f_inst_2)
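
To illustrate how the phase and time fields relate, the following Python sketch (not part of the toolbox) derives a phase and a time difference from one pair of complex-valued filter outputs. It assumes that the transfer function is the left signal multiplied by the conjugated right signal, and that the time difference is the phase difference divided by 2*pi times a frequency. The function name interaural_cues and the sign convention are assumptions made for this example only.

```python
import cmath
import math


def interaural_cues(s1, s2, freq_hz):
    """Return (itf, ipd, itd) for one complex sample pair.

    s1, s2  : complex outputs of the left and right channel (assumed form)
    freq_hz : frequency used to convert phase to time, in Hz
    """
    # Assumed definition: interaural transfer function as s1 * conj(s2).
    itf = s1 * s2.conjugate()
    # Phase difference in rad, wrapped to (-pi, pi].
    ipd = cmath.phase(itf)
    # Time difference in seconds for the given frequency.
    itd = ipd / (2.0 * math.pi * freq_hz)
    return itf, ipd, itd


# Left signal leads the right one by 0.4 rad in a 500 Hz channel.
left = cmath.exp(1j * 0.5)
right = cmath.exp(1j * 0.1)
itf, ipd, itd = interaural_cues(left, right, 500.0)
```

Passing an instantaneous frequency as freq_hz corresponds to the idea behind .itd, passing the channel center frequency to that behind .itd_C.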

The steps of the binaural model to calculate the result are the following (see also Dietz et al., 2011):

  1. Middle ear filtering (500-2000 Hz 1st-order bandpass)
  2. Auditory bandpass filtering on the basilar membrane using a 4th-order all-pole gammatone filterbank with 23 filter bands between 200 and 5000 Hz, spaced 1 ERB apart. The filter bandwidth corresponds to 1 ERB.
  3. Cochlear compression, simulated by power-law compression with an exponent of 0.4.
  4. Transduction in the inner hair cells, modelled by half-wave rectification followed by a 770-Hz 5th-order lowpass filter.
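
Steps 3 and 4 can be sketched in a few lines of Python (not toolbox code). The sketch assumes a sign-preserving form of the power-law compression and omits the 770-Hz lowpass that follows the rectification. The function names compress and half_wave_rectify are hypothetical.

```python
import math


def compress(samples, power=0.4):
    """Power-law compression, keeping the sign of each sample (assumed form)."""
    return [math.copysign(abs(x) ** power, x) for x in samples]


def half_wave_rectify(samples):
    """Set negative samples to zero and keep positive samples unchanged."""
    return [max(x, 0.0) for x in samples]


# A short test signal: compression first, then rectification.
signal = [32.0, -32.0, 0.0, 1.0]
compressed = compress(signal)
rectified = half_wave_rectify(compressed)
```

With an exponent of 0.4, a sample of 32 is compressed to 32^0.4 = 4, so large amplitudes are reduced far more strongly than small ones.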

The interaural temporal disparities are then extracted using a second-order complex gammatone bandpass (see paper for details).

dietz2011 accepts the following optional parameters:

'flow',flow Set the lowest frequency in the filterbank to flow. Default value is 200 Hz.
'fhigh',fhigh Set the highest frequency in the filterbank to fhigh. Default value is 5000 Hz.
'basef',basef Ensure that the frequency basef is a center frequency in the filterbank. The default value is 1000.
'filters_per_ERB',filters_per_erb
 Filters per ERB. The default value is 1.
'middle_ear_thr',r Bandpass frequencies for middle ear transfer. The default value is [500 2000].
'middle_ear_order',n
 Order of middle ear filter. Only even numbers are possible. The default value is 2.
'haircell_lp_freq',hlpfreq
 Cutoff frequency for haircell lowpass filter. The default value is 770.
'haircell_lp_order',hlporder
 Order of haircell lowpass filter. The default value is 5.
'compression_power',cpwr
 Exponent of the power-law compression. The default value is 0.4.
'alpha',alpha Internal noise strength. Convention FIXME 65dB = 0.0354. The default value is 0.
'int_randn' Internal noise XXX. This is the default.
'int_mini' Internal noise XXX.
'filter_order',fo Filter order for output XXX. Used for both 'mod' and 'fine'. The default value is 2.
'filter_attenuation_db',fadb
 Used for both 'mod' and 'fine'. The default value is 10.
'fine_filter_finesse',fff
 Only for finestructure plugin. The default value is 3.
'mod_center_frequency_hz',mcf_hz
 Only for envelope plugin. The default value is 135.
'mod_filter_finesse',mff
 Only for envelope plugin. The default value is 8.
'level_filter_cutoff_hz',lfc_hz
 For ild- or level-plugin. The default value is 30.
'level_filter_order',lforder
 For ild- or level-plugin. The default value is 2.
'coh_param',coh_param
 This is a structure used for the localization plugin. It has the following fields:

 max_abs_itd The default value is 1e-3.
 tau_cycles The default value is 5.
 tau_s The default value is 10e-3.
'signal_level_dB_SPL',signal_level
 Sound pressure level of left channel. Used for data display and analysis. Default value is 70.
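
The interplay of 'flow', 'fhigh', 'basef' and 'filters_per_ERB' can be illustrated with a short Python sketch (not toolbox code). It assumes the Glasberg and Moore ERB-rate scale and assumes that channels are placed at equal ERB steps around basef, keeping those that fall within [flow, fhigh]. The helper names are hypothetical. Under these assumptions the default values give 23 channels, the number of filter bands stated in the description.

```python
import math


def freq_to_erb(f_hz):
    """Frequency in Hz to ERB-rate (Glasberg and Moore scale, assumed here)."""
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_to_freq(erb):
    """Inverse of freq_to_erb."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def center_frequencies(flow=200.0, fhigh=5000.0, basef=1000.0,
                       filters_per_erb=1.0):
    """Center frequencies at equal ERB steps, with basef as one of them."""
    step = 1.0 / filters_per_erb
    base = freq_to_erb(basef)
    # Number of whole steps that fit below and above basef.
    n_below = math.floor((base - freq_to_erb(flow)) / step)
    n_above = math.floor((freq_to_erb(fhigh) - base) / step)
    return [erb_to_freq(base + k * step)
            for k in range(-n_below, n_above + 1)]


fcs = center_frequencies()
print(len(fcs))
```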

References:

M. Dietz, S. D. Ewert, and V. Hohmann. Auditory model based direction estimation of concurrent speakers from binaural signals. Speech Communication, 53(5):592-605, 2011.