THE AUDITORY MODELING TOOLBOX

Applies to version: 1.1.0

MCKENZIE2021 - Binaural perceptual similarity

Input parameters

A Reference input data matrix
B Test input data matrix
domFlag Specifies whether the input data is in the time (0), frequency (1) or frequency_dB (2) domain
f Struct. If domFlag == 0, it contains fs, nfft, minFreq and maxFreq; if domFlag == 1 or 2, it specifies the FFT sample frequencies
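
The three values of domFlag correspond to three representations of the same data. The following is a minimal Python sketch of how they relate, not the toolbox's own code. The helper names are hypothetical, the one-sided spectrum layout is an assumption, and the dB conversion assumes the common 20*log10 magnitude convention, which this page does not confirm.

```python
import cmath
import math


def magnitude_spectrum(x, nfft):
    """One-sided DFT magnitudes of x, zero-padded or truncated to nfft samples."""
    padded = (list(x) + [0.0] * nfft)[:nfft]
    mags = []
    for k in range(nfft // 2 + 1):
        acc = sum(
            padded[n] * cmath.exp(-2j * math.pi * k * n / nfft)
            for n in range(nfft)
        )
        mags.append(abs(acc))
    return mags


def bin_frequencies(fs, nfft):
    """Centre frequency in Hz of each one-sided FFT bin."""
    return [k * fs / nfft for k in range(nfft // 2 + 1)]


def to_db(mags, floor=1e-12):
    """Magnitudes to dB (assumed convention: 20*log10), floored to avoid log(0)."""
    return [20.0 * math.log10(max(m, floor)) for m in mags]


# A unit impulse has a flat spectrum: magnitude 1 (0 dB) in every bin.
time_data = [1.0, 0.0, 0.0, 0.0]                # domFlag == 0 style input
freq_data = magnitude_spectrum(time_data, 4)    # domFlag == 1 style input
freq_db_data = to_db(freq_data)                 # domFlag == 2 style input
freqs = bin_frequencies(48000, 4)               # FFT sample frequencies
```

In this sketch, fs and nfft play the role of the fields of f used for time-domain input, while the list returned by bin_frequencies plays the role of the FFT sample frequencies supplied for frequency-domain input.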

Description

Optional input parameters:

'norm' If empty (DEFAULT), iterate to find the optimal normalisation; otherwise, apply a normalisation of norm dB to input B
'w' (DEFAULT=1) Sample point weightings
'plotFlag' (DEFAULT=0) Do not show (0) or show (1) a plot of the normalisation curve
'lim' (DEFAULT=0.05) If lim >= 1, it specifies the number of normalisation iterations. If lim < 1, it specifies the resolution (change in PSD) that must be reached by sequential normalisation steps, i.e. iterations stop once they change the PSD value by less than lim (or after 100000 iterations)
'SPL' (DEFAULT=75) the average dB SPL value at which comparisons are made
'initInc' (DEFAULT=0.2) The starting offset increment
'scale' (DEFAULT=0.4) The absolute scaling value by which the offset increment is adjusted each time an increase in PSD is detected
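
The interaction of 'lim', 'initInc' and 'scale' can be illustrated with a small step-search sketch in Python. This is one plausible reading of the parameter descriptions above, not the toolbox's implementation. The function name, the error_at callback and the rule of reversing direction when the error rises are all assumptions made for illustration.

```python
def find_normalisation(error_at, init_inc=0.2, scale=0.4, lim=0.05,
                       max_iter=100000):
    """Search for the offset (in dB) that minimises error_at(offset).

    Assumed update rule: keep stepping by the current increment; whenever
    the error rises, reverse direction and shrink the step by `scale`.
    """
    # lim >= 1 is read as a fixed iteration count, lim < 1 as a tolerance.
    n_steps = int(lim) if lim >= 1 else max_iter
    offset = 0.0
    inc = init_inc
    prev = error_at(offset)
    for _ in range(n_steps):
        offset += inc
        err = error_at(offset)
        change = abs(err - prev)
        if err > prev:
            # Overshot the minimum: turn around with a smaller step.
            inc = -inc * scale
        prev = err
        if lim < 1 and change < lim:
            break
    return offset, prev


# Toy error curve with its minimum at an offset of 3 dB.
def toy_error(offset_db):
    return (offset_db - 3.0) ** 2


best_offset, best_error = find_normalisation(toy_error)
```

With the toy error curve, the search walks towards 3 dB in steps of 0.2 and stops as soon as a step changes the error by less than lim. A smaller lim gives a finer result at the cost of more iterations.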

This function compares the spectra of A and B in terms of perceptually weighted error. For multi-dimensional inputs the comparison is made along the first dimension, and an average is output for each column of data.

Perceptual error takes into account the lesser importance of quieter sounds and of frequencies to which hearing is less sensitive. Frequency bins are weighted with respect to the ISO 226 equal-loudness contours for an average listening level of 75 dB SPL by default. Using the sone scale, a contribution half as loud is deemed half as important.

The perceptual average difference is further weighted with respect to ERB bandwidth. This reduces the contribution to the average of higher-frequency components, where our ears are less sensitive, in a manner similar to a logarithmic average.

An iterative optimisation process is used to find the input normalisation that results in the lowest error metric. This is generally close to the point at which the two input signals have the same mean value, but can easily vary by a few dB. The optimum normalisation differs between perceptual and absolute error metrics.
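
The sone and ERB weightings described here can be sketched in a few lines of Python. This is a simplified illustration, not the model itself. It assumes loudness levels in phon are already available (the ISO 226 conversion is omitted), uses the standard sone definition (1 sone at 40 phon, doubling every 10 phon) and the Glasberg and Moore ERB formula, and assumes the ERB weighting is applied as a normalised inverse bandwidth. All function names are hypothetical.

```python
def phon_to_sone(phon):
    """Loudness in sones: 1 sone at 40 phon, doubling for every +10 phon."""
    return 2.0 ** ((phon - 40.0) / 10.0)


def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def erb_weights(freqs_hz):
    """Assumed weighting: inverse ERB per bin, normalised to sum to 1."""
    raw = [1.0 / erb_bandwidth(f) for f in freqs_hz]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_mean_difference(sones_a, sones_b, freqs_hz):
    """ERB-weighted mean absolute loudness difference between two spectra."""
    weights = erb_weights(freqs_hz)
    return sum(w * abs(a - b) for w, a, b in zip(weights, sones_a, sones_b))


# Example: two spectra that differ by 10 phon in the highest bin only.
freqs = [250.0, 1000.0, 4000.0]
sones_a = [phon_to_sone(p) for p in (60.0, 60.0, 60.0)]
sones_b = [phon_to_sone(p) for p in (60.0, 60.0, 50.0)]
difference = weighted_mean_difference(sones_a, sones_b, freqs)
```

Because ERB bandwidth grows with frequency, the inverse-bandwidth weights shrink towards high frequencies, so the difference in the 4000 Hz bin counts for less than the same difference would at 250 Hz.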

Authors: Thomas McKenzie, Cal Armstrong, Lauren Ward, Damian Murphy, Gavin Kearney

Correspondence to thomas.mckenzie@aalto.fi (happy to answer any questions if you're having trouble!)

References:

T. McKenzie, C. Armstrong, L. Ward, D. T. Murphy, and G. Kearney. A perceptually motivated spectral difference model for binaural signals. Acta Acustica, 2021.