Independent Vector Analysis (AuxIVA)

AuxIVA

Blind source separation using independent vector analysis based on the auxiliary function technique. This function separates the input signal into statistically independent sources without using any prior information.

The algorithm in the determined case, i.e., when the number of sources is equal to the number of microphones, is AuxIVA [1]. When there are more microphones than sources (the overdetermined case), a computationally cheaper variant (OverIVA) is used [2].

Example

from scipy.io import wavfile
import pyroomacoustics as pra

# read multichannel wav file
# audio.shape == (nsamples, nchannels)
fs, audio = wavfile.read("my_multichannel_audio.wav")

# STFT analysis parameters
fft_size = 4096  # `fft_size / fs` should be ~RT60
hop = fft_size // 2  # half-overlap
win_a = pra.hann(fft_size)  # analysis window
# optimal synthesis window
win_s = pra.transform.compute_synthesis_window(win_a, hop)

# STFT
# X.shape == (nframes, nfrequencies, nchannels)
X = pra.transform.analysis(audio, fft_size, hop, win=win_a)

# Separation
Y = pra.bss.auxiva(X, n_iter=20)

# iSTFT (introduces an offset of `hop` samples)
# y contains the time domain separated signals
# y.shape == (new_nsamples, nchannels)
y = pra.transform.synthesis(Y, fft_size, hop, win=win_s)

References

[1] N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE WASPAA, pp. 189-192, Oct. 2011.

[2] R. Scheibler and N. Ono, Independent Vector Analysis with more Microphones than Sources, arXiv, 2019. https://arxiv.org/abs/1905.07880

pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', init_eig=False, return_filters=False, callback=None)

This is an implementation of AuxIVA/OverIVA that separates the input signal into statistically independent sources. The separation is done in the time-frequency domain and the FFT length should be approximately equal to the reverberation time.
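
The rule of thumb above (FFT length roughly equal to the reverberation time) can be turned into a small helper. This is only a sketch: the name `suggest_fft_size` is hypothetical and not part of pyroomacoustics, and rounding to a power of two is an assumption made here for FFT efficiency, not a requirement stated by the library.

```python
import math


def suggest_fft_size(fs, rt60):
    """Suggest an FFT length (in samples) close to the reverberation time.

    fs   : sampling frequency in Hz
    rt60 : reverberation time in seconds

    Hypothetical helper: picks the power of two nearest (on a log scale)
    to fs * rt60, so that fft_size / fs is approximately RT60.
    """
    target = fs * rt60  # reverberation time expressed in samples
    return 2 ** round(math.log2(target))


# 0.25 s of reverberation at 16 kHz is 4000 samples
fft_size = suggest_fft_size(16000, 0.25)
hop = fft_size // 2  # half-overlap, as in the example
```

With a 16 kHz sampling rate and an RT60 of about a quarter of a second, this gives the `fft_size = 4096` used in the example.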

Two different statistical models (Laplace or time-varying Gauss) can be selected with the keyword argument model. The Gauss model performs better in good conditions (few sources, low noise), but Laplace (the default) is more robust in general.
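
To illustrate how the two source models differ, the sketch below computes the per-frame weights that auxiliary-function IVA methods typically derive from the current source estimates. It is an assumption-laden illustration, not the library's code: the function name `source_weights` and the `eps` floor are hypothetical, and constant scaling factors vary between formulations.

```python
import numpy as np


def source_weights(Y, model="laplace", eps=1e-15):
    """Per-frame weights from source estimates (illustrative sketch).

    Y : complex array of shape (nframes, nfrequencies, nsources)
    Returns an array of shape (nframes, nsources).
    """
    if model == "laplace":
        # Laplace: activity is the norm of each frame across frequencies
        r = np.linalg.norm(Y, axis=1)
        return 1.0 / np.maximum(eps, 2.0 * r)
    if model == "gauss":
        # time-varying Gauss: activity is the mean power across frequencies
        r = np.mean(np.abs(Y) ** 2, axis=1)
        return 1.0 / np.maximum(eps, r)
    raise ValueError("model must be 'laplace' or 'gauss'")


# toy input: 4 frames, 8 frequency bins, 2 sources, all with unit magnitude
Y_toy = np.ones((4, 8, 2), dtype=complex)
w_laplace = source_weights(Y_toy, model="laplace")
w_gauss = source_weights(Y_toy, model="gauss")
```

In both cases frames where a source is weak receive a large weight, but the Gauss weight falls off with the power rather than the amplitude, so it reacts more strongly to changes in source activity.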

Parameters
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal

  • n_src (int, optional) – The number of sources or independent components. When n_src==nchannels, the algorithm is identical to AuxIVA. When n_src==1, it performs independent vector extraction.

  • n_iter (int, optional) – The number of iterations (default 20)

  • proj_back (bool, optional) – Scaling on first mic by back projection (default True)

  • W0 (ndarray (nfrequencies, nsrc, nchannels), optional) – Initial value for demixing matrix

  • model (str) – The source distribution model, ‘gauss’ or ‘laplace’ (default)

  • init_eig (bool, optional (default False)) – If True, and if W0 is None, the weights are initialized using the principal eigenvectors of the covariance matrix of the input data. When False, the demixing matrices are initialized with the identity matrix.

  • return_filters (bool) – If True, the function also returns the demixing matrix

  • callback (func) – A callback function called every 10 iterations, which allows monitoring convergence
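
The two initializations described for init_eig can be sketched in NumPy as follows. This is one plausible construction under stated assumptions, not the library's internal code: the helper name `init_demixing` is hypothetical, and the output layout follows the (nfrequencies, nsrc, nchannels) shape documented for W0.

```python
import numpy as np


def init_demixing(X, n_src, init_eig=False):
    """Build an initial demixing matrix (illustrative sketch).

    X : complex STFT array of shape (nframes, nfrequencies, nchannels)
    Returns an array of shape (nfrequencies, n_src, nchannels).
    """
    n_frames, n_freq, n_chan = X.shape
    if not init_eig:
        # identity initialization: source k starts out as channel k
        W0 = np.zeros((n_freq, n_src, n_chan), dtype=complex)
        W0[:, :, :n_src] = np.eye(n_src)
        return W0
    # spatial covariance of the input for each frequency: (n_freq, n_chan, n_chan)
    Cx = np.einsum("nfc,nfd->fcd", X, X.conj()) / n_frames
    # eigh returns eigenvalues in ascending order, so reverse the columns
    _, vecs = np.linalg.eigh(Cx)
    principal = vecs[:, :, ::-1][:, :, :n_src]  # (n_freq, n_chan, n_src)
    # each row of W0 is a conjugate-transposed principal eigenvector
    return principal.conj().transpose(0, 2, 1)


# toy data: 50 frames, 5 frequency bins, 3 channels
rng = np.random.default_rng(0)
X_toy = rng.standard_normal((50, 5, 3)) + 1j * rng.standard_normal((50, 5, 3))
W_eye = init_demixing(X_toy, n_src=2)
W_eig = init_demixing(X_toy, n_src=2, init_eig=True)
```

An array built this way has the shape expected by the W0 argument. Starting from the principal eigenvectors points the initial filters at the directions that carry the most energy, which can matter when n_src is smaller than the number of channels.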

Returns

  • Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.