Independent Vector Analysis (AuxIVA)
AuxIVA
Blind source separation using independent vector analysis based on the auxiliary function technique. This function separates the input signal into statistically independent sources without using any prior information.
In the determined case, i.e., when the number of sources equals the number of microphones, the algorithm is AuxIVA [1]. When there are more microphones than sources (the overdetermined case), a computationally cheaper variant (OverIVA) is used [2].
Example
from scipy.io import wavfile
import pyroomacoustics as pra
# read multichannel wav file
# audio.shape == (nsamples, nchannels)
fs, audio = wavfile.read("my_multichannel_audio.wav")
# STFT analysis parameters
fft_size = 4096 # `fft_size / fs` should be ~RT60
hop = fft_size // 2 # half-overlap
win_a = pra.hann(fft_size) # analysis window
# optimal synthesis window
win_s = pra.transform.compute_synthesis_window(win_a, hop)
# STFT
# X.shape == (nframes, nfrequencies, nchannels)
X = pra.transform.analysis(audio, fft_size, hop, win=win_a)
# Separation
Y = pra.bss.auxiva(X, n_iter=20)
# iSTFT (introduces an offset of `hop` samples)
# y contains the time domain separated signals
# y.shape == (new_nsamples, nchannels)
y = pra.transform.synthesis(Y, fft_size, hop, win=win_s)
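The comment on `fft_size` in the example says that `fft_size / fs` should be close to the RT60. A minimal sketch of how that rule of thumb could be turned into a number is given here. The helper name `suggest_fft_size` is hypothetical (it is not part of pyroomacoustics), and rounding to a power of two is an assumption made for FFT efficiency, not a requirement stated above.

```python
import math


def suggest_fft_size(fs, rt60):
    """Return the power of two closest (on a log scale) to rt60 * fs samples.

    fs   : sampling frequency in Hz
    rt60 : reverberation time in seconds
    """
    target = rt60 * fs  # desired frame length in samples
    if target < 1:
        raise ValueError("rt60 * fs must be at least one sample")
    return 2 ** round(math.log2(target))


# Example: 16 kHz audio in a room with an RT60 of about 250 ms
fft_size = suggest_fft_size(fs=16000, rt60=0.25)  # 4000 samples -> 4096
hop = fft_size // 2  # half-overlap, as in the example above
```

With these assumed values the result matches the `fft_size = 4096` used in the example.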
References
[1] N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011.
[2] R. Scheibler and N. Ono, Independent Vector Analysis with more Microphones than Sources, arXiv, 2019. https://arxiv.org/abs/1905.07880
pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', init_eig=False, return_filters=False, callback=None)
This is an implementation of AuxIVA/OverIVA that separates the input signal into statistically independent sources. The separation is done in the time-frequency domain, and the FFT length should be approximately equal to the reverberation time.
Two statistical models (Laplace or time-varying Gauss) are available through the keyword argument model. The Gauss model performs better in favorable conditions (few sources, low noise), but Laplace (the default) is more robust in general.
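In AuxIVA-style algorithms, the source model mainly determines how each frame of a source estimate is weighted when the covariance statistics are updated. The following is a minimal sketch of that difference under the usual textbook formulation. The function `source_weights`, the scaling constants, and the `eps` floor are illustrative assumptions and are not taken from the pyroomacoustics source.

```python
import numpy as np


def source_weights(Y, model="laplace", eps=1e-15):
    """Per-frame weights for each source estimate (illustrative sketch).

    Y : complex ndarray (nframes, nfrequencies, nsources)
        Current estimate of the separated sources in the STFT domain.
    Returns an ndarray (nframes, nsources).
    """
    n_freq = Y.shape[1]
    # Energy of each source in each frame, summed over all frequencies
    power = np.sum(np.abs(Y) ** 2, axis=1)
    if model == "laplace":
        # Spherical Laplace: weight is inversely proportional to the frame norm
        r = 2.0 * np.sqrt(power)
    elif model == "gauss":
        # Time-varying Gauss: weight is the inverse of the per-frame variance
        r = power / n_freq
    else:
        raise ValueError("model must be 'laplace' or 'gauss'")
    return 1.0 / np.maximum(r, eps)


# Toy input: 3 frames, 4 frequency bins, 2 sources, all of unit magnitude
Y_demo = np.ones((3, 4, 2), dtype=complex)
w_laplace = source_weights(Y_demo, model="laplace")  # 1 / (2 * sqrt(4))
w_gauss = source_weights(Y_demo, model="gauss")      # 1 / (4 / 4)
```

The Gauss weight falls off with the frame energy rather than its square root, so it reacts more strongly to loud frames. This is one plausible intuition for why it can do better in clean conditions while being less robust.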
- Parameters
X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
n_src (int, optional) – The number of sources or independent components. When n_src==nchannels, the algorithm is identical to AuxIVA. When n_src==1, it performs independent vector extraction.
n_iter (int, optional) – The number of iterations (default 20)
proj_back (bool, optional) – Scaling on first mic by back projection (default True)
W0 (ndarray (nfrequencies, nsrc, nchannels), optional) – Initial value for demixing matrix
model (str) – The model of source distribution, ‘gauss’ or ‘laplace’ (default)
init_eig (bool, optional (default False)) – If True, and if W0 is None, then the weights are initialized using the principal eigenvectors of the covariance matrix of the input data. When False, the demixing matrices are initialized with the identity matrix.
return_filters (bool) – If True, the function will also return the demixing matrix
callback (func) – A callback function called every 10 iterations, which allows monitoring convergence
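The `init_eig` option describes an initialization from the principal eigenvectors of the input covariance matrix. A minimal NumPy sketch of that idea follows. The function name `init_demixing_eig` is hypothetical, and the normalization and the choice to store conjugated eigenvectors as rows are assumptions, so the library's own initialization may differ in detail.

```python
import numpy as np


def init_demixing_eig(X, n_src):
    """Build an initial demixing matrix from principal eigenvectors (sketch).

    X : complex ndarray (nframes, nfrequencies, nchannels)
    Returns W0 with shape (nfrequencies, n_src, nchannels).
    """
    n_frames, n_freq, n_chan = X.shape
    W0 = np.zeros((n_freq, n_src, n_chan), dtype=complex)
    for f in range(n_freq):
        Xf = X[:, f, :]  # (nframes, nchannels), each row is one frame
        # Sample covariance E[x x^H] of the channel vectors at frequency f
        cov = Xf.T @ Xf.conj() / n_frames
        # eigh returns eigenvalues in ascending order for Hermitian input
        _, eigvecs = np.linalg.eigh(cov)
        principal = eigvecs[:, ::-1][:, :n_src]  # n_src largest components
        # Each row of W0[f] is v^H, so that y = W0[f] @ x projects onto v
        W0[f] = principal.conj().T
    return W0


# Toy input: 100 frames, 5 frequency bins, 3 channels, 2 sources
rng = np.random.default_rng(0)
X_demo = rng.standard_normal((100, 5, 3)) + 1j * rng.standard_normal((100, 5, 3))
W0_demo = init_demixing_eig(X_demo, n_src=2)
```

The result has the `(nfrequencies, nsrc, nchannels)` layout documented for `W0`.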
- Returns
Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
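When `proj_back` is True, the returned array is rescaled against the first microphone. Back projection is commonly implemented as a per-frequency, per-source least-squares fit of the separated signal to a reference channel, and the sketch below illustrates that approach. The name `project_back_sketch` and the `eps` floor are assumptions for illustration, not the library's implementation.

```python
import numpy as np


def project_back_sketch(Y, ref, eps=1e-15):
    """Rescale separated sources to match a reference microphone (sketch).

    Y   : complex ndarray (nframes, nfrequencies, nsources)
    ref : complex ndarray (nframes, nfrequencies), e.g. X[:, :, 0]
    Returns an array with the same shape as Y.
    """
    # Least-squares scale c minimizing sum_n |ref_n - c * y_n|^2,
    # computed independently for every frequency bin and source
    num = np.sum(np.conj(Y) * ref[:, :, None], axis=0)  # (nfreq, nsrc)
    den = np.sum(np.abs(Y) ** 2, axis=0)                # (nfreq, nsrc)
    c = num / np.maximum(den, eps)
    return Y * c[None, :, :]


# Toy check: a source equal to the reference up to a complex gain
rng = np.random.default_rng(0)
ref_demo = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))
Y_demo = (ref_demo / (2 + 1j))[:, :, None]  # one source with an unknown scale
Y_scaled = project_back_sketch(Y_demo, ref_demo)
```

In this toy case the unknown complex gain is undone, which is the scale ambiguity that back projection is meant to resolve.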