Common Tools

pyroomacoustics.bss.common.projection_back(Y, ref, clip_up=None, clip_down=None)

This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to resolve the scale ambiguity in blind source separation (BSS).

Here is the derivation of the projection. The optimal filter z minimizes the squared error.

\[\min E[|z^* y - x|^2]\]

It should thus satisfy the orthogonality condition and can be derived as follows

\[ \begin{aligned} 0 & = E[y^*\, (z^* y - x)] \\ 0 & = z^*\, E[|y|^2] - E[y^* x] \\ z^* & = \frac{E[y^* x]}{E[|y|^2]} \\ z & = \frac{E[y x^*]}{E[|y|^2]} \end{aligned} \]

In practice, the expectations are replaced by the sample mean.
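The closed-form solution above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the library's implementation: the helper name `projection_back_sketch` and the toy data are assumptions, and the sample mean over frames stands in for the expectations. The last line applies the filter as z^* y, following the error term in the derivation.

```python
import numpy as np


def projection_back_sketch(Y, ref):
    """Hypothetical helper: z = mean(y x^*) / mean(|y|^2) per bin and channel.

    Y   : (n_frames, n_bins, n_channels) complex STFT data
    ref : (n_frames, n_bins) complex reference signal
    """
    # Sample estimate of E[y x^*], averaged over frames
    num = np.mean(Y * np.conj(ref[:, :, None]), axis=0)
    # Sample estimate of E[|y|^2], averaged over frames
    denom = np.mean(np.abs(Y) ** 2, axis=0)
    # Leave z at zero where the channel carries no energy
    z = np.zeros_like(num)
    mask = denom > 0
    z[mask] = num[mask] / denom[mask]
    return z  # shape (n_bins, n_channels)


# Toy data: every channel is the reference times an unknown complex scale
rng = np.random.default_rng(0)
n_frames, n_bins, n_channels = 200, 5, 2
ref = rng.standard_normal((n_frames, n_bins)) + 1j * rng.standard_normal((n_frames, n_bins))
scales = rng.standard_normal((n_bins, n_channels)) + 1j * rng.standard_normal((n_bins, n_channels))
Y = ref[:, :, None] * scales[None, :, :]

z = projection_back_sketch(Y, ref)
# Apply the filter as z^* y, matching the error term in the derivation
Y_fixed = np.conj(z)[None, :, :] * Y
```

In this noiseless toy case each channel of `Y_fixed` coincides with `ref`, since z^* reduces to the inverse of the unknown scale.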

Parameters
  • Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal

  • ref (array_like (n_frames, n_bins)) – The reference signal

  • clip_up (float, optional) – Limits the maximum value of the gain (default no limit)

  • clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
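One plausible reading of clip_up and clip_down is that they bound the magnitude of each complex gain while leaving its phase untouched. The sketch below illustrates that reading only; the helper `clip_gain_magnitude` is hypothetical, and whether the library clips in exactly this way is an assumption.

```python
import numpy as np


def clip_gain_magnitude(z, clip_up=None, clip_down=None):
    """Hypothetical helper: limit |z| to [clip_down, clip_up], keeping the phase."""
    z = np.array(z, dtype=complex)  # work on a copy
    mag = np.abs(z)
    nonzero = mag > 0  # zero gains have no phase to preserve
    if clip_up is not None:
        too_big = nonzero & (mag > clip_up)
        z[too_big] *= clip_up / mag[too_big]
    if clip_down is not None:
        too_small = nonzero & (mag < clip_down)
        z[too_small] *= clip_down / mag[too_small]
    return z


gains = np.array([0.01 + 0j, 3 + 4j, 0.5j])
clipped = clip_gain_magnitude(gains, clip_up=2.0, clip_down=0.1)
```

Limiting the gain in this way keeps bins with very little energy from being amplified excessively.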

pyroomacoustics.bss.common.sparir(G, S, weights=np.array([]), gini=0, maxiter=50, tol=10, alpha=10, alphamax=100000.0, alphamin=1e-07)

Fast proximal algorithm implementation for sparse approximation of relative impulse responses from incomplete measurements of the corresponding relative transfer function based on

Z. Koldovsky, J. Malek, and S. Gannot, “Spatial Source Subtraction based on Incomplete Measurements of Relative Transfer Function”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, TASLP 2015.

The original Matlab implementation can be found at http://itakura.ite.tul.cz/zbynek/dwnld/SpaRIR.m

and it is referenced in

Z. Koldovsky, F. Nesta, P. Tichavsky, and N. Ono, “Frequency-domain blind speech separation using incomplete de-mixing transform”, EUSIPCO 2016.

Parameters
  • G (ndarray (nfrequencies, 1)) – Frequency representation of the (incomplete) transfer function

  • S (ndarray (kfrequencies)) – Indexes of active frequency bins for sparse AuxIVA

  • weights (ndarray (kfrequencies) or int, optional) – The higher the value of weights(i), the higher the probability that g(i) is zero; if scalar, all weights are the same; if empty, default value is used

  • gini (ndarray (nfrequencies)) – Initialization for the computation of g

  • maxiter (int) – Maximum number of iterations before achieving convergence (default 50)

  • tol (float) – Convergence threshold on the gradient difference between consecutive updates (default 10)

  • alpha (float) – Inverse of the decreasing speed of the gradient at each iteration. This parameter is updated at every iteration (default 10)

  • alphamax (float) – Upper bound for alpha (default 1e5)

  • alphamin (float) – Lower bound for alpha (default 1e-7)

Returns

  • The sparse approximation of the impulse response in the time domain (real-valued) as an (nfrequencies) array.
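To give a feel for the underlying problem, the sketch below recovers a sparse, real-valued impulse response from a subset of its DFT bins with plain ISTA (iterative soft thresholding) and a fixed step size. It is not the SpaRIR algorithm: it uses no per-tap weights and no adaptive alpha, and the function names, the regularization weight `lam`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def ista_sparse_ir(G_obs, S, n, lam=0.05, n_iter=500):
    """Plain ISTA for min_g 0.5 * ||fft(g)[S] - G_obs||^2 + lam * ||g||_1, g real."""
    g = np.zeros(n)
    step = 1.0 / n  # 1 / (Lipschitz bound); the n-point DFT satisfies ||F||^2 = n
    for _ in range(n_iter):
        # Residual on the observed bins only
        residual = np.fft.fft(g)[S] - G_obs
        full = np.zeros(n, dtype=complex)
        full[S] = residual
        # Adjoint of the partial DFT; the real part is the gradient for real g
        grad = np.real(n * np.fft.ifft(full))
        v = g - step * grad
        # Soft thresholding is the proximal operator of the l1 norm
        g = np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
    return g


def objective(g, G_obs, S, lam):
    """Data misfit on the observed bins plus the l1 penalty."""
    misfit = np.fft.fft(g)[S] - G_obs
    return 0.5 * np.sum(np.abs(misfit) ** 2) + lam * np.sum(np.abs(g))


# Toy problem: a 3-tap sparse response observed on 20 of the 33 non-redundant bins
rng = np.random.default_rng(1)
n = 64
g_true = np.zeros(n)
g_true[[0, 3, 7]] = [1.0, -0.5, 0.25]
S = np.sort(rng.choice(n // 2 + 1, size=20, replace=False))
G_obs = np.fft.fft(g_true)[S]

g_est = ista_sparse_ir(G_obs, S, n)
```

With a step of 1/n, each ISTA iteration does not increase the objective, so `objective(g_est, G_obs, S, 0.05)` ends below its value at the all-zero starting point. The adaptive alpha bounded by alphamin and alphamax in sparir plays the role that the fixed step plays here.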