pyroomacoustics.denoise.iterative_wiener module

class pyroomacoustics.denoise.iterative_wiener.IterativeWiener(frame_len, lpc_order, iterations, alpha=0.8, thresh=0.01)

Bases: object

A class for performing single channel noise reduction in the frequency domain with a Wiener filter that is iteratively computed. This implementation is based off of the approach presented in:

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

For each frame, a Wiener filter of the following form is computed and applied to the noisy samples in the frequency domain:

\[\begin{split}H(\\omega) = \dfrac{P_S(\\omega)}{P_S(\\omega) + \\sigma_d^2},\end{split}\]

where \(P_S(\omega)\) is the speech power spectral density and \(\sigma_d^2\) is the noise variance.

The following assumptions are made in order to arrive at the above filter as the optimal solution:

  • The noisy samples \(y[n]\) can be written as:

    \[y[n] = s[n] + d[n],\]

    where \(s[n]\) is the desired signal and \(d[n]\) is the background noise.

  • The signal and noise are uncorrelated.

  • The noise is white Gaussian, i.e. it has a flat power spectrum with amplitude \(\sigma_d^2\).

Under these assumptions, the above Wiener filter minimizes the mean-square error between the true samples \(s[0:N-1]\) and the estimated one \(\hat{s[0:N-1]}\) by filtering \(y[0:N-1]\) with the above filter (with \(N\) being the frame length).

The fundamental part of this approach is correctly (or as well as possible) estimating the speech power spectral density \(P_S(\omega)\) and the noise variance \(\sigma_d^2\). For this, we need a voice activity detector in order to determine when we have incoming speech. In this implementation, we use a simple energy threshold on the input frame, which is set with the thresh input parameter.

When no speech is identified, the input frame is used to update the noise variance \(\sigma_d^2\). We could simply set \(\sigma_d^2\) to the energy of the input frame. However, we employ a simple IIR filter in order to avoid abrupt changes in the noise level (thus adding an assumption of stationary):

\[\begin{split}\sigma_d^2[k] = \\alpha \cdot \sigma_d^2[k-1] + (1-\\alpha) \cdot \sigma_y^2,\end{split}\]

where \(\\alpha\) is the smoothing parameter and \(\sigma_y^2\) is the energy of the input frame. A high value of \(\\alpha\) will update the noise level very slowly, while a low value will make it very sensitive to changes at the input. The value for \(\\alpha\) can be set with the alpha parameter.

When speech is identified in the input frame, an iterative procedure is employed in order to estimate \(P_S(\omega)\) (and therefore the Wiener filter \(H\) as well). This procedure consists of computing \(p\) linear predictive coding (LPC) coefficients of the input frame. The number of LPC coefficients is set with the parameter lpc_order. These LPC coefficients form an all-pole filter that models the vocal tract as described in the above paper (Eq. 1). With these coefficients, we can then obtain an estimate of the speech power spectral density (Eq. 41b) and thus the corresponding Wiener filter (Eq. 41a). This Wiener filter is used to denoise the input frame. Moreover, with this denoised frame, we can compute new LPC coefficients and therefore a new Wiener filter. The idea behind this approach is that by iteratively computing the LPC coefficients as such, we can obtain a better estimate of the speech power spectral density. The number of iterations can be set with the iterations parameter.

Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.

# initialize STFT and IterativeWiener objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2,
                          analysis_window=pra.hann(nfft))
scnr = IterativeWiener(frame_len=nfft, lpc_order=20, iterations=2,
                       alpha=0.8, thresh=0.01)

# apply block-by-block
for n in range(num_blocks):

    # go to frequency domain, 50% overlap
    stft.analysis(mono_noisy)

    # compute wiener output
    X = scnr.compute_filtered_output(
            current_frame=stft.fft_in_buffer,
            frame_dft=stft.X)

    # back to time domain
    mono_denoised = stft.synthesis(X)

There also exists a “one-shot” function.

# import or create `noisy_signal`
denoised_signal = apply_iterative_wiener(noisy_signal, frame_len=512,
                                         lpc_order=20, iterations=2,
                                         alpha=0.8, thresh=0.01)
Parameters
  • frame_len (int) – Frame length in samples.

  • lpc_order (int) – Number of LPC coefficients to compute

  • iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.

  • alpha (int) – Smoothing factor within [0,1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!

compute_filtered_output(current_frame, frame_dft=None)

Compute Wiener filter in the frequency domain.

Parameters
  • current_frame (numpy array) – Noisy samples.

  • frame_dft (numpy array) – DFT of input samples. If not provided, it will be computed.

Returns

Output of denoising in the frequency domain.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.apply_iterative_wiener(noisy_signal, frame_len=512, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)

One-shot function to apply iterative Wiener filtering for denoising.

Parameters
  • noisy_signal (numpy array) – Real signal in time domain.

  • frame_len (int) – Frame length in samples. 50% overlap is used with hanning window.

  • lpc_order (int) – Number of LPC coefficients to compute

  • iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.

  • alpha (int) – Smoothing factor within [0,1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!

Returns

Enhanced/denoised signal.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.compute_speech_psd(a, g2, nfft)

Compute power spectral density of speech as specified in equation (41b) of

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

Namely:

\[\begin{split}P_S(\\omega) = \dfrac{g^2}{\\left \| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \\right \|^2},\end{split}\]

where \(p\) is the LPC order, \(a_k\) are the LPC coefficients, and \(g\) is an estimated gain factor.

The power spectral density is computed at the frequencies corresponding to a DFT of length nfft.

Parameters
  • a (numpy array) – LPC coefficients.

  • g2 (float) – Squared gain.

  • nfft (int) – FFT length.

Returns

Power spectral density from LPC coefficients.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.compute_squared_gain(a, noise_psd, y)

Estimate the squared gain of the speech power spectral density as done on p. 204 of

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

Namely solving for \(g^2\) such that the following expression is satisfied:

\[\begin{split}\dfrac{N}{2\pi} \int_{-\pi}^{\pi} \dfrac{g^2}{\\left \| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \\right \|^2} d\omega = \sum_{n=0}^{N-1} y^2(n) - N\cdot\sigma_d^2,\end{split}\]

where \(N\) is the number of noisy samples \(y\), \(a_k\) are the \(p\) LPC coefficients, and \(\sigma_d^2\) is the noise variance.

Parameters
  • a (numpy array) – LPC coefficients.

  • noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.

  • y (numpy array) – Noisy time domain samples.

Returns

Squared gain.

Return type

float

pyroomacoustics.denoise.iterative_wiener.compute_wiener_filter(speech_psd, noise_psd)

Compute Wiener filter in the frequency domain.

Parameters
  • speech_psd (numpy array) – Speech power spectral density.

  • noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.

Returns

Frequency domain filter, computed at the same frequency values as speech_psd.

Return type

numpy array