STFT

Class for real-time STFT analysis and processing.

class pyroomacoustics.transform.stft.STFT(N, hop=None, analysis_window=None, synthesis_window=None, channels=1, transform='numpy', streaming=True, precision='double', **kwargs)

Bases: object

A class for STFT processing.

Parameters
  • N (int) – number of samples per frame

  • hop (int) – hop size

  • analysis_window (numpy array) – window applied to the block before analysis

  • synthesis_window (numpy array) – window applied to the block before synthesis

  • channels (int) – number of signals

  • transform (str, optional) – which FFT package to use: ‘numpy’ (default), ‘pyfftw’, or ‘mkl’

  • streaming (bool, optional) – if True (default), “stitch” samples between repeated calls of ‘analysis’ and ‘synthesis’, as needed when receiving a continuous stream of samples; if False, do not stitch.

  • num_frames (int, optional) –

    Number of frames to be processed. If set, this will be strictly enforced, as the STFT block will allocate memory accordingly. If not set, there will be no check on the number of frames sent to analysis/process/synthesis.

    NOTE:

    1) num_frames = 0 corresponds to a “real-time” case in which each input block corresponds to [hop] samples.

    2) num_frames > 0 requires [(num_frames-1)*hop + N] samples, as the last frame must contain [N] samples.

  • precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – Numerical precision to use for the data. If ‘single’/np.float32/np.complex64, real inputs use 32 bits and the complex spectrum uses 64 bits. Otherwise (default), real inputs are cast to 64 bits and the complex spectrum uses 128 bits.
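The sample counts in the num_frames note can be checked with a few lines of arithmetic. The helper below, required_samples, is a hypothetical name introduced for illustration and is not part of pyroomacoustics; it only restates the two cases from the note.

```python
def required_samples(num_frames, hop, N):
    """Input length implied by the num_frames note (illustrative helper)."""
    if num_frames < 0:
        raise ValueError("num_frames must be non-negative")
    if num_frames == 0:
        # "Real-time" case: each input block holds exactly hop samples.
        return hop
    # Fixed number of frames: the last frame must still contain N samples.
    return (num_frames - 1) * hop + N


print(required_samples(0, hop=256, N=512))   # 256
print(required_samples(10, hop=256, N=512))  # 2816
```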

analysis(x)
Parameters

x (2D numpy array, [samples, channels]) – Time-domain signal.

process(X=None)
Parameters

X (numpy array) – X can take on multiple shapes:

  1) (N,) if it is single channel and only one frame
  2) (N,D) if it is multi-channel and only one frame
  3) (F,N) if it is single channel but multiple frames
  4) (F,N,D) if it is multi-channel and multiple frames

where:

  - F is the number of frames
  - N is the number of frequency bins
  - D is the number of channels

Returns

x_r – Reconstructed time-domain signal.

Return type

numpy array

reset()

Reset state variables. Necessary after changing or setting the filter or zero padding.

set_filter(coeff, zb=None, zf=None, freq=False)

Set time-domain FIR filter with appropriate zero-padding. Frequency spectrum of the filter is computed and set for the object. There is also a check for sufficient zero-padding.

Parameters
  • coeff (numpy array) – Filter in time domain.

  • zb (int) – Amount of zero-padding added to back/end of frame.

  • zf (int) – Amount of zero-padding added to front/beginning of frame.

  • freq (bool) – Whether or not given coefficients (coeff) are in the frequency domain.
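Zero-padding matters here because multiplying spectra corresponds to circular convolution: a frame of N samples filtered by an FIR filter of length K only yields the linear convolution if the FFT length is at least N + K - 1. The NumPy sketch below illustrates that standard condition. It does not reproduce the check performed by set_filter, and all variable names are illustrative.

```python
import numpy as np

N = 8                                  # samples per frame
coeff = np.array([1.0, 0.5, 0.25])     # time-domain FIR filter, length K = 3
zb = len(coeff) - 1                    # zero-padding at the end of the frame
nfft = N + zb                          # FFT length after padding

frame = np.arange(N, dtype=float)

# Filter in the frequency domain; rfft zero-pads both inputs to nfft.
H = np.fft.rfft(coeff, n=nfft)
y_padded = np.fft.irfft(np.fft.rfft(frame, n=nfft) * H, n=nfft)

# Without padding, the filter tail wraps around to the start of the frame.
y_unpadded = np.fft.irfft(np.fft.rfft(frame) * np.fft.rfft(coeff, n=N), n=N)

reference = np.convolve(frame, coeff)  # linear convolution, length N + K - 1

print(np.allclose(y_padded, reference))        # True
print(np.allclose(y_unpadded, reference[:N]))  # False
```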

synthesis(X=None)
Parameters

X (numpy array of frequency content) – X can take on multiple shapes:

  1) (N,) if it is single channel and only one frame
  2) (N,D) if it is multi-channel and only one frame
  3) (F,N) if it is single channel but multiple frames
  4) (F,N,D) if it is multi-channel and multiple frames

where:

  - F is the number of frames
  - N is the number of frequency bins
  - D is the number of channels

Returns

x_r – Reconstructed time-domain signal.

Return type

numpy array

zero_pad_back(zb)

Set zero-padding at end of frame.

zero_pad_front(zf)

Set zero-padding at beginning of frame.

pyroomacoustics.transform.stft.analysis(x, L, hop, win=None, zp_back=0, zp_front=0)

Convenience function for a one-shot STFT.

Parameters
  • x (array_like, (n_samples) or (n_samples, n_channels)) – input signal

  • L (int) – frame size

  • hop (int) – shift size between frames

  • win (array_like) – the window to apply (default None)

  • zp_back (int) – zero padding to apply at the end of the frame

  • zp_front (int) – zero padding to apply at the beginning of the frame

Returns

X – The STFT of x

Return type

ndarray, (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)
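To make the output shape concrete, here is a minimal NumPy sketch of what a one-shot STFT computes for a single-channel signal: split into frames of L samples shifted by hop, apply the window, and take a real FFT of each frame. The function stft_sketch is a hypothetical stand-in, not the library routine; it assumes no zero-padding, in which case a real FFT gives L // 2 + 1 frequency bins, and it drops any trailing samples that do not fill a complete frame.

```python
import numpy as np

def stft_sketch(x, L, hop, win=None):
    """Frame, window and FFT a 1-D signal; returns (n_frames, L // 2 + 1)."""
    x = np.asarray(x, dtype=float)
    if len(x) < L:
        raise ValueError("signal is shorter than one frame")
    n_frames = (len(x) - L) // hop + 1   # only complete frames are kept
    frames = np.stack([x[f * hop : f * hop + L] for f in range(n_frames)])
    if win is not None:
        frames = frames * win            # window broadcast over every frame
    return np.fft.rfft(frames, axis=-1)


L, hop = 8, 4
x = np.sin(2 * np.pi * np.arange(32) / 8)
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)   # periodic Hann
X = stft_sketch(x, L, hop, win)
print(X.shape)   # (7, 5)
```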

pyroomacoustics.transform.stft.compute_synthesis_window(analysis_window, hop)

Computes the optimal synthesis window given an analysis window and hop (frame shift). The procedure is described in:

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoustics, Speech, and Signal Process., vol. 32, no. 2, pp. 236-243, 1984.

Parameters
  • analysis_window (array_like) – The analysis window

  • hop (int) – The frame shift
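As a rough illustration of the idea, the least-squares result from Griffin and Lim divides the analysis window by the sum of its squared copies shifted by multiples of hop. The sketch below implements that normalization in NumPy; synthesis_window_sketch is a hypothetical name, this is not the library's code, and it assumes the normalization term is nonzero everywhere (true for the Hann window and hop used here).

```python
import numpy as np

def synthesis_window_sketch(analysis_window, hop):
    """Analysis window divided by the sum of its squared hop-shifted copies."""
    w = np.asarray(analysis_window, dtype=float)
    denom = np.zeros_like(w)
    for r in range(hop):
        # Window positions r, r + hop, r + 2*hop, ... land on the same
        # output sample during overlap-add, so they share one normalizer.
        denom[r::hop] = np.sum(w[r::hop] ** 2)
    return w / denom


N, hop = 8, 2
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)   # periodic Hann
ws = synthesis_window_sketch(w, hop)

# Perfect-reconstruction check: overlapping products w * ws sum to one.
overlap = [np.sum((w * ws)[r::hop]) for r in range(hop)]
print(np.allclose(overlap, 1.0))   # True
```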

pyroomacoustics.transform.stft.synthesis(X, L, hop, win=None, zp_back=0, zp_front=0)

Convenience function for a one-shot inverse STFT.

Parameters
  • X (array_like (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)) – The data

  • L (int) – frame size

  • hop (int) – shift size between frames

  • win (array_like) – the window to apply (default None)

  • zp_back (int) – zero padding to apply at the end of the frame

  • zp_front (int) – zero padding to apply at the beginning of the frame

Returns

x – The inverse STFT of X

Return type

ndarray, (n_samples) or (n_samples, n_channels)
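The inverse transform can be pictured as overlap-add: each frame is brought back to the time domain, optionally multiplied by a synthesis window, and summed into the output at its frame offset. The NumPy sketch below shows this for a single channel without zero-padding. The function istft_sketch and the output length (n_frames - 1) * hop + L are assumptions made for illustration, not the library's implementation.

```python
import numpy as np

def istft_sketch(X, L, hop, win=None):
    """Overlap-add inverse STFT, single channel, X is (n_frames, L // 2 + 1)."""
    X = np.asarray(X)
    n_frames = X.shape[0]
    x = np.zeros((n_frames - 1) * hop + L)
    for f in range(n_frames):
        frame = np.fft.irfft(X[f], n=L)      # back to L time-domain samples
        if win is not None:
            frame = frame * win              # synthesis window
        x[f * hop : f * hop + L] += frame    # overlap-add at the frame offset
    return x


L, hop = 8, 4
rng = np.random.default_rng(0)
x = rng.standard_normal(32)

# Forward transform with a periodic Hann analysis window, whose copies
# shifted by L / 2 sum to one, so no synthesis window is needed here.
win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(L) / L)
frames = np.stack([x[i : i + L] for i in range(0, len(x) - L + 1, hop)])
X = np.fft.rfft(frames * win, axis=-1)

x_rec = istft_sketch(X, L, hop)
# The first and last hop samples are covered by a single frame only.
print(np.allclose(x_rec[hop:-hop], x[hop:-hop]))   # True
```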