
Summary¶
Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components:
- Intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms;
- Fast C++ implementation of the image source model and ray tracing for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers;
- Reference implementations of popular algorithms for STFT, beamforming, direction finding, adaptive filtering, source separation, and single channel denoising.
Together, these components form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step. Please refer to this notebook for a demonstration of the different components of this package.
Room Acoustics Simulation¶
Consider the following scenario.
Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?
—Schnupp, Nelken, and King, Auditory Neuroscience, 2010
Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!
At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle
- Convex and non-convex rooms
- 2D/3D rooms
The core image source model and ray tracing modules are written in C++ for better performance.
The philosophy of the package is to abstract all necessary elements of an experiment using an object-oriented programming approach. Each of these elements is represented using a class and an experiment can be designed by combining these elements just as one would do in a real experiment.
Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoe box shaped room that contains only one source of sound. First, we create a room object, to which we add a microphone array object, and a sound source object. Then, the room object has methods to compute the RIR between source and receiver. The beamformer object then extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.
The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the inputs of the microphones composing the beamformer, an STFT (short-time Fourier transform) engine allows one to quickly process the signals through the beamformer and evaluate the output.
Reference Implementations¶
In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for
- Short time Fourier transform (block + online)
- beamforming
- direction of arrival (DOA) finding
- adaptive filtering (NLMS, RLS)
- blind source separation (AuxIVA, Trinicon, ILRMA, SparseAuxIVA, FastMNMF)
- single channel denoising (Spectral Subtraction, Subspace, Iterative Wiener)
We use an object-oriented approach to abstract the details of specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters. We have tried to pre-set values for the tuning parameters so that a run with the default values will in general produce reasonable results.
Quick Install¶
Install the package with pip:
pip install pyroomacoustics
A cookiecutter is available that generates a working simulation script for a few 2D/3D scenarios:
# if necessary install cookiecutter
pip install cookiecutter
# create the simulation script
cookiecutter gh:fakufaku/cookiecutter-pyroomacoustics-sim
# run the newly created script
python <chosen_script_name>.py
Dependencies¶
The minimal dependencies are:
numpy
scipy>=0.18.0
Cython
where Cython is only needed to benefit from the compiled accelerated simulator. The simulator itself has a pure Python counterpart, so this requirement can be ignored, but the pure Python version is much slower.
On top of that, some functionalities of the package depend on extra packages:
samplerate # for resampling signals
matplotlib # to create graphs and plots
sounddevice # to play sound samples
mir_eval # to evaluate performance of source separation in examples
The requirements.txt file lists all packages necessary to run all of the scripts in the examples folder.
This package is mainly developed under Python 3.5. We try as much as possible to keep things compatible with Python 2.7 and run tests and builds under both. However, the test code coverage is far from 100% and we might occasionally break some things in Python 2.7. We apologize in advance for that.
Under Linux and Mac OS, the compiled accelerators require a valid compiler, typically GCC, to be installed. When no compiler is present, the package will still install but default to the pure Python implementation, which is much slower. On Windows, we provide pre-compiled Python wheels for Python 3.5 and 3.6.
Example¶
Here is a quick example of how to create and visualize the response of a beamformer in a room.
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])
# Add a source somewhere in the room
room.add_source([2.5, 4.5])
# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 10 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.1)
room.add_microphone_array(pra.Beamformer(R, room.fs))
# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])
# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()
More examples¶
A couple of detailed demos with illustrations are available.
A comprehensive set of examples covering most of the functionalities of the package can be found in the examples folder of the GitHub repository.
Authors¶
- Robin Scheibler
- Ivan Dokmanić
- Sidney Barthe
- Eric Bezzam
- Hanjie Pan
How to contribute¶
If you would like to contribute, please clone the repository and send a pull request.
For more details, see our CONTRIBUTING page.
Academic publications¶
This package was developed to support academic publications. The package contains implementations for DOA algorithms and acoustic beamformers introduced in the following papers.
- H. Pan, R. Scheibler, I. Dokmanic, E. Bezzam and M. Vetterli. FRIDA: FRI-based DOA estimation for arbitrary array layout, ICASSP 2017, New Orleans, USA, 2017.
- I. Dokmanić, R. Scheibler and M. Vetterli. Raking the Cocktail Party, in IEEE Journal of Selected Topics in Signal Processing, vol. 9, num. 5, p. 825 - 836, 2015.
- R. Scheibler, I. Dokmanić and M. Vetterli. Raking Echoes in the Time Domain, ICASSP 2015, Brisbane, Australia, 2015.
If you use this package in your own research, please cite our paper describing it.
R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python package for audio room simulations and array processing algorithms, Proc. IEEE ICASSP, Calgary, CA, 2018.
License¶
Copyright (c) 2014-2018 EPFL-LCAV
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Table of contents¶
Contributing¶
If you want to contribute to pyroomacoustics and make it better, your help is very welcome. Contributing is also a great way to learn more about the package itself.
Ways to contribute¶
- File bug reports
- Improvements to the documentation are always more than welcome. Keeping a good clean documentation is a challenging task and any help is appreciated.
- Feature requests
- If you implemented an extra DOA/adaptive filter/beamforming algorithm: that’s awesome! We’d love to add it to the package.
- Suggestion of improvements to the code base are also welcome.
Coding style¶
We try to stick to PEP8 as much as possible. Variables, functions, modules and packages should be in lowercase with underscores; class names in CamelCase.
Documentation¶
Docstrings should follow the numpydoc style.
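For reference, a short function documented in numpydoc style looks like this (the function itself is just an illustration, not part of the package):

```python
import numpy as np

def rms(x, axis=None):
    """
    Compute the root-mean-square value of a signal.

    Parameters
    ----------
    x : numpy.ndarray
        The input signal.
    axis : int, optional
        Axis along which to compute the RMS. By default, the
        flattened array is used.

    Returns
    -------
    float or numpy.ndarray
        The root-mean-square value(s) of ``x``.
    """
    return np.sqrt(np.mean(np.abs(x) ** 2, axis=axis))
```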
We recommend the following steps for generating the documentation:
- Create a separate environment, e.g. with Anaconda, as such:
conda create -n mkdocs37 python=3.7 sphinx numpydoc mock sphinx_rtd_theme
- Switch to the environment:
source activate mkdocs37
- Navigate to the docs folder and run:
./make_apidoc.sh
- The materials database page is generated with the script:
./make_materials_table.py
- Build the documentation locally with:
make html
- Open in your browser:
docs/_build/html/index.html
Develop Locally¶
It can be convenient to develop and run tests locally. In contrast to only using the package, this also requires compiling the C++ extension. On Mac and Linux, GCC is required, while Visual C++ 14.0 is necessary for Windows.
Get the source code. Use a recursive clone so that Eigen (a submodule of this repository) is also downloaded.
git clone --recursive git@github.com:LCAV/pyroomacoustics.git
Alternatively, you can clone without the --recursive flag and install the Eigen library directly. For macOS, you can find installation instructions here: https://stackoverflow.com/a/35658421. After installation you can create a symbolic link as such:
ln -s PATH_TO_EIGEN pyroomacoustics/libroom_src/ext/eigen/Eigen
Install a few prerequisites:
pip install numpy Cython pybind11
Compile locally:
python setup.py build_ext --inplace
On recent Mac OS (Mojave), it is necessary in some cases to add a higher deployment target
MACOSX_DEPLOYMENT_TARGET=10.9 python setup.py build_ext --inplace
Update $PYTHONPATH so that Python knows where to find the local package:
# Linux/Mac
export PYTHONPATH=<path_to_pyroomacoustics>:$PYTHONPATH
For Windows, see this question on Stack Overflow.
Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
Now fire up python or ipython and check that the package can be imported:
import pyroomacoustics as pra
Unit Tests¶
As much as possible, for every new function added to the code base, add a short test script in pyroomacoustics/tests. The names of the script and of the functions running the test should be prefixed by test_. The tests are started by running nosetests at the root of the package.
How to make a clean pull request¶
Look for a project’s contribution instructions. If there are any, follow them.
- Create a personal fork of the project on Github.
- Clone the fork on your local machine. Your remote repo on Github is called origin.
- Add the original repository as a remote called upstream.
- If you created your fork a while ago, be sure to pull upstream changes into your local repository.
- Create a new branch to work on! Branch from develop if it exists, else from master.
- Implement/fix your feature, comment your code.
- Follow the code style of the project, including indentation.
- If the project has tests, run them!
- Write or adapt tests as needed.
- Add or change the documentation as needed.
- Squash your commits into a single commit with git's interactive rebase. Create a new branch if necessary.
- Push your branch to your fork on Github, the remote origin.
- From your fork, open a pull request in the correct branch. Target the project's develop branch if there is one, else go for master!
- …
- If the maintainer requests further changes, just push them to your branch. The PR will be updated automatically.
- Once the pull request is approved and merged, you can pull the changes from upstream to your local repo and delete your extra branch(es).
And last but not least: Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.
Reference¶
This guide is based on the nice template by @MarcDiethelm available under MIT License.
Changelog¶
All notable changes to pyroomacoustics will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased¶
Improved Simulator with Ray Tracing¶
- Ray tracing in the libroom module. The function compute_rir() of the Room object in Python can now be executed using a pure ray tracing approach or a hybrid (ISM + RT) approach. This function therefore has several new default arguments to configure ray tracing (number of rays, scattering coefficient, energy and time thresholds, microphone radius).
- Bandpass filterbank construction in pyroomacoustics.acoustics.bandpass_filterbank
- Acoustic properties of different materials in pyroomacoustics.materials
- Scattering from the walls is handled via the ray tracing method; scattering coefficients are provided in pyroomacoustics.materials.Material objects
- The function inverse_sabine computes the absorption and max_order to use with the image source model to achieve a given reverberation time
- The method rt60_theory in pyroomacoustics.room.Room computes the theoretical RT60 of the room according to the Eyring or Sabine formula
- The method measure_rt60 in pyroomacoustics.room.Room measures the RT60 of the simulated RIRs
Changes in the Room Class¶
- Deep refactor of the Room class. The constructor arguments have changed.
- No more sigma2_awgn; noise is now handled in the pyroomacoustics.Room.simulate method
- The way absorption is handled has changed. The scalar variable absorption is deprecated in favor of pyroomacoustics.materials.Material
- Complete refactor of libroom, the compiled extension module responsible for the room simulation, into C++. The bindings to Python are now done using pybind11.
- Removed the pure Python room simulator as it was really slow
- pyroomacoustics.transform.analysis, pyroomacoustics.transform.synthesis, and pyroomacoustics.transform.compute_synthesis_window have been deprecated in favor of pyroomacoustics.transform.stft.analysis, pyroomacoustics.transform.stft.synthesis, and pyroomacoustics.transform.stft.compute_synthesis_window.
- pyroomacoustics.Room has a new method add that can be used to add either a SoundSource or a MicrophoneArray object. Subsequent calls to the method will always add sources/microphones. There are also methods add_source and add_microphone that can be used to add a source/microphone via coordinates. The method add_microphone_array can be used to add a MicrophoneArray object, or a 2D array containing the locations of several microphones in its columns. While the add_microphone_array method used to replace the existing array by its argument, the new behavior is to add to the microphones already present.
Other Changes¶
- Added vectorised functions in MUSIC
- Use the vectorised functions in _process of MUSIC
0.3.3 - 2020-05-12¶
Bugfix¶
- From Issue #150, increase max iterations to check if point is inside room
- Issues #117 #163, adds project file pyproject.toml so that pip can know which dependencies are necessary for setup
Added¶
- Added room_isinside_max_iter in parameters.py
- Default set to 20 rather than 5 as it was in pyroomacoustics.room.Room.isinside
- Added Binder link in the README for online interactive demo
Changed¶
- Changed while loop to iterate up to room_isinside_max_iter in pyroomacoustics.room.Room.isinside
0.3.2 - 2020-02-23¶
Bugfix¶
- Fixed some bugs in the documentation
- Fixed normalization part in FastMNMF
Changed¶
- Changed initialization of FastMNMF to accelerate convergence
- Fixed bug in doa/tops (float -> integer division)
0.3.1 - 2019-11-06¶
Bugfix¶
- Fixed a non-unicode character in pyroomacoustics.experimental.rt60 that broke the tests
0.3.0 - 2019-11-06¶
Added¶
- The routine pyroomacoustics.experimental.measure_rt60 to automatically measure the reverberation time of impulse responses. This is useful for both measured and simulated responses.
Bugfix¶
- Fixed docstring and an argument of pyroomacoustics.bss.ilrma
0.2.0 - 2019-09-04¶
Added¶
- Added FastMNMF (Fast Multichannel Nonnegative Matrix Factorization) to the bss subpackage.
- Griffin-Lim algorithm for phase reconstruction from STFT magnitude measurements.
Changed¶
- Removed the superfluous warnings in pyroomacoustics.transform.stft.
- Add option in pyroomacoustics.room.Room.plot_rir to set the pair of channels to plot; useful when there are too many impulse responses.
- Add some window functions in windows.py and rearranged them in alphabetical order
- Fixed various warnings in tests.
- Faster implementation of AuxIVA that also includes OverIVA (more mics than sources). It also comes with a slightly changed API, Laplace and time-varying Gauss statistical models, and two possible initialization schemes.
- Faster implementation of ILRMA.
- SparseAuxIVA has a slightly changed API; f_contrast has been replaced by the model keyword argument.
Bugfix¶
- Set rcond=None in all calls to numpy.linalg.lstsq to remove a FutureWarning
- Add a lower bound to activations in pyroomacoustics.bss.auxiva to avoid underflow and divide by zero.
- Fixed a memory leak in the C engine for polyhedral rooms (issue #116).
- Fixed problem caused by dependency of setup.py on Cython (Issue #117)
0.1.23 - 2019-04-17¶
Bugfix¶
- Expose the mu parameter for adaptive.subband_lms.SubbandLMS.
- Add SSL context to download_uncompress and unit test; error for Python 2.7.
0.1.22 - 2019-04-11¶
Added¶
- Added “precision” parameter to “stft” class to choose between ‘single’ (float32/complex64) or ‘double’ (float64/complex128) for processing precision.
- Unified unit test file for frequency-domain source separation methods.
- New algorithm for blind source separation (BSS): Sparse Independent Vector Analysis (SparseAuxIVA).
Changed¶
- Few README improvements
Bugfix¶
- Remove np.squeeze in STFT as it caused errors when an axis that shouldn't be squeezed was equal to 1.
- Beamformer.process was using an old (non-existent) STFT function. Changed to using the one-shot function from the transform module.
- Fixed a bug in utilities.fractional_delay_filter_bank that would cause the function to crash on some inputs (issue #87).
0.1.21 - 2018-12-20¶
Added¶
- Adds several options to pyroomacoustics.room.Room.simulate to finely control the SNR of the microphone signals and also return the microphone signals with individual sources, prior to mixing (useful for BSS evaluation)
- Add subspace denoising approach in pyroomacoustics.denoise.subspace.
- Add iterative Wiener filtering approach for single channel denoising in pyroomacoustics.denoise.iterative_wiener.
Changed¶
- Add build instructions for python 3.7 and wheels for Mac OS X in the continuous integration (Travis and Appveyor)
- Limits imports of matplotlib to within plotting functions so that the matplotlib backend can still be changed, even after importing pyroomacoustics
- Better vectorization of the computations in pyroomacoustics.bss.auxiva
Bugfix¶
- Corrects a bug that causes different behavior depending on whether sources are provided to the constructor of Room or to the add_source method
- Corrects a typo in pyroomacoustics.SoundSource.add_signal
- Corrects a bug in the update of the demixing matrix in pyroomacoustics.bss.auxiva
- Corrects invalid memory access in the pyroomacoustics.build_rir cython accelerator and adds a unit test that checks the cython code output is correct
- Fix bad handling of 1D b vectors in pyroomacoustics.levinson.
0.1.20 - 2018-10-04¶
Added¶
- STFT tutorial and demo notebook.
- New algorithm for blind source separation (BSS): Independent Low-Rank Matrix Analysis (ILRMA)
Changed¶
- Matplotlib is not a hard requirement anymore. When matplotlib is not installed, only a warning is issued on plotting commands. This is useful to run pyroomacoustics on headless servers that might not have matplotlib installed
- Removed dependencies on the joblib and requests packages
- Apply matplotlib.pyplot.tight_layout in pyroomacoustics.Room.plot_rir
Bugfix¶
- Monaural signals are now properly handled in one-shot stft/istft
- Corrected check of the size of the absorption coefficients list in Room.from_corners
0.1.19 - 2018-09-24¶
Added¶
- Added noise reduction sub-package denoise with spectral subtraction class and example.
- Renamed realtime to transform and added a deprecation warning.
- Added a cython function to efficiently compute the fractional delays in the room impulse response from time delays and attenuations
- notebooks folder.
- Demo IPython notebook (with WAV files) of several features of the package.
- Wrapper for Google's Speech Command Dataset and an example usage script in examples.
- Lots of new features in the pyroomacoustics.realtime subpackage:
  - The STFT class can now be used both for frame-by-frame processing and for bulk processing
  - The functionality will replace the methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc.
  - The new function pyroomacoustics.realtime.compute_synthesis_window computes the optimal synthesis window given an analysis window and the frame shift
  - Extensive tests for the pyroomacoustics.realtime module
  - Convenience functions pyroomacoustics.realtime.analysis and pyroomacoustics.realtime.synthesis with an interface similar to pyroomacoustics.stft and pyroomacoustics.istft (which are now deprecated and will disappear soon)
  - The ordering of axes in the output from bulk STFT is now (n_frames, n_frequencies, n_channels)
  - Support for Intel's mkl_fft package
  - The axis (along which to perform DFT) and bits parameters for the DFT class.
Changed¶
- Improved documentation and docstrings
- Using now the built-in RIR generator in examples/doa_algorithms.py
- Improved the download/uncompress function for large datasets
- Dusted the code for plotting on the sphere in pyroomacoustics.doa.grid.GridSphere
Deprecation Notice¶
- The methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc., are now deprecated and will be removed in the near future
0.1.18 - 2018-04-24¶
Added¶
- Added AuxIVA (independent vector analysis) to the bss subpackage.
- Added a BSS IVA example
Changed¶
- Moved the Trinicon blind source separation algorithm to the bss subpackage.
Bugfix¶
- Correct a bug that causes 1st order sources to be generated for max_order==0 in pure python code
0.1.17 - 2018-03-23¶
Bugfix¶
- Fixed issue #22 on github. Added INCREF before returning Py_None in C extension.
0.1.16 - 2018-03-06¶
Added¶
- Base classes for Dataset and Sample in pyroomacoustics.datasets
- Methods to filter datasets according to the metadata of samples
- Deprecation warning for the TimitCorpus interface
Changed¶
- Add list of speakers and sentences from CMU ARCTIC
- CMUArcticDatabase basedir is now the top directory where CMU_ARCTIC database should be saved. Not the directory above as it previously was.
- Libroom C extension is now a proper package. It can be imported.
- Libroom C extension now compiles on windows with python>=3.5.
Room Simulation¶
Room¶
The three main classes are pyroomacoustics.room.Room, pyroomacoustics.soundsource.SoundSource, and pyroomacoustics.beamforming.MicrophoneArray. At a high level, a simulation scenario is created by first defining a room, to which a few sound sources and a microphone array are attached. The actual audio is attached to the sources as raw audio samples.
Then, a simulation method is used to create artificial room impulse responses (RIRs) between the sources and microphones. The current default method is the image source model, which considers the walls as perfect reflectors. An experimental hybrid simulator based on the image source method (ISM) [1] and ray tracing (RT) [2], [3] is also available. Ray tracing better captures the late reflections and can also model effects such as scattering.
The microphone signals are then created by convolving the audio samples associated with each source with the appropriate RIR. Since the simulation is done on discrete-time signals, a sampling frequency is specified for the room and the sources it contains. Microphones can optionally operate at a different sampling frequency; a rate conversion is done in this case.
Simulating a Shoebox Room with the Image Source Model¶
We will first walk through the steps to simulate a shoebox-shaped room in 3D. The ISM is used to find all image sources up to a maximum specified order, and the room impulse responses (RIRs) are generated from their positions.
The code for the full example can be found in examples/room_from_rt60.py.
Create the room¶
So-called shoebox rooms are parallelepipedic rooms with 4 or 6 walls (in 2D and 3D, respectively), all at right angles. They are defined by a single vector that contains the lengths of the walls. They have the advantage of being simple to define and very efficient to simulate. In the following example, we define a 9 m x 7.5 m x 3.5 m room. In addition, we use Sabine's formula to find the wall energy absorption and the maximum order of the ISM required to achieve a desired reverberation time (RT60, i.e. the time it takes for the RIR to decay by 60 dB).
import pyroomacoustics as pra
# The desired reverberation time and dimensions of the room
rt60 = 0.5 # seconds
room_dim = [9, 7.5, 3.5] # meters
# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)
# Create the room
room = pra.ShoeBox(
room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order
)
The second argument is the sampling frequency at which the RIR will be generated. Note that the default value of fs is 8 kHz.
The third argument is the material of the walls, which itself takes the energy absorption as a parameter.
The fourth and last argument is the maximum number of reflections allowed in the ISM.
Note
Note that Sabine’s formula is only an approximation and that the actually simulated RT60 may vary by quite a bit.
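For reference, Sabine's formula ties the RT60 to the room volume V, the total wall surface area S, and the average energy absorption a. A quick numpy sanity check for the room above, using a hypothetical absorption value:

```python
import numpy as np

# Sabine's formula: T60 = 24 * ln(10) * V / (c * S * a)
c = 343.0  # speed of sound (m/s)
length, width, height = 9.0, 7.5, 3.5  # the room dimensions used above (m)
V = length * width * height  # room volume (m^3)
S = 2 * (length * width + length * height + width * height)  # wall area (m^2)
a = 0.25  # hypothetical average energy absorption

t60_sabine = 24 * np.log(10) * V / (c * S * a)
print(round(t60_sabine, 2))  # ≈ 0.61 s
```

inverse_sabine solves this same relation in the other direction, returning the absorption needed for a target RT60.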
Warning
Until recently, rooms would take an absorption parameter that was actually not the energy absorption we use now. The absorption parameter is now deprecated and will be removed in the future.
Add sources and microphones¶
Sources are fairly straightforward to create. They take their location as the single mandatory argument, and a signal and start time as optional arguments. Here we create a source located at [2.5, 3.73, 1.76] within the room that will utter the content of the WAV file speech.wav starting at 1.3 s into the simulation. The signal keyword argument to the add_source() method should be a one-dimensional numpy.ndarray containing the desired sound signal.
# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
from scipy.io import wavfile
_, audio = wavfile.read('speech.wav')
# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=1.3)
The locations of the microphones in the array should be provided in a numpy nd-array of size (ndim, nmics), that is, each column contains the coordinates of one microphone. Note that the sampling frequency of the array can differ from that of the room, in which case resampling will occur. Here, we create an array with two microphones placed at [6.3, 4.87, 1.2] and [6.3, 4.93, 1.2].
# define the locations of the microphones
import numpy as np
mic_locs = np.c_[
[6.3, 4.87, 1.2], # mic 1
[6.3, 4.93, 1.2], # mic 2
]
# finally place the array in the room
room.add_microphone_array(mic_locs)
A number of routines exist to create regular array geometries in 2D.
Create the Room Impulse Response¶
At this point, the RIRs are simply created by invoking compute_rir(), which runs the ISM. This function generates all the image sources up to the required order and uses them to build the RIRs, which are stored in the rir attribute of room. The attribute rir is a list of lists, where the outer list runs over microphones and the inner list over sources.
room.compute_rir()
# plot the RIR between mic 1 and source 0
import matplotlib.pyplot as plt
plt.plot(room.rir[1][0])
plt.show()
Warning
The simulator uses a fractional delay filter that introduces a global delay in the RIR. The delay can be obtained as follows.
Simulate sound propagation¶
By calling simulate(), a convolution of the signal of each source (if not None) is performed with the corresponding room impulse response. The outputs of the convolutions are summed at the microphones. The result is stored in the signals attribute of room.mic_array, with each row corresponding to one microphone.
room.simulate()
# plot signal at microphone 1
plt.plot(room.mic_array.signals[1,:])
Full Example¶
This example is partly extracted from ./examples/room_from_rt60.py.
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
from scipy.io import wavfile
# The desired reverberation time and dimensions of the room
rt60_tgt = 0.3 # seconds
room_dim = [10, 7.5, 3.5] # meters
# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
fs, audio = wavfile.read("examples/samples/guitar_16k.wav")
# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60_tgt, room_dim)
# Create the room
room = pra.ShoeBox(
room_dim, fs=fs, materials=pra.Material(e_absorption), max_order=max_order
)
# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=0.5)
# define the locations of the microphones
mic_locs = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
]
# finally place the array in the room
room.add_microphone_array(mic_locs)
# Run the simulation (this will also build the RIR automatically)
room.simulate()
room.mic_array.to_wav(
    "examples/samples/guitar_16k_reverb.wav",
norm=True,
bitdepth=np.int16,
)
# measure the reverberation time
rt60 = room.measure_rt60()
print("The desired RT60 was {}".format(rt60_tgt))
print("The measured RT60 is {}".format(rt60[1, 0]))
# Create a plot
plt.figure()
# plot one of the RIRs; both can also be plotted using room.plot_rir()
rir_1_0 = room.rir[1][0]
plt.subplot(2, 1, 1)
plt.plot(np.arange(len(rir_1_0)) / room.fs, rir_1_0)
plt.title("The RIR from source 0 to mic 1")
plt.xlabel("Time [s]")
# plot signal at microphone 1
plt.subplot(2, 1, 2)
plt.plot(room.mic_array.signals[1, :])
plt.title("Microphone 1 signal")
plt.xlabel("Time [s]")
plt.tight_layout()
plt.show()
Hybrid ISM/Ray Tracing Simulator¶
Warning
The hybrid simulator has not been thoroughly tested yet and should be used with
care. The exact implementation and default settings may also change in the future.
Currently, the default behavior of Room
and ShoeBox
has been kept as in previous
versions of the package. Bugs and user experience can be reported on
github.
The hybrid ISM/RT simulator uses ISM to simulate the early reflections in the RIR and RT for the diffuse tail. Our implementation is based on [2] and [3].
The simulator has the following features.
- Scattering: Wall scattering can be defined by assigning a scattering coefficient to the walls together with the energy absorption.
- Multi-band: The simulation can be carried out with different parameters for different octave bands. The octave bands go from 125 Hz to half the sampling frequency.
- Air absorption: The frequency-dependent absorption of the air can be turned on by providing the keyword argument air_absorption=True to the room constructor.
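The multi-band behavior can be illustrated with a small helper that lists the octave band centers. This is a hypothetical sketch of the banding described above (centers doubling from 125 Hz up to the Nyquist frequency); the library's internal band-edge handling may differ.

```python
def octave_band_centers(fs):
    # double from 125 Hz while the center stays at or below Nyquist
    centers = []
    f = 125
    while f <= fs / 2:
        centers.append(f)
        f *= 2
    return centers

# at fs = 16 kHz this yields the band centers used in the materials tables
print(octave_band_centers(16000))  # [125, 250, 500, 1000, 2000, 4000, 8000]
```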
Here is a simple example using the hybrid simulator. We suggest using max_order=3 with the hybrid simulator.
# Create the room
room = pra.ShoeBox(
room_dim,
fs=16000,
materials=pra.Material(e_absorption),
max_order=3,
ray_tracing=True,
air_absorption=True,
)
# Activate the ray tracing
room.set_ray_tracing()
A few example programs are provided in ./examples:
- ./examples/ray_tracing.py demonstrates use of ray tracing for rooms of different sizes and with different amounts of reverberation
- ./examples/room_L_shape_3d_rt.py shows how to simulate a polyhedral room
- ./examples/room_from_stl.py demonstrates how to import a model from an STL file
Wall Materials¶
The wall materials are handled by the
Material
objects. A material is defined
by at least one absorption coefficient that represents the ratio of sound
energy absorbed by a wall upon reflection.
A material may have multiple absorption coefficients corresponding to different absorptions at different octave bands.
When only one coefficient is provided, the absorption is assumed to be uniform at
all frequencies.
In addition, materials may have one or more scattering coefficients
corresponding to the ratio of energy scattered upon reflection.
The materials can be defined by providing the coefficients directly, or by choosing a material from the materials database [2].
import pyroomacoustics as pra
m = pra.Material(energy_absorption="hard_surface")
room = pra.ShoeBox([9, 7.5, 3.5], fs=16000, materials=m, max_order=17)
We can use different materials for different walls. In this case, the materials should be provided in a dictionary. For a shoebox room, this can be done as follows.
import pyroomacoustics as pra
m = pra.make_materials(
ceiling="hard_surface",
floor="6mm_carpet",
east="brickwork",
west="brickwork",
north="brickwork",
south="brickwork",
)
room = pra.ShoeBox(
[9, 7.5, 3.5], fs=16000, materials=m, max_order=17, air_absorption=True, ray_tracing=True
)
Note
For shoebox rooms, the walls are labelled as follows:
- west/east for the walls in the y-z plane with small/large x coordinates, respectively
- south/north for the walls in the x-z plane with small/large y coordinates, respectively
- floor/ceiling for the walls in the x-y plane with small/large z coordinates, respectively
Controlling the signal-to-noise ratio¶
It is in general necessary to scale the signals from different sources to
obtain a specific signal-to-noise or signal-to-interference ratio (SNR and SIR,
respectively). This can be done by passing some options to the simulate()
function. Because the relative amplitude of signals will change at different microphones
due to propagation, it is necessary to choose a reference microphone. By default, this
will be the first microphone in the array (index 0). The simplest choice is to choose
the variance of the noise \(\sigma_n^2\) to achieve a desired SNR with respect
to the cumulative signal from all sources. Assuming that the signals from all sources
are scaled to have the same amplitude (e.g., unit amplitude) at the reference microphone,
the SNR is defined as
\[\mathsf{SNR} = 10 \log_{10} \frac{K}{\sigma_n^2}\]
where \(K\) is the number of sources. For example, an SNR of 10 decibels (dB) can be obtained using the following code
room.simulate(reference_mic=0, snr=10)
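To see what this normalization implies, the noise variance can be recovered from the formula above. A minimal sketch, not library code:

```python
def noise_variance(snr_db, n_sources):
    # invert SNR = 10 log10(K / sigma_n^2) for the noise variance sigma_n^2
    return n_sources / 10 ** (snr_db / 10)

# an SNR of 10 dB with a single unit-variance source implies sigma_n^2 = 0.1
print(noise_variance(10, 1))
```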
Sometimes, more challenging normalizations are necessary. In that case, a custom callback function can be provided to simulate(). For example, we can imagine a scenario with n_src sources, out of which n_tgt are the targets, the rest being interferers. We will assume all targets have unit variance, and all interferers have equal variance \(\sigma_i^2\) (at the reference microphone). In addition, there is uncorrelated noise of variance \(\sigma_n^2\) at every microphone. We will define the SNR and SIR with respect to a single target source:
\[\mathsf{SNR} = 10 \log_{10} \frac{1}{\sigma_n^2}, \qquad \mathsf{SIR} = 10 \log_{10} \frac{1}{(\mathtt{n\_src} - \mathtt{n\_tgt})\,\sigma_i^2}\]
The callback function callback_mix takes as argument an nd-array premix_signals of shape (n_src, n_mics, n_samples) that contains the microphone signals prior to mixing. The signal propagated from the k-th source to the m-th microphone is contained in premix_signals[k,m,:]. Optional arguments can be passed to the callback via the callback_mix_kwargs optional argument. Here is the code implementing the example described.
# the extra arguments are given in a dictionary
callback_mix_kwargs = {
    'snr' : 30, # SNR target is 30 decibels
    'sir' : 10, # SIR target is 10 decibels
    'n_src' : 6,
    'n_tgt' : 2,
    'ref_mic' : 0,
}

def callback_mix(premix, snr=0, sir=0, ref_mic=0, n_src=None, n_tgt=None):
    # first normalize all separate recordings to have unit power at the reference microphone
    p_mic_ref = np.std(premix[:,ref_mic,:], axis=1)
    premix /= p_mic_ref[:,None,None]
    # now compute the power of interference signal needed to achieve desired SIR
    sigma_i = np.sqrt(10 ** (- sir / 10) / (n_src - n_tgt))
    premix[n_tgt:n_src,:,:] *= sigma_i
    # compute noise variance
    sigma_n = np.sqrt(10 ** (- snr / 10))
    # Mix down the recorded signals
    mix = np.sum(premix[:n_src,:], axis=0) + sigma_n * np.random.randn(*premix.shape[1:])
    return mix

# Run the simulation
room.simulate(
    callback_mix=callback_mix,
    callback_mix_kwargs=callback_mix_kwargs,
)
mics_signals = room.mic_array.signals
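The scaling performed by callback_mix can be sanity-checked on synthetic data, without running a room simulation. The sketch below reproduces the normalization steps on random signals and verifies the summed interference power at the reference microphone; it is an illustration of the math, not library code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_src, n_tgt, ref_mic, sir = 6, 2, 0, 10.0

# synthetic "premix" signals of shape (n_src, n_mics, n_samples)
premix = rng.standard_normal((n_src, 2, 16000))

# normalize each source to unit power at the reference microphone
premix /= np.std(premix[:, ref_mic, :], axis=1)[:, None, None]

# scale the interferers so their summed power matches the target SIR
premix[n_tgt:n_src] *= np.sqrt(10 ** (-sir / 10) / (n_src - n_tgt))

# summed interference power at the reference mic should be 10**(-SIR/10) = 0.1
interference_power = np.sum(np.var(premix[n_tgt:n_src, ref_mic, :], axis=1))
print(round(interference_power, 6))
```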
In addition, it is desirable in some cases to obtain the microphone signals with individual sources, prior to mixing. For example, this is useful to evaluate the output from blind source separation algorithms. In this case, the return_premix argument should be set to True:
premix = room.simulate(return_premix=True)
Reverberation Time¶
The reverberation time (RT60) is defined as the time needed for the energy of the RIR to decrease by 60 dB. It is a useful measure of the amount of reverberation. We provide the function measure_rt60() to measure the RT60 of recorded or simulated RIRs. The method is also directly integrated in the Room object as room.measure_rt60().
# assuming the simulation has already been carried out
rt60 = room.measure_rt60()
for m in range(room.n_mics):
    for s in range(room.n_sources):
        print(
            "RT60 between the {}th mic and {}th source: {:.3f} s".format(m, s, rt60[m, s])
        )
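For intuition on how such a measurement works, Schroeder backward integration can be sketched in a few lines of NumPy on a synthetic exponential decay. This is an illustrative re-implementation under simplifying assumptions, not the library's measure_rt60; it also shows the rescaling used when a shorter decay (e.g., 30 dB) is measured.

```python
import numpy as np

def schroeder_rt60(rir, fs, decay_db=60):
    # energy decay curve via Schroeder backward integration of the squared RIR
    edc = np.cumsum(rir[::-1] ** 2)[::-1]
    edc_db = 10 * np.log10(edc / edc[0])
    # first sample where the curve has dropped by decay_db, rescaled to 60 dB
    i = np.argmax(edc_db <= -decay_db)
    return (i / fs) * (60.0 / decay_db)

# synthetic RIR whose energy decays by exactly 60 dB every 0.3 s
fs, rt60_true = 16000, 0.3
t = np.arange(int(0.6 * fs)) / fs
rir = 10 ** (-3 * t / rt60_true)

print(round(schroeder_rt60(rir, fs), 3))
```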
References
[1]
[2]
[3]
-
class
pyroomacoustics.room.
Room
(walls, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False)¶ Bases:
object
A Room object has as attributes a collection of pyroomacoustics.wall.Wall objects, a pyroomacoustics.beamforming.MicrophoneArray array, and a list of pyroomacoustics.soundsource.SoundSource objects. The room can be two dimensional (2D), in which case the walls are simply line segments. A factory method pyroomacoustics.room.Room.from_corners() can be used to create the room from a polygon. In three dimensions (3D), the walls are two dimensional polygons, namely a collection of points lying on a common plane. Creating rooms in 3D is more tedious, and for convenience a method pyroomacoustics.room.Room.extrude() is provided to lift a 2D room into 3D space by adding vertical walls and parallel floor and ceiling.
The Room class is sub-classed by pyroomacoustics.room.ShoeBox, which creates a rectangular (2D) or parallelepipedic (3D) room. Such rooms benefit from an efficient algorithm for the image source method.
Attribute walls: (Wall array) list of walls forming the room
Attribute fs: (int) sampling frequency
Attribute max_order: (int) the maximum computed order for images
Attribute sources: (SoundSource array) list of sound sources
Attribute mics: (MicrophoneArray) array of microphones
Attribute corners: (numpy.ndarray 2xN or 3xN, N=number of walls) array containing a point belonging to each wall, used for calculations
Attribute absorption: (numpy.ndarray size N, N=number of walls) array containing the absorption factor for each wall, used for calculations
Attribute dim: (int) dimension of the room (2 or 3 meaning 2D or 3D)
Attribute wallsId: (int dictionary) stores the mapping “wall name -> wall id (in the array walls)”
Parameters: - walls (list of Wall or Wall2D objects) – The walls forming the room.
- fs (int, optional) – The sampling frequency in Hz. Default is 8000.
- t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
- max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.
- sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
- sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
- mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
- temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.
- humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.
- air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.
- ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.
-
add
(obj)¶ Adds a sound source or microphone to a room
Parameters: obj ( SoundSource
orMicrophone
object) – The object to addReturns: The room is returned for further tweaking. Return type: Room
-
add_microphone
(loc, fs=None)¶ Adds a single microphone in the room.
Parameters: - loc (array_like or ndarray) – The location of the microphone. The length should be the same as the room dimension.
- fs (float, optional) – The sampling frequency of the microphone, if different from that of the room.
Returns: The room is returned for further tweaking.
Return type:
-
add_microphone_array
(mic_array)¶ Adds a microphone array (i.e. several microphones) in the room.
Parameters: mic_array (array_like or ndarray or MicrophoneArray object) – The array can be provided as an array of size
(dim, n_mics)
, wheredim
is the dimension of the room andn_mics
is the number of microphones in the array.As an alternative, a
MicrophoneArray
can be provided.Returns: The room is returned for further tweaking. Return type: Room
-
add_soundsource
(sndsrc)¶ Adds a
pyroomacoustics.soundsource.SoundSource
object to the room.Parameters: sndsrc ( SoundSource
object) – The SoundSource object to add to the room
-
add_source
(position, signal=None, delay=0)¶ Adds a sound source given by its position in the room. Optionally a source signal and a delay can be provided.
Parameters: - position (ndarray, shape: (2,) or (3,)) – The location of the source in the room
- signal (ndarray, shape: (n_samples,), optional) – The signal played by the source
- delay (float, optional) – A time delay until the source signal starts in the simulation
Returns: The room is returned for further tweaking.
Return type:
-
compute_rir
()¶ Compute the room impulse response between every source and microphone.
-
direct_snr
(x, source=0)¶ Computes the direct Signal-to-Noise Ratio
-
extrude
(height, v_vec=None, absorption=None, materials=None)¶ Creates a 3D room by extruding a 2D polygon. The polygon is typically the floor of the room and will have z-coordinate zero.
Parameters: - height (float) – The extrusion height
- v_vec (array-like 1D length 3, optional) – A unit vector giving the orientation of the extrusion direction. The ceiling will be placed as a translation of the floor with respect to this vector (the default is [0, 0, 1]).
- absorption (float or array-like) – Absorption coefficients for all the walls. If a scalar, then all the walls will have the same absorption. If an array is given, it should have as many elements as there will be walls, that is the number of vertices of the polygon plus two. The two last elements are for the floor and the ceiling, respectively. (default 1)
-
classmethod
from_corners
(corners, absorption=None, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, **kwargs)¶ Creates a 2D room by giving an array of corners.
Parameters: - corners ((np.array dim 2xN, N>2)) – list of corners, must be antiClockwise oriented
- absorption (float array or float) – list of absorption factor for each wall or single value for all walls
Returns: Return type: Instance of a 2D room
-
get_bbox
()¶ Returns a bounding box for the room
-
get_volume
()¶ Computes the volume of the room. Returns the volume in cubic units.
-
get_wall_by_name
(name)¶ Returns the instance of the wall by giving its name.
Parameters: name – (string) name of the wall Returns: (Wall) instance of the wall with this name
-
image_source_model
()¶
-
is_inside
(p, include_borders=True)¶ Checks if the given point is inside the room.
Parameters: - p (array_like, length 2 or 3) – point to be tested
- include_borders (bool, optional) – set true if a point on the wall must be considered inside the room
Returns: Return type: True if the given point is inside the room, False otherwise.
-
is_multi_band
¶
-
measure_rt60
(decay_db=60, plot=False)¶ Measures the reverberation time (RT60) of the simulated RIR.
Parameters: - decay_db (float) – This is the actual decay of the RIR used for the computation. The default is 60, meaning that the RT60 is exactly what we measure. In some cases, the signal may be too short to measure 60 dB decay. In this case, we can specify a lower value. For example, with 30 dB, the RT60 is twice the time measured.
- plot (bool) – Displays a graph of the Schroeder curve and the estimated RT60.
Returns: An array that contains the measured RT60 for all the RIR.
Return type: ndarray (n_mics, n_sources)
-
n_mics
¶
-
n_sources
¶
-
plot
(img_order=None, freq=None, figsize=None, no_axis=False, mic_marker_size=10, **kwargs)¶ Plots the room with its walls, microphones, sources and images
-
plot_rir
(select=None, FD=False)¶ Plot room impulse responses. Compute if not done already.
Parameters: - select (list of tuples OR int) – List of RIR pairs (mic, src) to plot, e.g. [(0,0), (0,1)]. Or int to plot RIR from particular microphone to all sources. Note that microphones and sources are zero-indexed. Default is to plot all microphone-source pairs.
- FD (bool) – Whether to plot in the frequency domain, namely the transfer function. Default is False.
-
ray_tracing
()¶
-
rt60_theory
(formula='sabine')¶ Compute the theoretical reverberation time (RT60) for the room.
Parameters: formula (str) – The formula to use for the calculation, ‘sabine’ (default) or ‘eyring’
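The difference between the two formulas can be sketched directly. The helpers below are illustrative (assuming a sound speed of 343 m/s), not the library's implementation: Eyring replaces Sabine's average absorption a with -ln(1 - a), so the two agree for small absorption and Eyring predicts a shorter reverberation time as a grows.

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def rt60_sabine(volume, surface, a):
    # Sabine: RT60 = (24 ln 10 / c) * V / (S * a)
    return 24 * np.log(10) / C * volume / (surface * a)

def rt60_eyring(volume, surface, a):
    # Eyring: same form, with -ln(1 - a) in place of a
    return 24 * np.log(10) / C * volume / (-surface * np.log(1 - a))

# for a mildly absorptive room the two predictions nearly coincide
V, S = 262.5, 272.5  # the 10 x 7.5 x 3.5 m shoebox from the earlier example
print(round(rt60_sabine(V, S, 0.05), 3), round(rt60_eyring(V, S, 0.05), 3))
```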
-
set_air_absorption
(coefficients=None)¶ Activates or deactivates air absorption in the simulation.
Parameters: coefficients (list of float) – List of air absorption coefficients, one per octave band
-
set_ray_tracing
(n_rays=None, receiver_radius=0.5, energy_thres=1e-07, time_thres=10.0, hist_bin_size=0.004)¶ Activates the ray tracer.
Parameters: - n_rays (int, optional) – The number of rays to shoot in the simulation
- receiver_radius (float, optional) – The radius of the sphere around the microphone in which to integrate the energy (default: 0.5 m)
- energy_thres (float, optional) – The energy threshold at which rays are stopped (default: 1e-7)
- time_thres (float, optional) – The maximum time of flight of rays (default: 10 s)
- hist_bin_size (float) – The time granularity of bins in the energy histogram (default: 4 ms)
-
set_sound_speed
(c)¶ Sets the speed of sound unconditionally
-
simulate
(snr=None, reference_mic=0, callback_mix=None, callback_mix_kwargs={}, return_premix=False, recompute_rir=False)¶ Simulates the microphone signal at every microphone in the array
Parameters: - reference_mic (int, optional) – The index of the reference microphone to use for SNR computations. The default reference microphone is the first one (index 0)
- snr (float, optional) –
The target signal-to-noise ratio (SNR) in decibels at the reference microphone. When this option is used the argument
pyroomacoustics.room.Room.sigma2_awgn
is ignored. The variance of every source at the reference microphone is normalized to one and the variance of the noise \(\sigma_n^2\) is chosen\[\mathsf{SNR} = 10 \log_{10} \frac{ K }{ \sigma_n^2 }\]The value of
pyroomacoustics.room.Room.sigma2_awgn
is also set to \(\sigma_n^2\) automatically - callback_mix (func, optional) – A function that will perform the mix, it takes as first argument
an array of shape
(n_sources, n_mics, n_samples)
that contains the source signals convolved with the room impulse response prior to mixture at the microphone. It should return an array of shape(n_mics, n_samples)
containing the mixed microphone signals. If such a function is provided, thesnr
option is ignored andpyroomacoustics.room.Room.sigma2_awgn
is set toNone
. - callback_mix_kwargs (dict, optional) – A dictionary that contains optional arguments for
callback_mix
function - return_premix (bool, optional) – If set to
True
, the function will return an array of shape(n_sources, n_mics, n_samples)
containing the microphone signals with individual sources, convolved with the room impulse response but prior to mixing - recompute_rir (bool, optional) – If set to
True
, the room impulse responses will be recomputed prior to simulation
Returns: Depends on the value of
return_premix
optionReturn type: Nothing or an array of shape
(n_sources, n_mics, n_samples)
-
unset_air_absorption
()¶ Deactivates air absorption in the simulation
-
unset_ray_tracing
()¶ Deactivates the ray tracer
-
volume
¶
-
wall_area
(wall)¶ Computes the area of a 3D planar wall, given the wall object defined in 3D space.
-
class
pyroomacoustics.room.
ShoeBox
(p, fs=8000, t0=0.0, absorption=None, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False)¶ Bases:
pyroomacoustics.room.Room
This class provides an API for creating a ShoeBox room in 2D or 3D.
Parameters: - p (array) – Length 2 (width, length) or 3 (width, length, height) depending on the desired dimension of the room.
- fs (int, optional) – The sampling frequency in Hz. Default is 8000.
- t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
- absorption (float) – Average amplitude absorption of walls. Note that this parameter is deprecated; use materials instead!
- max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.
- sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
- sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
- mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
- materials (Material object or dict of Material objects) – See pyroomacoustics.parameters.Material. If providing a dict, you must provide a Material object for each wall: ‘east’, ‘west’, ‘north’, ‘south’, ‘ceiling’ (3D), ‘floor’ (3D).
- temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.
- humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.
- air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.
- ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.
-
extrude
(height)¶ Overload the extrude method from 3D rooms
-
get_volume
()¶ Computes the volume of the room. Returns the volume in cubic units.
-
pyroomacoustics.room.
find_non_convex_walls
(walls)¶ Finds the walls that are not in the convex hull
Parameters: walls (list of Wall objects) – The walls that compose the room Returns: The indices of the walls not in the convex hull Return type: list of int
-
pyroomacoustics.room.
sequence_generation
(volume, duration, c, fs, max_rate=10000)¶
-
pyroomacoustics.room.
wall_factory
(corners, absorption, scattering, name='')¶ Call the correct method according to wall dimension
Materials Database¶
Absorption Coefficients¶
Massive constructions and hard surfaces¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
hard_surface | Walls, hard surfaces average (brick walls, plaster, hard floors, etc.) | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.05 | 0.05 |
brickwork | Walls, rendered brickwork | 0.01 | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 |
rough_concrete | Rough concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.04 | 0.07 | 0.07 |
unpainted_concrete | Smooth unpainted concrete | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.05 | 0.05 |
rough_lime_wash | Rough lime wash | 0.02 | 0.03 | 0.04 | 0.05 | 0.04 | 0.03 | 0.02 |
smooth_brickwork_flush_pointing | Smooth brickwork with flush pointing, painted | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
smooth_brickwork_10mm_pointing | Smooth brickwork, 10 mm deep pointing, pit sand mortar | 0.08 | 0.09 | 0.12 | 0.16 | 0.22 | 0.24 | 0.24 |
brick_wall_rough | Brick wall, stuccoed with a rough finish | 0.03 | 0.03 | 0.03 | 0.04 | 0.05 | 0.07 | 0.07 |
ceramic_tiles | Ceramic tiles with a smooth surface | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
limestone_wall | Limestone walls | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.05 | 0.05 |
reverb_chamber | Reverberation chamber walls | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.04 | 0.04 |
concrete_floor | Concrete floor | 0.01 | 0.03 | 0.05 | 0.02 | 0.02 | 0.02 | 0.02 |
marble_floor | Marble floor | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
Lightweight constructions and linings¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
plasterboard | 2 * 13 mm plasterboard on steel frame, 50 mm mineral wool in cavity, surface painted | 0.15 | 0.1 | 0.06 | 0.04 | 0.04 | 0.05 | 0.05 |
wooden_lining | Wooden lining, 12 mm fixed on frame | 0.27 | 0.23 | 0.22 | 0.15 | 0.1 | 0.07 | 0.06 |
Glazing¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
glass_3mm | Single pane of glass, 3 mm | 0.08 | 0.04 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
glass_window | Glass window, 0.68 kg/m2 | 0.1 | 0.05 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 |
lead_glazing | Lead glazing | 0.3 | 0.2 | 0.14 | 0.1 | 0.05 | 0.05 | |
double_glazing_30mm | Double glazing, 2–3 mm glass, > 30 mm gap | 0.15 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
double_glazing_10mm | Double glazing, 2–3 mm glass, 10 mm gap | 0.1 | 0.07 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 |
double_glazing_inside | Double glazing, lead on the inside | 0.15 | 0.3 | 0.18 | 0.1 | 0.05 | 0.05 |
Wood¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
wood_1.6cm | Wood, 1.6 cm thick, on 4 cm wooden planks | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
plywood_thin | Thin plywood panelling | 0.42 | 0.21 | 0.1 | 0.08 | 0.06 | 0.06 | |
wood_16mm | 16 mm wood on 40 mm studs | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
audience_floor | Audience floor, 2 layers, 33 mm on sleepers over concrete | 0.09 | 0.06 | 0.05 | 0.05 | 0.05 | 0.04 | |
stage_floor | Wood, stage floor, 2 layers, 27 mm over airspace | 0.1 | 0.07 | 0.06 | 0.06 | 0.06 | 0.06 | |
wooden_door | Solid wooden door | 0.14 | 0.1 | 0.06 | 0.08 | 0.1 | 0.1 | 0.10 |
Floor coverings¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
linoleum_on_concrete | Linoleum, asphalt, rubber, or cork tile on concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | |
carpet_cotton | Cotton carpet | 0.07 | 0.31 | 0.49 | 0.81 | 0.66 | 0.54 | 0.48 |
carpet_tufted_9.5mm | Loop pile tufted carpet, 1.4 kg/m2, 9.5 mm pile height: On hair pad, 3.0 kg/m2 | 0.1 | 0.4 | 0.62 | 0.7 | 0.63 | 0.88 | |
carpet_thin | Thin carpet, cemented to concrete | 0.02 | 0.04 | 0.08 | 0.2 | 0.35 | 0.4 | |
carpet_6mm_closed_cell_foam | 6 mm pile carpet bonded to closed-cell foam underlay | 0.03 | 0.09 | 0.25 | 0.31 | 0.33 | 0.44 | 0.44 |
carpet_6mm_open_cell_foam | 6 mm pile carpet bonded to open-cell foam underlay | 0.03 | 0.09 | 0.2 | 0.54 | 0.7 | 0.72 | 0.72 |
carpet_tufted_9m | 9 mm tufted pile carpet on felt underlay | 0.08 | 0.08 | 0.3 | 0.6 | 0.75 | 0.8 | 0.80 |
felt_5mm | Needle felt 5 mm stuck to concrete | 0.02 | 0.02 | 0.05 | 0.15 | 0.3 | 0.4 | 0.40 |
carpet_soft_10mm | 10 mm soft carpet on concrete | 0.09 | 0.08 | 0.21 | 0.26 | 0.27 | 0.37 | |
carpet_hairy | Hairy carpet on 3 mm felt | 0.11 | 0.14 | 0.37 | 0.43 | 0.27 | 0.25 | 0.25 |
carpet_rubber_5mm | 5 mm rubber carpet on concrete | 0.04 | 0.04 | 0.08 | 0.12 | 0.1 | 0.1 | |
carpet_1.35_kg_m2 | Carpet 1.35 kg/m2, on hair felt or foam rubber | 0.08 | 0.24 | 0.57 | 0.69 | 0.71 | 0.73 | |
cocos_fibre_roll_29mm | Cocos fibre roll felt, 29 mm thick (unstressed), reverse side clad with paper, 2.2 kg/m2, 2 Rayl | 0.1 | 0.13 | 0.22 | 0.35 | 0.47 | 0.57 |
Curtains¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
curtains_cotton_0.5 | Cotton curtains (0.5 kg/m2) draped to 3/4 area approx. 130 mm from wall | 0.3 | 0.45 | 0.65 | 0.56 | 0.59 | 0.71 | 0.71 |
curtains_0.2 | Curtains (0.2 kg/m2) hung 90 mm from wall | 0.05 | 0.06 | 0.39 | 0.63 | 0.7 | 0.73 | 0.73 |
curtains_cotton_0.33 | Cotton cloth (0.33 kg/m2) folded to 7/8 area | 0.03 | 0.12 | 0.15 | 0.27 | 0.37 | 0.42 | |
curtains_densely_woven | Densely woven window curtains 90 mm from wall | 0.06 | 0.1 | 0.38 | 0.63 | 0.7 | 0.73 | |
blinds_half_open | Vertical blinds, 15 cm from wall, half opened (45°) | 0.03 | 0.09 | 0.24 | 0.46 | 0.79 | 0.76 | |
blinds_open | Vertical blinds, 15 cm from wall, open (90°) | 0.03 | 0.06 | 0.13 | 0.28 | 0.49 | 0.56 | |
curtains_velvet | Tight velvet curtains | 0.05 | 0.12 | 0.35 | 0.45 | 0.38 | 0.36 | 0.36 |
curtains_fabric | Curtain fabric, 15 cm from wall | 0.1 | 0.38 | 0.63 | 0.52 | 0.55 | 0.65 | |
curtains_fabric_folded | Curtain fabric, folded, 15 cm from wall | 0.12 | 0.6 | 0.98 | 1 | 1 | 1 | 1.00 |
curtains_glass_mat | Curtains of close-woven glass mat hung 50 mm from wall | 0.03 | 0.03 | 0.15 | 0.4 | 0.5 | 0.5 | 0.50 |
studio_curtains | Studio curtains, 22 cm from wall | 0.36 | 0.26 | 0.51 | 0.45 | 0.62 | 0.76 |
Seating (2 seats per m2)¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
chairs_wooden | Wooden chairs without cushion | 0.05 | 0.08 | 0.1 | 0.12 | 0.12 | 0.12 | |
chairs_unoccupied_plastic | Unoccupied plastic chairs | 0.06 | 0.1 | 0.1 | 0.2 | 0.3 | 0.2 | 0.20 |
chairs_medium_upholstered | Medium upholstered concert chairs, empty | 0.49 | 0.66 | 0.8 | 0.88 | 0.82 | 0.7 | |
chairs_heavy_upholstered | Heavily upholstered seats, unoccupied | 0.7 | 0.76 | 0.81 | 0.84 | 0.84 | 0.81 | |
chairs_upholstered_cloth | Empty chairs, upholstered with cloth cover | 0.44 | 0.6 | 0.77 | 0.89 | 0.82 | 0.7 | 0.70 |
chairs_upholstered_leather | Empty chairs, upholstered with leather cover | 0.4 | 0.5 | 0.58 | 0.61 | 0.58 | 0.5 | 0.50 |
chairs_upholstered_moderate | Unoccupied, moderately upholstered chairs (0.90 m × 0.55 m) | 0.44 | 0.56 | 0.67 | 0.74 | 0.83 | 0.87 |
Audience (unless not specified explicitly, 2 persons per m2)¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
audience_orchestra_choir | Areas with audience, orchestra or choir including narrow aisles | 0.6 | 0.74 | 0.88 | 0.96 | 0.93 | 0.85 | 0.85 |
audience_wooden_chairs_1_m2 | Audience on wooden chairs, 1 per m2 | 0.16 | 0.24 | 0.56 | 0.69 | 0.81 | 0.78 | 0.78 |
audience_wooden_chairs_2_m2 | Audience on wooden chairs, 2 per m2 | 0.24 | 0.4 | 0.78 | 0.98 | 0.96 | 0.87 | 0.87 |
orchestra_1.5_m2 | Orchestra with instruments on podium, 1.5 m2 per person | 0.27 | 0.53 | 0.67 | 0.93 | 0.87 | 0.8 | 0.80 |
audience_0.72_m2 | Audience area, 0.72 persons / m2 | 0.1 | 0.21 | 0.41 | 0.65 | 0.75 | 0.71 | |
audience_1_m2 | Audience area, 1 person / m2 | 0.16 | 0.29 | 0.55 | 0.8 | 0.92 | 0.9 | |
audience_1.5_m2 | Audience area, 1.5 persons / m2 | 0.22 | 0.38 | 0.71 | 0.95 | 0.99 | 0.99 | |
audience_2_m2 | Audience area, 2 persons / m2 | 0.26 | 0.46 | 0.87 | 0.99 | 0.99 | 0.99 | |
audience_upholstered_chairs_1 | Audience in moderately upholstered chairs 0.85 m × 0.63 m | 0.72 | 0.82 | 0.91 | 0.93 | 0.94 | 0.87 |
audience_upholstered_chairs_2 | Audience in moderately upholstered chairs 0.90 m × 0.55 m | 0.55 | 0.86 | 0.83 | 0.87 | 0.9 | 0.87 |
Wall absorbers¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
panel_fabric_covered_6pcf | Fabric-covered panel, 6 pcf rockwool core | 0.46 | 0.93 | 1 | 1 | 1 | 1 | 1.00 |
panel_fabric_covered_8pcf | Fabric-covered panel, 8 pcf rockwool core | 0.21 | 0.66 | 1 | 1 | 0.97 | 0.98 | 0.98 |
facing_brick | Facing-brick brickwork, open butt joins, brick dimensions 230 × 50 × 55 mm | 0.04 | 0.14 | 0.49 | 0.35 | 0.31 | 0.36 | |
acoustical_plaster_25mm | Acoustical plaster, approx. 25 mm thick, 3.5 kg/m2/cm | 0.17 | 0.36 | 0.66 | 0.65 | 0.62 | 0.68 | |
rockwool_50mm_80kgm3 | Rockwool thickness = 50 mm, 80 kg/m3 | 0.22 | 0.6 | 0.92 | 0.9 | 0.88 | 0.88 | 0.88 |
rockwool_50mm_40kgm3 | Rockwool thickness = 50 mm, 40 kg/m3 | 0.23 | 0.59 | 0.86 | 0.86 | 0.86 | 0.86 | 0.86 |
mineral_wool_50mm_40kgm3 | 50 mm mineral wool (40 kg/m3), glued to wall, untreated surface | 0.15 | 0.7 | 0.6 | 0.6 | 0.85 | 0.9 | 0.90 |
mineral_wool_50mm_70kgm3 | 50 mm mineral wool (70 kg/m3) 300 mm in front of wall | 0.7 | 0.45 | 0.65 | 0.6 | 0.75 | 0.65 | 0.65 |
gypsum_board | Gypsum board, perforation 19.6%, hole diameter 15 mm, backed by fibrous web 12 Rayl, 100 mm cavity filled with mineral fibre mat 1.05 kg/m2, 7.5 Rayl | 0.3 | 0.69 | 1 | 0.81 | 0.66 | 0.62 | |
perforated_veneered_chipboard | Perforated veneered chipboard, 50 mm, 1 mm holes, 3 mm spacing, 9% hole surface ratio, 150 mm cavity filled with 30 mm mineral wool | 0.41 | 0.67 | 0.58 | 0.59 | 0.68 | 0.35 | |
fibre_absorber_1 | Fibre absorber, mineral fibre, 20 mm thick, 3.4 kg/m2, 50 mm cavity | 0.2 | 0.56 | 0.82 | 0.87 | 0.7 | 0.53 | |
fibre_absorber_2 | Fibre absorber, mats of porous flexible fibrous web fabric, self-extinguishing | 0.07 | 0.07 | 0.2 | 0.41 | 0.75 | 0.97 |
Ceiling absorbers¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
ceiling_plasterboard | Plasterboard ceiling on battens with large air-space above | 0.2 | 0.15 | 0.1 | 0.08 | 0.04 | 0.02 | |
ceiling_fibre_abosrber | Fibre absorber on perforated sheet metal cartridge, 0.5 mm zinc-plated steel, 1.5 mm hole diameter, 200 mm cavity filled with 20 mm mineral wool (20 kg/m3), inflammable | 0.48 | 0.97 | 1 | 0.97 | 1 | 1 | 1.00 |
ceiling_fissured_tile | Fissured ceiling tile | 0.49 | 0.53 | 0.53 | 0.75 | 0.92 | 0.99 | |
ceiling_perforated_gypsum_board | Perforated 27 mm gypsum board (16%), d = 4.5 mm, 300 mm from ceiling | 0.45 | 0.55 | 0.6 | 0.9 | 0.86 | 0.75 | |
ceiling_melamine_foam | Wedge-shaped, melamine foam, ceiling tile | 0.12 | 0.33 | 0.83 | 0.97 | 0.98 | 0.95 | |
ceiling_metal_panel | Metal panel ceiling, backed by 20 mm Sillan acoustic tiles, panel width 85 mm, panel spacing 15 mm, cavity 35 cm | 0.59 | 0.8 | 0.82 | 0.65 | 0.27 | 0.23 |
Special absorbers¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
microperforated_foil_kaefer | Microperforated foil “Microsorber” (Kaefer) | 0.06 | 0.28 | 0.7 | 0.68 | 0.74 | 0.53 | |
microperforated_glass_sheets | Microperforated glass sheets, 5 mm cavity | 0.1 | 0.45 | 0.85 | 0.3 | 0.1 | 0.05 | |
hanging_absorber_panels_1 | Hanging absorber panels (foam), 400 mm depth, 400 mm distance | 0.25 | 0.45 | 0.8 | 0.9 | 0.85 | 0.8 | |
hanging_absorber_panels_2 | Hanging absorber panels (foam), 400 mm depth, 700 mm distance | 0.2 | 0.3 | 0.6 | 0.75 | 0.7 | 0.7 |
Scattering Coefficients¶
Diffusers¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
rpg_skyline | Diffuser RPG Skyline | 0.01 | 0.08 | 0.45 | 0.82 | 1 | ||
rpg_qrd | Diffuser RPG QRD | 0.06 | 0.15 | 0.45 | 0.95 | 0.88 | 0.91 |
Seating and audience¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
theatre_audience | Theatre Audience | 0.3 | 0.5 | 0.6 | 0.6 | 0.7 | 0.70 | 0.70 |
classroom_tables | Rows of classroom tables and persons on chairs | 0.2 | 0.3 | 0.4 | 0.5 | 0.5 | 0.60 | 0.60 |
amphitheatre_steps | Amphitheatre steps, length 82 cm, height 30 cm (Farnetani 2005) | 0.05 | 0.45 | 0.75 | 0.9 | 0.9 |
Round Robin III – wall and ceiling¶
keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
---|---|---|---|---|---|---|---|---|
rect_prism_boxes | Rectangular and prism boxes (studio wall) “Round Robin III” (after (Bork 2005a)) | 0.5 | 0.9 | 0.95 | 0.95 | 0.95 | 0.95 | |
trapezoidal_boxes | Trapezoidal boxes (studio ceiling) “Round Robin III” (after (Bork 2005a)) | 0.13 | 0.56 | 0.95 | 0.95 | 0.95 | 0.95 |
Transforms¶
Module contents¶
Block based processing¶
This subpackage contains routines for continuous, real-time-like block processing of signals with the STFT. The routines are written to take advantage of fast FFT libraries such as pyfftw or mkl when available.
- DFT
- A class for performing DFT of real signals
- STFT
- A class for continuous STFT processing and frequency domain filtering
-
pyroomacoustics.transform.
analysis
(*args, **kwargs)¶
-
pyroomacoustics.transform.
compute_synthesis_window
(*args, **kwargs)¶
-
pyroomacoustics.transform.
synthesis
(*args, **kwargs)¶
Algorithms¶
DFT¶
Class for performing the Discrete Fourier Transform (DFT) and inverse DFT for real signals, including multichannel. It is also possible to specify an analysis or synthesis window.
When available, it is possible to use the pyfftw
or mkl_fft
packages.
Otherwise the default is to use numpy.fft.rfft
/numpy.fft.irfft
.
More on pyfftw
can be read here:
https://pyfftw.readthedocs.io/en/latest/index.html
More on mkl_fft
can be read here:
https://github.com/IntelPython/mkl_fft
-
class
pyroomacoustics.transform.dft.
DFT
(nfft, D=1, analysis_window=None, synthesis_window=None, transform='numpy', axis=0, precision='double', bits=None)¶ Bases:
object
Class for performing the Discrete Fourier Transform (DFT) of real signals.
-
X
¶ Real DFT computed by
analysis
.Type: numpy array
-
x
¶ IDFT computed by
synthesis
.Type: numpy array
-
nfft
¶ FFT size.
Type: int
-
D
¶ Number of channels.
Type: int
-
transform
¶ Which FFT package will be used.
Type: str
-
analysis_window
¶ Window to be applied before DFT.
Type: numpy array
-
synthesis_window
¶ Window to be applied after inverse DFT.
Type: numpy array
-
axis
¶ Axis over which to compute the FFT.
Type: int
-
precision, bits
How many precision bits to use for the input. Twice that amount will be used for the complex spectrum.
Type: string, np.float32, np.float64, np.complex64, np.complex128
Parameters: - nfft (int) – FFT size.
- D (int, optional) – Number of channels. Default is 1.
- analysis_window (numpy array, optional) – Window to be applied before DFT. Default is no window.
- synthesis_window (numpy array, optional) – Window to be applied after inverse DFT. Default is no window.
- transform (str, optional) – which FFT package to use:
numpy
,pyfftw
, ormkl
. Default isnumpy
. - axis (int, optional) – Axis over which to compute the FFT. Default is first axis.
- precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).
-
analysis
(x)¶ Perform frequency analysis of a real input using DFT.
Parameters: x (numpy array) – Real signal in time domain. Returns: DFT of input. Return type: numpy array
-
synthesis
(X=None)¶ Perform time synthesis of frequency domain to real signal using the inverse DFT.
Parameters: X (numpy array, optional) – Complex signal in frequency domain. Default is to use DFT computed from analysis
.Returns: IDFT of self.X
or input if given.Return type: numpy array
-
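The analysis/synthesis pair is easy to sanity-check. The following is a minimal numpy sketch of what the class computes with the default 'numpy' backend (a hypothetical stand-in for illustration, not the class's actual code): windowing followed by numpy.fft.rfft, with numpy.fft.irfft for the inverse.

```python
import numpy as np

# Hypothetical stand-in for DFT.analysis / DFT.synthesis with the
# default 'numpy' transform: apply the analysis window, then the
# real FFT; the inverse real FFT recovers the windowed signal.
nfft = 128
rng = np.random.default_rng(0)
x = rng.standard_normal(nfft)
analysis_window = np.hanning(nfft)

X = np.fft.rfft(analysis_window * x)   # analysis: window, then real FFT
x_rec = np.fft.irfft(X, n=nfft)        # synthesis: inverse real FFT
```

For real input of length nfft, the spectrum has nfft // 2 + 1 bins, which is why the complex side uses twice the precision of the real side.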
STFT¶
Class for real-time STFT analysis and processing.
-
class
pyroomacoustics.transform.stft.
STFT
(N, hop=None, analysis_window=None, synthesis_window=None, channels=1, transform='numpy', streaming=True, precision='double', **kwargs)¶ Bases:
object
A class for STFT processing.
Parameters: - N (int) – number of samples per frame
- hop (int) – hop size
- analysis_window (numpy array) – window applied to block before analysis
- synthesis_window (numpy array) – window applied to the block before synthesis
- channels (int) – number of signals
- transform (str, optional) – which FFT package to use: ‘numpy’ (default), ‘pyfftw’, or ‘mkl’
- streaming (bool, optional) – whether (True, default) or not (False) to “stitch” samples between repeated calls of ‘analysis’ and ‘synthesis’ if we are receiving a continuous stream of samples.
- num_frames (int, optional) –
Number of frames to be processed. If set, this will be strictly enforced as the STFT block will allocate memory accordingly. If not set, there will be no check on the number of frames sent to analysis/process/synthesis
- NOTE:
- 1) num_frames = 0 corresponds to a “real-time” case in which each input block contains [hop] samples. 2) num_frames > 0 requires [(num_frames-1)*hop + N] samples, as the last frame must contain [N] samples.
- precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).
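With num_frames > 0, the number of required input samples follows directly from the frame geometry; a quick check with illustrative values (hypothetical, not taken from the source):

```python
# Number of input samples needed when num_frames is fixed:
# each new frame advances by `hop`, and the last frame must
# still contain a full N samples (illustrative values only).
N, hop, num_frames = 128, 64, 10
required_samples = (num_frames - 1) * hop + N
```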
-
analysis
(x)¶ Parameters: x (2D numpy array, [samples, channels]) – Time-domain signal.
-
process
(X=None)¶ Parameters: X (numpy array) – X can take on multiple shapes: 1) (N,) if it is single channel and only one frame 2) (N,D) if it is multi-channel and only one frame 3) (F,N) if it is single channel but multiple frames 4) (F,N,D) if it is multi-channel and multiple frames Returns: x_r – Reconstructed time-domain signal. Return type: numpy array
-
reset
()¶ Reset state variables. Necessary after changing or setting the filter or zero padding.
-
set_filter
(coeff, zb=None, zf=None, freq=False)¶ Set time-domain FIR filter with appropriate zero-padding. Frequency spectrum of the filter is computed and set for the object. There is also a check for sufficient zero-padding.
Parameters: - coeff (numpy array) – Filter in time domain.
- zb (int) – Amount of zero-padding added to back/end of frame.
- zf (int) – Amount of zero-padding added to front/beginning of frame.
- freq (bool) – Whether or not given coefficients (coeff) are in the frequency domain.
-
synthesis
(X=None)¶ Parameters: X (numpy array of frequency content) – X can take on multiple shapes: 1) (N,) if it is single channel and only one frame 2) (N,D) if it is multi-channel and only one frame 3) (F,N) if it is single channel but multiple frames 4) (F,N,D) if it is multi-channel and multiple frames where: - F is the number of frames - N is the number of frequency bins - D is the number of channels Returns: x_r – Reconstructed time-domain signal. Return type: numpy array
-
zero_pad_back
(zb)¶ Set zero-padding at end of frame.
-
zero_pad_front
(zf)¶ Set zero-padding at beginning of frame.
-
pyroomacoustics.transform.stft.
analysis
(x, L, hop, win=None, zp_back=0, zp_front=0)¶ Convenience function for one-shot STFT
Parameters: - x (array_like, (n_samples) or (n_samples, n_channels)) – input signal
- L (int) – frame size
- hop (int) – shift size between frames
- win (array_like) – the window to apply (default None)
- zp_back (int) – zero padding to apply at the end of the frame
- zp_front (int) – zero padding to apply at the beginning of the frame
Returns: X – The STFT of x
Return type: ndarray, (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)
-
pyroomacoustics.transform.stft.
compute_synthesis_window
(analysis_window, hop)¶ Computes the optimal synthesis window given an analysis window and hop (frame shift). The procedure is described in
D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoustics, Speech, and Signal Process., vol. 32, no. 2, pp. 236-243, 1984.
Parameters: - analysis_window (array_like) – The analysis window
- hop (int) – The frame shift
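As a sketch of what this computes, the minimum-norm dual window of Griffin and Lim divides the analysis window by the hop-shifted sum of its squares; the product of analysis and synthesis windows then overlap-adds to one. A hypothetical numpy re-implementation (for illustration only, not the package's code):

```python
import numpy as np

def griffin_lim_synthesis_window(analysis_window, hop):
    # Hypothetical re-implementation of the Griffin-Lim dual window:
    # g[n] = w[n] / sum over all hop-shifted squared window samples
    # in the same residue class modulo the hop.
    w = np.asarray(analysis_window, dtype=float)
    L = len(w)
    den = np.zeros(L)
    for n in range(L):
        j = n % hop
        while j < L:
            den[n] += w[j] ** 2
            j += hop
    return w / den

w = np.hanning(16)
g = griffin_lim_synthesis_window(w, hop=8)

# the overlap-added product of analysis and synthesis windows
# should equal one at every sample position
cola = np.array([sum(w[j] * g[j] for j in range(r, 16, 8)) for r in range(8)])
```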
-
pyroomacoustics.transform.stft.
synthesis
(X, L, hop, win=None, zp_back=0, zp_front=0)¶ Convenience function for one-shot inverse STFT
Parameters: - X (array_like (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)) – The data
- L (int) – frame size
- hop (int) – shift size between frames
- win (array_like) – the window to apply (default None)
- zp_back (int) – zero padding to apply at the end of the frame
- zp_front (int) – zero padding to apply at the beginning of the frame
Returns: x – The inverse STFT of X
Return type: ndarray, (n_samples) or (n_samples, n_channels)
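Together, the two convenience functions form a round trip; the following numpy sketch (a hypothetical re-implementation that omits windows and zero-padding for brevity) shows the framing, transform, and overlap-add steps:

```python
import numpy as np

def stft_analysis(x, L, hop):
    # slice the signal into frames and apply the real FFT to each
    n_frames = (len(x) - L) // hop + 1
    frames = np.stack([x[i * hop:i * hop + L] for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)   # (n_frames, n_frequencies)

def stft_synthesis(X, L, hop):
    # invert each frame and overlap-add back into a time signal
    frames = np.fft.irfft(X, n=L, axis=1)
    n_frames = frames.shape[0]
    x = np.zeros((n_frames - 1) * hop + L)
    for i, frame in enumerate(frames):
        x[i * hop:i * hop + L] += frame
    return x

rng = np.random.default_rng(0)
x = rng.standard_normal(512)
X = stft_analysis(x, L=128, hop=128)     # non-overlapping rectangular frames
x_rec = stft_synthesis(X, L=128, hop=128)
```

With hop equal to the frame size and no window, the frames do not overlap and the round trip is exact; with overlapping windowed frames, a matching synthesis window (see compute_synthesis_window) is needed for perfect reconstruction.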
Dataset Wrappers¶
Module contents¶
The Datasets sub-package provides wrappers around a few popular audio datasets to make them easier to use.
Two base classes, pyroomacoustics.datasets.base.Dataset and pyroomacoustics.datasets.base.Sample, wrap together the audio samples and their metadata.
The general idea is to create a sample object with an attribute
containing all metadata. Dataset objects that have a collection
of samples can then be created and can be filtered according
to the values in the metadata.
Many of the functions with match
or filter
will take an
arbitrary number of keyword arguments. The keys should match some
metadata in the samples. Then there are three ways that match occurs
between a key/value
pair and an attribute
sharing the same key.
value == attribute
value
is a list andattribute in value == True
value
is a callable (a function) andvalue(attribute) == True
Example 1¶
# Prepare a few artificial samples
samples = [
{
'data' : 0.99,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'one' },
},
{
'data' : 2.1,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'two' },
},
{
'data' : 1.02,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'one' },
},
{
'data' : 2.07,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'two' },
},
]
corpus = Dataset()
for s in samples:
new_sample = Sample(s['data'], **s['metadata'])
corpus.add_sample(new_sample)
# Then, it is possible to display summary info about the corpus
print(corpus)
# The number of samples in the corpus is given by ``len``
print('Number of samples:', len(corpus))
# And we can access samples with the slice operator
print('Sample #2:')
print(corpus[2]) # (shortcut for `corpus.samples[2]`)
# We can obtain a new corpus with only male subjects
corpus_male_only = corpus.filter(sex='male')
print(corpus_male_only)
# Only retain speakers above 40 years old
corpus_older = corpus.filter(age=lambda a : a > 40)
print(corpus_older)
Example 2 (CMU ARCTIC)¶
# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)
# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t : keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
print(' *', s)
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Example 3 (Google’s Speech Commands Dataset)¶
# This example involves Google's Speech Commands Dataset available at
# https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# The dataset is automatically downloaded if not available and 10 of each word is selected
dataset = pra.datasets.GoogleSpeechCommands(download=True, subset=10, seed=0)
# print dataset info, first 10 entries, and all sounds
print(dataset)
dataset.head(n=10)
print("All sounds in the dataset:")
print(dataset.classes)
# filter by specific word
selected_word = 'yes'
matches = dataset.filter(word=selected_word)
print("Number of '%s' samples : %d" % (selected_word, len(matches)))
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Datasets Available¶
CMU ARCTIC Corpus¶
The CMU ARCTIC Dataset¶
The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English single speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the database and the recording environment etc is available as a Carnegie Mellon University, Language Technologies Institute Tech Report CMU-LTI-03-177 and is also available here.
The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.
The 1132 sentence prompt list is available from cmuarctic.data
The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox-based labelling scripts. Complete runnable Festival voices are included with the database distributions as examples, though better voices can be made by improving the labelling, etc.
License: Permissive, attribution required
Price: Free
URL: http://www.festvox.org/cmu_arctic/
-
class
pyroomacoustics.datasets.cmu_arctic.
CMUArcticCorpus
(basedir=None, download=False, build=True, **kwargs)¶ Bases:
pyroomacoustics.datasets.base.Dataset
This class will load the CMU ARCTIC corpus in a structure amenable to processing.
-
basedir
¶ The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
Type: str, optional
-
info
¶ A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.
Type: dict
-
sentences
¶ The list of all utterances in the corpus
Type: list of CMUArcticSentence
Parameters: - basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
- download (bool, optional) – If the corpus does not exist, download it.
- speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.
- sex (str or list of str, optional) – Can be ‘female’ or ‘male’
- lang (str or list of str, optional) – The language, only ‘English’ is available here
- accent (str of list of str, optional) – The accent of the speaker
-
build_corpus
(**kwargs)¶ Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)
-
filter
(**kwargs)¶ Filter the corpus and select samples that match the provided criteria. The arguments to the keywords can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.
value == attribute
value
is a list andattribute in value == True
value
is a callable (a function) andvalue(attribute) == True
-
-
class
pyroomacoustics.datasets.cmu_arctic.
CMUArcticSentence
(path, **kwargs)¶ Bases:
pyroomacoustics.datasets.base.AudioSample
Create the sentence object
Parameters: - path (str) – the path to the audio file
- **kwargs – metadata as a list of keyword arguments
-
data
¶ The actual audio signal
Type: array_like
-
fs
¶ sampling frequency
Type: int
-
plot
(**kwargs)¶ Plot the spectrogram
Google Speech Commands¶
Google’s Speech Commands Dataset¶
The Speech Commands Dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.
More info about the dataset can be found at the link below:
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
AIY website for contributing recordings:
https://aiyprojects.withgoogle.com/open_speech_recording
Tutorial on creating a word classifier:
https://www.tensorflow.org/versions/master/tutorials/audio_recognition
-
class
pyroomacoustics.datasets.google_speech_commands.
GoogleSample
(path, **kwargs)¶ Bases:
pyroomacoustics.datasets.base.AudioSample
Create the sound object.
Parameters: - path (str) – the path to the audio file
- **kwargs – metadata as a list of keyword arguments
-
data
¶ the actual audio signal
Type: array_like
-
fs
¶ sampling frequency
Type: int
-
plot
(**kwargs)¶ Plot the spectrogram
-
class
pyroomacoustics.datasets.google_speech_commands.
GoogleSpeechCommands
(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)¶ Bases:
pyroomacoustics.datasets.base.Dataset
This class will load the Google Speech Commands Dataset in a structure that is convenient to process.
-
basedir
¶ The directory where the Speech Commands Dataset is located/downloaded.
Type: str
-
size_by_samples
¶ A dictionary whose keys are the words in the dataset. The values are the number of occurrences for that particular word.
Type: dict
-
subdirs
¶ The list of subdirectories in
basedir
, where each sound type is the name of a subdirectory.Type: list
-
classes
¶ The list of all sounds, same as the keys of
size_by_samples
.Type: list
Parameters: - basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
- download (bool, optional) – If the corpus does not exist, download it.
- build (bool, optional) – Whether or not to build the dataset. By default, it is.
- subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word. By default, the dataset will be built with all samples.
- seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default,
seed=0
.
-
build_corpus
(subset=None, **kwargs)¶ Build the corpus with some filters (speech or not speech, sound type).
-
filter
(**kwargs)¶ Filter the dataset and select samples that match the provided criteria. The arguments to the keywords can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.
value == attribute
value
is a list andattribute in value == True
value
is a callable (a function) andvalue(attribute) == True
-
TIMIT Corpus¶
The TIMIT Dataset¶
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).
The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250 depending on your status (academic or otherwise).
Deprecation Warning: The interface of TimitCorpus will change in the near
future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus
URL: https://catalog.ldc.upenn.edu/ldc93s1
-
class
pyroomacoustics.datasets.timit.
Sentence
(path)¶ Bases:
object
Create the sentence object
Parameters: path ((string)) – the path to the particular sample -
speaker
¶ Speaker initials
Type: str
-
id
¶ a digit to disambiguate identical initials
Type: str
-
sex
¶ Speaker gender (M or F)
Type: str
-
dialect
¶ Speaker dialect region number:
- New England
- Northern
- North Midland
- South Midland
- Southern
- New York City
- Western
- Army Brat (moved around)
Type: str
-
fs
¶ sampling frequency
Type: int
-
samples
¶ the audio track
Type: array_like (n_samples,)
-
text
¶ the text of the sentence
Type: str
-
words
¶ list of Word objects forming the sentence
Type: list
-
phonems
¶ List of phonemes contained in the sentence. Each element is a dictionary containing ‘bnd’ with the limits of the phoneme and ‘name’ with the phoneme transcription.
Type: list
-
play
()¶ Play the sound sample
-
plot
(L=512, hop=128, zpb=0, phonems=False, **kwargs)¶
-
-
class
pyroomacoustics.datasets.timit.
TimitCorpus
(basedir)¶ Bases:
object
TimitCorpus class
Parameters: - basedir ((string)) – The location of the TIMIT database
- directories ((list of strings)) – The subdirectories containing the data ([‘TEST’,’TRAIN’])
- sentence_corpus ((dict)) – A dictionary that contains a list of Sentence objects for each sub-directory
- word_corpus ((dict)) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus
-
build_corpus
(sentences=None, dialect_region=None, speakers=None, sex=None)¶ Build the corpus
The TIMIT database organization is encoded in the directory structure:
- basedir
- TEST/TRAIN
- Regional accent index (1 to 8)
- Speakers (one directory per speaker)
- Sentences (one file per sentence)
Parameters: - sentences ((list)) – A list containing the sentences to which we want to restrict the corpus Example: sentences=[‘SA1’,’SA2’]
- dialect_region ((list of int)) – A list to which we restrict the dialect regions Example: dialect_region=[1, 4, 5]
- speakers ((list)) – A list of speakers acronym to which we want to restrict the corpus Example: speakers=[‘AKS0’]
- sex ((string)) – Restrict to a single sex: ‘F’ for female, ‘M’ for male
-
get_word
(d, w, index=0)¶ return instance index of word w from group (test or train) d
-
class
pyroomacoustics.datasets.timit.
Word
(word, boundaries, data, fs, phonems=None)¶ Bases:
object
A class used for words of the TIMIT corpus
-
word
¶ The spelling of the word
Type: str
-
boundaries
¶ The limits of the word within the sentence
Type: list
-
samples
¶ A view on the sentence samples containing the word
Type: array_like
-
fs
¶ The sampling frequency
Type: int
-
phonems
¶ A list of phones contained in the word
Type: list
-
features
¶ A feature array (e.g. MFCC coefficients)
Type: array_like
Parameters: - word (str) – The spelling of the word
- boundaries (list) – The limits of the word within the sentence
- data (array_like) – The nd-array that contains all the samples of the sentence
- fs (int) – The sampling frequency
- phonems (list, optional) – A list of phones contained in the word
-
mfcc
(frame_length=1024, hop=512)¶ compute the mel-frequency cepstrum coefficients of the word samples
-
play
()¶ Play the sound sample
-
plot
()¶
-
Tools and Helpers¶
Base Class¶
Base class for some data corpus and the samples it contains.
-
class
pyroomacoustics.datasets.base.
AudioSample
(data, fs, **kwargs)¶ Bases:
pyroomacoustics.datasets.base.Sample
We add some methods specific to display and listen to audio samples. The sampling frequency of the samples is an extra parameter.
For multichannel audio, we assume the same format used by scipy.io.wavfile (https://docs.scipy.org/doc/scipy-0.14.0/reference/io.html#module-scipy.io.wavfile), that is, data is a 2D array with each column being a channel.
-
data
¶ The actual data
Type: array_like
-
fs
¶ The sampling frequency of the input signal
Type: int
-
meta
¶ An object containing the sample metadata. They can be accessed using the dot operator
Type: pyroomacoustics.datasets.Meta
-
play
(**kwargs)¶ Play the sound sample. This function uses the sounddevice package for playback.
It takes the same keyword arguments as sounddevice.play.
-
plot
(NFFT=512, noverlap=384, **kwargs)¶ Plot the spectrogram of the audio sample.
It takes the same keyword arguments as matplotlib.pyplot.specgram.
-
-
class
pyroomacoustics.datasets.base.
Dataset
¶ Bases:
object
The base class for a data corpus. It essentially consists of a list of samples and a filter function.
-
samples
¶ A list of all the Samples in the dataset
Type: list
-
info
¶ This dictionary keeps track of all the fields in the metadata. The keys of the dictionary are the metadata field names. The values are again dictionaries, but with the keys being the possible values taken by the metadata and the associated value, the number of samples with this value in the corpus.
Type: dict
-
add_sample
(sample)¶ Add a sample to the Dataset and keep track of the metadata.
-
add_sample_matching
(sample, **kwargs)¶ The sample is added to the corpus only if all the keyword arguments match the metadata of the sample. The match is performed by
pyroomacoustics.datasets.Meta.match
.
-
filter
(**kwargs)¶ Filter the corpus and select samples that match the provided criteria. The arguments to the keywords can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.
value == attribute
value
is a list andattribute in value == True
value
is a callable (a function) andvalue(attribute) == True
-
head
(n=5)¶ Print n samples from the dataset
-
-
class
pyroomacoustics.datasets.base.
Meta
(**attr)¶ Bases:
object
A simple class that will take a dictionary as input and put the values in attributes named after the keys. We use it to store metadata for the samples.
The parameters can be any set of keyword arguments. They will all be transformed into attributes of the object.
-
as_dict
()¶ Returns all the attribute/value pairs of the object as a dictionary
-
match
(**kwargs)¶ The key/value pairs given by the keyword arguments are compared to the attribute/value pairs of the object. If the values all match, True is returned. Otherwise False is returned. If a keyword argument has no attribute counterpart, an error is raised. Attributes that do not have a keyword argument counterpart are ignored.
There are three ways to match an attribute with keyword=value: 1.
value == attribute
2.value
is a list andattribute in value == True
3.value
is a callable (a function) andvalue(attribute) == True
-
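The three matching rules can be sketched in plain Python (a hypothetical mini re-implementation of the matching logic, for illustration only; attribute names and values are made up):

```python
def match(attributes, **kwargs):
    # Hypothetical sketch of the three matching rules: each keyword
    # must match the attribute sharing the same key.
    for key, value in kwargs.items():
        attr = attributes[key]         # error if no attribute counterpart
        if callable(value):
            ok = value(attr)           # rule 3: value is a predicate
        elif isinstance(value, list):
            ok = attr in value         # rule 2: value is a list, membership
        else:
            ok = (value == attr)       # rule 1: direct equality
        if not ok:
            return False
    return True

meta = {'speaker': 'alice', 'age': 37}
r1 = match(meta, speaker='alice')            # rule 1
r2 = match(meta, speaker=['alice', 'bob'])   # rule 2
r3 = match(meta, age=lambda a: a > 40)       # rule 3, fails for 37
```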
-
class
pyroomacoustics.datasets.base.
Sample
(data, **kwargs)¶ Bases:
object
The base class for a dataset sample. The idea is that different corpora will have different attributes for the samples. They should at least have a data attribute.
-
data
¶ The actual data
Type: array_like
-
meta
¶ An object containing the sample metadata. They can be accessed using the dot operator
Type: pyroomacoustics.datasets.Meta
-
Dataset Utilities¶
-
pyroomacoustics.datasets.utils.
download_uncompress
(url, path='.', compression=None, context=None)¶ This function downloads and uncompresses on the fly a file of type tar, tar.gz, or tar.bz2.
Parameters: - url (str) – The URL of the file
- path (str, optional) – The path where to uncompress the file
- compression (str, optional) – The compression type (one of ‘bz2’, ‘gz’, ‘tar’), inferred from url if not provided
- context (SSL context, optional) – Default is to use none.
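The uncompress step corresponds to Python's standard tarfile module; the following sketch (using an in-memory tar.gz archive in place of a downloaded file, purely for illustration) shows the kind of extraction this function performs for the gz case:

```python
import io
import tarfile

# Build a small tar.gz archive in memory (stand-in for the bytes
# that download_uncompress would fetch from the URL).
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w:gz') as tar:
    data = b'hello'
    info = tarfile.TarInfo(name='sample.txt')
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
buf.seek(0)

# Uncompress on the fly: open the gzip-compressed stream and read a member.
with tarfile.open(fileobj=buf, mode='r:gz') as tar:
    content = tar.extractfile('sample.txt').read()
```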
Adaptive Filtering¶
Module contents¶
Adaptive Filter Algorithms¶
This sub-package provides implementations of popular adaptive filter algorithms.
- RLS
- Recursive Least Squares
- LMS
- Least Mean Squares and Normalized Least Mean Squares
All these classes derive from the base class
pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter
that offers
a generic way of running an adaptive filter.
The above classes are applicable for time domain processing. For frequency domain adaptive filtering, there is the SubbandLMS class. After a DFT or STFT block, the SubbandLMS class can be used to apply LMS or NLMS to each frequency band. A shorter adaptive filter can be used on each band than the filter required in the time domain. Roughly, a filter of M taps applied to each of B bands corresponds to a time domain filter with N = M x B taps.
How to use the adaptive filter module¶
First, an adaptive filter object is created and all the relevant options can be set (step size, regularization, etc). Then, the update function is repeatedly called to provide new samples to the algorithm.
# initialize the filter
rls = pyroomacoustics.adaptive.RLS(30)
# run the filter on a stream of samples
for i in range(100):
rls.update(x[i], d[i])
# the reconstructed filter is available
print('Reconstructed filter:', rls.w)
The SubbandLMS class has the same methods as the time domain approaches. However, the signal must be in the frequency domain. This can be done with the STFT block in the transform sub-package of pyroomacoustics.
# initialize STFT and SubbandLMS blocks
block_size = 128
stft_x = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
stft_d = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
nlms = pra.adaptive.SubbandLMS(num_taps=6,
num_bands=block_size//2+1, mu=0.5, nlms=True)
# preparing input and reference signals
...
# apply block-by-block
for n in range(num_blocks):
# obtain block
...
# to frequency domain
stft_x.analysis(x_block)
stft_d.analysis(d_block)
nlms.update(stft_x.X, stft_d.X)
# estimating input convolved with unknown response
y_hat = stft_d.synthesis(np.diag(np.dot(nlms.W.conj().T,stft_x.X)))
# AEC output
E = stft_d.X - np.diag(np.dot(nlms.W.conj().T,stft_x.X))
out = stft_d.synthesis(E)
Other Available Subpackages¶
pyroomacoustics.adaptive.data_structures
- this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.adaptive.util
- a few methods mainly to efficiently manipulate Toeplitz and Hankel matrices
Utilities¶
pyroomacoustics.adaptive.algorithms
- a dictionary containing all the adaptive filter object subclasses available, indexed by
keys
['RLS', 'BlockRLS', 'BlockLMS', 'NLMS', 'SubbandLMS']
Algorithms¶
Adaptive Filter (Base)¶
-
class
pyroomacoustics.adaptive.adaptive_filter.
AdaptiveFilter
(length)¶ Bases:
object
The dummy base class of an adaptive filter. This class doesn’t compute anything. It merely stores values in a buffer. It is used as a template for all other algorithms.
-
name
()¶
-
reset
()¶ Reset the state of the adaptive filter
-
update
(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
-
Least Mean Squares¶
Least Mean Squares Family¶
Implementations of adaptive filters from the LMS class. These algorithms have a low complexity and reliable behavior with a somewhat slower convergence.
-
class
pyroomacoustics.adaptive.lms.
BlockLMS
(length, mu=0.01, L=1, nlms=False)¶ Bases:
pyroomacoustics.adaptive.lms.NLMS
Implementation of the least mean squares (LMS) algorithm in its block form
Parameters: - length (int) – the length of the filter
- mu (float, optional) – the step size (default 0.01)
- L (int, optional) – block size (default is 1)
- nlms (bool, optional) – whether or not to normalize as in NLMS (default is False)
-
reset
()¶ Reset the state of the adaptive filter
-
update
(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
-
class
pyroomacoustics.adaptive.lms.
NLMS
(length, mu=0.5)¶ Bases:
pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter
Implementation of the normalized least mean squares algorithm (NLMS)
Parameters: - length (int) – the length of the filter
- mu (float, optional) – the step size (default 0.5)
-
update
(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
Recursive Least Squares¶
Recursive Least Squares Family¶
Implementations of adaptive filters from the RLS class. These algorithms typically have a higher computational complexity, but a faster convergence.
-
class
pyroomacoustics.adaptive.rls.
BlockRLS
(length, lmbd=0.999, delta=10, dtype=np.float32, L=None)¶ Bases:
pyroomacoustics.adaptive.rls.RLS
Block implementation of the recursive least-squares (RLS) algorithm. The difference with the vanilla implementation is that chunks of the input signals are processed in batch and some savings can be made there.
Parameters: - length (int) – the length of the filter
- lmbd (float, optional) – the exponential forgetting factor (default 0.999)
- delta (float, optional) – the regularization term (default 10)
- dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
- L (int, optional) – the block size (default to length)
-
reset
()¶ Reset the state of the adaptive filter
-
update
(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
-
class
pyroomacoustics.adaptive.rls.
RLS
(length, lmbd=0.999, delta=10, dtype=np.float32)¶ Bases:
pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter
Implementation of the exponentially weighted Recursive Least Squares (RLS) adaptive filter algorithm.
Parameters: - length (int) – the length of the filter
- lmbd (float, optional) – the exponential forgetting factor (default 0.999)
- delta (float, optional) – the regularization term (default 10)
- dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
-
reset
()¶ Reset the state of the adaptive filter
-
update
(x_n, d_n)¶ Updates the adaptive filter with a new sample
Parameters: - x_n (float) – the new input sample
- d_n (float) – the new noisy reference signal
Subband LMS¶
-
class
pyroomacoustics.adaptive.subband_lms.
SubbandLMS
(num_taps, num_bands, mu=0.5, nlms=True)¶ Bases:
object
Frequency domain implementation of LMS, with one adaptive filter per subband.
Parameters: - num_taps (int) – length of the filter
- num_bands (int) – number of frequency bands, i.e. number of filters
- mu (float, optional) – step size for each subband (default 0.5)
- nlms (bool, optional) – whether or not to normalize as in NLMS (default is True)
-
reset
()¶
-
update
(X_n, D_n)¶ Updates the adaptive filters for each subband with the new block of input data.
Parameters: - X_n (numpy array, float) – new input signal (to unknown system) in frequency domain
- D_n (numpy array, float) – new noisy reference signal in frequency domain
-
pyroomacoustics.adaptive.subband_lms.
hermitian
(X)¶ Compute and return Hermitian transpose
Tools and Helpers¶
Data Structures¶
-
class
pyroomacoustics.adaptive.data_structures.
Buffer
(length=20, dtype=np.float32)¶ Bases:
object
A simple buffer class with amortized cost
Parameters: - length (int) – buffer length
- dtype (numpy.type) – data type
-
flush
(n)¶ Removes the n oldest elements in the buffer
-
push
(val)¶ Add one element at the front of the buffer
-
size
()¶ Returns the number of elements in the buffer
-
top
(n)¶ Returns the n elements at the front of the buffer from newest to oldest
-
class
pyroomacoustics.adaptive.data_structures.
CoinFlipper
(p, length=10000)¶ Bases:
object
This class efficiently generates a large number of coin flips. Because each call to
numpy.random.rand
is a little bit costly, it is more efficient to generate many values at once. This class does this and stores them in advance, generating fresh numbers when needed.Parameters: - p (float, 0 < p < 1) – probability to output a 1
- length (int) – the number of flips to precompute
-
flip
(n)¶ Get n random binary values from the buffer
-
flip_all
()¶ Regenerates all the used up values
-
fresh_flips
(n)¶ Generates n binary random values now
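The precomputation idea behind CoinFlipper can be sketched in a few lines of plain numpy. This is a simplified illustration of the design rationale above, not the package's actual implementation, and the class name is hypothetical:

```python
import numpy as np

class SimpleCoinFlipper:
    """Pre-generate a pool of Bernoulli(p) draws and serve them on demand."""

    def __init__(self, p, length=10000):
        self.p = p
        self.length = length
        self.pool = np.random.rand(length) < p  # one vectorized call
        self.used = 0

    def flip(self, n):
        # refill the pool when not enough unused values remain
        if self.used + n > self.length:
            self.pool = np.random.rand(self.length) < self.p
            self.used = 0
        out = self.pool[self.used:self.used + n]
        self.used += n
        return out

flips = SimpleCoinFlipper(0.5).flip(100)
```

The single vectorized call to numpy.random.rand amortizes the per-call overhead over thousands of flips.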
-
class
pyroomacoustics.adaptive.data_structures.
Powers
(a, length=20, dtype=np.float32)¶ Bases:
object
This class stores all integer powers of a small number and makes them accessible with the bracket operator, ‘a la numpy’. The table grows automatically when new values are requested.
Parameters: - a (float) – the number
- length (int) – the number of integer powers
- dtype (numpy.type, optional) – the data type (typically np.float32 or np.float64)
Example
>>> an = Powers(0.5)
>>> print(an[4])
0.0625
Utilities¶
-
pyroomacoustics.adaptive.util.
autocorr
(x)¶ Fast autocorrelation computation using the FFT
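The FFT-based approach rests on the Wiener–Khinchin relation: the autocorrelation of x is the inverse FFT of |X|^2, provided the transform is zero-padded so that the circular correlation equals the linear one. A minimal numpy sketch of the idea (the function name is illustrative; this is not the package's exact code):

```python
import numpy as np

def fft_autocorr(x):
    """Autocorrelation of a real signal via the FFT (linear, not circular)."""
    n = len(x)
    nfft = 2 * n  # zero-pad to at least 2n - 1 to avoid circular aliasing
    X = np.fft.rfft(x, n=nfft)
    r = np.fft.irfft(X * np.conj(X), n=nfft)
    return r[:n]  # lags 0 .. n-1; r[0] is the signal energy
```

This costs O(n log n), versus O(n^2) for computing every lag directly.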
-
pyroomacoustics.adaptive.util.
hankel_multiplication
(c, r, A, mkl=True, **kwargs)¶ Compute numpy.dot(scipy.linalg.hankel(c,r=r), A) using the FFT.
Parameters: - c (ndarray) – the first column of the Hankel matrix
- r (ndarray) – the last row of the Hankel matrix
- A (ndarray) – the matrix to multiply on the right
- mkl (bool, optional) – if True, use the mkl_fft package if available
-
pyroomacoustics.adaptive.util.
hankel_stride_trick
(x, shape)¶ Make a Hankel matrix from a vector using stride tricks
Parameters: - x (ndarray) – a vector that contains the concatenation of the first column and first row of the Hankel matrix to build without repetition of the lower left corner value of the matrix
- shape (tuple) – the shape of the Hankel matrix to build, it must satisfy
x.shape[0] == shape[0] + shape[1] - 1
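The stride trick builds the Hankel matrix as a view into the data vector, without copying: every entry satisfies H[i, j] = x[i + j], so each row is the same memory window shifted by one element. A self-contained sketch of the technique (illustrative, not the library's exact implementation):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def hankel_view(x, shape):
    """Hankel matrix view of x; requires x.shape[0] == shape[0] + shape[1] - 1."""
    n_rows, n_cols = shape
    assert x.shape[0] == n_rows + n_cols - 1
    s = x.strides[0]
    # both the row and column steps advance by one element: H[i, j] = x[i + j]
    return as_strided(x, shape=(n_rows, n_cols), strides=(s, s))

x = np.arange(6.0)        # first column (0, 1, 2) and last row (2, 3, 4, 5)
H = hankel_view(x, (3, 4))
```

Because the result is a view, it uses O(1) extra memory; it should be treated as read-only, since writing to one entry would alias several positions.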
-
pyroomacoustics.adaptive.util.
mkl_toeplitz_multiplication
(c, r, A, A_padded=False, out=None, fft_len=None)¶ Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT from the mkl_fft package.
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
- A_padded (bool, optional) – the A matrix can be pre-padded with zeros by the user, if this is the case set to True
- out (ndarray, optional) – an ndarray to store the output of the multiplication
- fft_len (int, optional) – specify the length of the FFT to use
-
pyroomacoustics.adaptive.util.
naive_toeplitz_multiplication
(c, r, A)¶ Compute numpy.dot(scipy.linalg.toeplitz(c,r), A)
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
-
pyroomacoustics.adaptive.util.
toeplitz_multiplication
(c, r, A, **kwargs)¶ Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT.
Parameters: - c (ndarray) – the first column of the Toeplitz matrix
- r (ndarray) – the first row of the Toeplitz matrix
- A (ndarray) – the matrix to multiply on the right
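All the FFT-based routines above exploit the fact that a Toeplitz matrix can be embedded in a circulant one, and circulant matrices are diagonalized by the DFT, so the product costs O(n log n) instead of O(n^2). A self-contained numpy sketch of the embedding for a matrix-vector product (an illustration of the principle, not the package code):

```python
import numpy as np

def fft_toeplitz_matvec(c, r, x):
    """Compute toeplitz(c, r) @ x via circulant embedding and the FFT."""
    n = len(c)
    # first column of the (2n-1)-point circulant embedding: c, then reversed r[1:]
    col = np.concatenate([c, r[:0:-1]])
    nfft = len(col)
    # circulant matvec = inverse FFT of the product of the spectra
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x, nfft))
    return y[:n].real

# check against an explicitly built Toeplitz matrix
c = np.array([1.0, 2.0, 3.0])      # first column
r = np.array([1.0, -1.0, 0.5])     # first row
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(3)]
              for i in range(3)])
x = np.array([1.0, 0.0, 2.0])
result = fft_toeplitz_matvec(c, r, x)  # matches T @ x
```

Multiplying by a matrix A instead of a vector applies the same FFTs column-wise, which is what makes the batch (block) variants efficient.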
-
pyroomacoustics.adaptive.util.
toeplitz_opt_circ_approx
(r, matrix=False)¶ Optimal circulant approximation of a symmetric Toeplitz matrix by Tony F. Chan
Parameters: - r (ndarray) – the first row of the symmetric Toeplitz matrix
- matrix (bool, optional) – if True, the full circulant approximation matrix is returned, otherwise only its first column
-
pyroomacoustics.adaptive.util.
toeplitz_strang_circ_approx
(r, matrix=False)¶ Circulant approximation to a symmetric Toeplitz matrix by Gil Strang
Parameters: - r (ndarray) – the first row of the symmetric Toeplitz matrix
- matrix (bool, optional) – if True, the full circulant approximation matrix is returned, otherwise only its first column
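Strang's approximation copies the central diagonals of the Toeplitz matrix and wraps them around: for a symmetric Toeplitz matrix with first row r of length n, the circulant's first column keeps r[k] for k up to n // 2 and mirrors r[n - k] beyond. A hedged numpy sketch of this construction (the package's exact conventions may differ):

```python
import numpy as np

def strang_circ_first_col(r):
    """First column of Strang's circulant approximation of toeplitz(r, r)."""
    n = len(r)
    c = np.empty(n)
    for k in range(n):
        # keep the central diagonals, wrap the outer ones around
        c[k] = r[k] if k <= n // 2 else r[n - k]
    return c

r = np.array([4.0, 2.0, 1.0, 0.5, 0.2])
c = strang_circ_first_col(r)
```

The resulting first column satisfies the circulant symmetry c[k] == c[(n - k) % n], which is what makes the approximation diagonalizable by the FFT.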
Blind Source Separation¶
Module contents¶
Blind Source Separation¶
Implementations of a few blind source separation (BSS) algorithms.
- AuxIVA
- Independent Vector Analysis [1]
- Trinicon
- Time-domain BSS [2]
- ILRMA
- Independent Low-Rank Matrix Analysis [3]
- SparseAuxIVA
- Sparse Independent Vector Analysis [4]
- FastMNMF
- Fast Multichannel Nonnegative Matrix Factorization [5]
A few commonly used functions, such as projection back, can be found in
pyroomacoustics.bss.common
.
References
[1] | N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011. |
[2] | R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006. |
[3] | D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016 |
[4] | J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016. |
[5] | K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019. |
Algorithms¶
Independent Vector Analysis (AuxIVA)¶
AuxIVA¶
Blind Source Separation using independent vector analysis based on auxiliary function. This function will separate the input signal into statistically independent sources without using any prior information.
The algorithm in the determined case, i.e., when the number of sources is equal to the number of microphones, is AuxIVA [1]. When there are more microphones (the overdetermined case), a computationally cheaper variant (OverIVA) is used [2].
Example
from scipy.io import wavfile
import pyroomacoustics as pra
# read multichannel wav file
# audio.shape == (nsamples, nchannels)
fs, audio = wavfile.read("my_multichannel_audio.wav")
# STFT analysis parameters
fft_size = 4096 # `fft_size / fs` should be ~RT60
hop = fft_size // 2 # half-overlap
win_a = pra.hann(fft_size) # analysis window
# optimal synthesis window
win_s = pra.transform.compute_synthesis_window(win_a, hop)
# STFT
# X.shape == (nframes, nfrequencies, nchannels)
X = pra.transform.analysis(audio, fft_size, hop, win=win_a)
# Separation
Y = pra.bss.auxiva(X, n_iter=20)
# iSTFT (introduces an offset of `hop` samples)
# y contains the time domain separated signals
# y.shape == (new_nsamples, nchannels)
y = pra.transform.synthesis(Y, fft_size, hop, win=win_s)
References
[1] | N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011. |
[2] | R. Scheibler and N. Ono, Independent Vector Analysis with more Microphones than Sources, arXiv, 2019. https://arxiv.org/abs/1905.07880 |
-
pyroomacoustics.bss.auxiva.
auxiva
(X, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', init_eig=False, return_filters=False, callback=None)¶ This is an implementation of AuxIVA/OverIVA that separates the input signal into statistically independent sources. The separation is done in the time-frequency domain and the FFT length should be approximately equal to the reverberation time.
Two different statistical models (Laplace or time-varying Gauss) can be selected with the keyword argument model. The Gauss model performs better in good conditions (few sources, low noise), but Laplace (the default) is more robust in general.
Parameters: - X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
- n_src (int, optional) – The number of sources or independent components. When n_src == nchannels, the algorithm is identical to AuxIVA. When n_src == 1, it performs independent vector extraction.
- n_iter (int, optional) – The number of iterations (default 20)
- proj_back (bool, optional) – Scaling on first mic by back projection (default True)
- W0 (ndarray (nfrequencies, nsrc, nchannels), optional) – Initial value for demixing matrix
- model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)
- init_eig (bool, optional (default
False
)) – IfTrue
, and ifW0 is None
, then the weights are initialized using the principal eigenvectors of the covariance matrix of the input data. WhenFalse
, the demixing matrices are initialized with identity matrix. - return_filters (bool) – If true, the function will return the demixing matrix too
- callback (func) – A callback function called every 10 iterations, allows to monitor convergence
Returns: - Returns an (nframes, nfrequencies, nsources) array. Also returns
- the demixing matrix (nfrequencies, nchannels, nsources)
- if
return_filters
keyword is True.
Trinicon¶
-
pyroomacoustics.bss.trinicon.
trinicon
(signals, w0=None, filter_length=2048, block_length=None, n_blocks=8, alpha_on=None, j_max=10, delta_max=0.0001, sigma2_0=1e-07, mu=0.001, lambd_a=0.2, return_filters=False)¶ Implementation of the TRINICON Blind Source Separation algorithm as described in
R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006. [pdf]
Specifically, adaptation of the pseudo-code from Table 1.
The implementation is hard-coded for 2 output channels.
Parameters: - signals (ndarray (nchannels, nsamples)) – The microphone input signals (time domain)
- w0 (ndarray (nchannels, nsources, nsamples), optional) – Optional initial value for the demixing filters
- filter_length (int, optional) – The length of the demixing filters, if w0 is provided, this option is ignored
- block_length (int, optional) – Block length (default 2x filter_length)
- n_blocks (int, optional) – Number of blocks processed at once (default 8)
- alpha_on (int, optional) – Online overlap factor (default
n_blocks
) - j_max (int, optional) – Number of offline iterations (default 10)
- delta_max (float, optional) – Regularization parameter, this sets the maximum value of the regularization term (default 1e-4)
- sigma2_0 (float, optional) – Regularization parameter, this sets the reference (machine?) noise level in the regularization (default 1e-7)
- mu (float, optional) – Offline update step size (default 0.001)
- lambd_a (float, optional) – Online forgetting factor (default 0.2)
- return_filters (bool) – If true, the function will return the demixing matrix too (default False)
Returns: Returns an (nsources, nsamples) array. Also returns the demixing matrix (nchannels, nsources, nsamples) if
return_filters
keyword is True.Return type: ndarray
Independent Low-Rank Matrix Analysis (ILRMA)¶
ILRMA¶
Blind Source Separation using Independent Low-Rank Matrix Analysis (ILRMA).
-
pyroomacoustics.bss.ilrma.
ilrma
(X, n_src=None, n_iter=20, proj_back=True, W0=None, n_components=2, return_filters=False, callback=None)¶ Implementation of ILRMA algorithm without partitioning function for BSS presented in
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari Determined Blind Source Separation with Independent Low-Rank Matrix Analysis, in Audio Source Separation, S. Makino, Ed. Springer, 2018, pp. 125-156.
Parameters: - X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal
- n_src (int, optional) – The number of sources or independent components
- n_iter (int, optional) – The number of iterations (default 20)
- proj_back (bool, optional) – Scaling on first mic by back projection (default True)
- W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
- n_components (int) – Number of components in the non-negative spectrum
- return_filters (bool) – If true, the function will return the demixing matrix too
- callback (func) – A callback function called every 10 iterations, allows to monitor convergence
Returns: - Returns an (nframes, nfrequencies, nsources) array. Also returns
- the demixing matrix W (nfrequencies, nchannels, nsources)
- if
return_filters
keyword is True.
Sparse Independent Vector Analysis (SparseAuxIVA)¶
-
pyroomacoustics.bss.sparseauxiva.
sparseauxiva
(X, S=None, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', return_filters=False, callback=None)¶ Implementation of sparse AuxIVA algorithm for BSS presented in
J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016.
Parameters: - X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
- n_src (int, optional) – The number of sources or independent components
- S (ndarray (k_freq)) – Indexes of active frequency bins for sparse AuxIVA
- n_iter (int, optional) – The number of iterations (default 20)
- proj_back (bool, optional) – Scaling on first mic by back projection (default True)
- W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
- model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)
- return_filters (bool) – If true, the function will return the demixing matrix too
- callback (func) – A callback function called every 10 iterations, allows to monitor convergence
Returns: - Returns an (nframes, nfrequencies, nsources) array. Also returns
- the demixing matrix (nfrequencies, nchannels, nsources)
- if
return_filters
keyword is True.
Common Tools¶
-
pyroomacoustics.bss.common.
projection_back
(Y, ref, clip_up=None, clip_down=None)¶ This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to solve the scale ambiguity in BSS.
The optimal filter z minimizes the squared error.
\[\min E[|z^* y - x|^2]\]It should thus satisfy the orthogonality condition and can be derived as follows
\[ \begin{align}\begin{aligned}0 & = E[y^*\, (z^* y - x)]\\0 & = z^*\, E[|y|^2] - E[y^* x]\\z^* & = \frac{E[y^* x]}{E[|y|^2]}\\z & = \frac{E[y x^*]}{E[|y|^2]}\end{aligned}\end{align} \]In practice, the expectations are replaced by the sample mean.
Parameters: - Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal
- ref (array_like (n_frames, n_bins)) – The reference signal
- clip_up (float, optional) – Limits the maximum value of the gain (default no limit)
- clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
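With the expectations replaced by sample means over the frames, the optimal scale for each frequency bin and channel reduces to a one-line ratio. A simplified numpy sketch of the formula derived above (the function name is illustrative and the clipping options are omitted; this is not the library's exact code):

```python
import numpy as np

def project_back_sketch(Y, ref):
    """Rescale separated STFT signals to best match a reference channel.

    Y   : (n_frames, n_bins, n_channels) separated STFT data
    ref : (n_frames, n_bins) STFT of the reference microphone
    """
    # sample means of y* x and |y|^2 over the frames, per bin and channel
    num = np.sum(np.conj(Y) * ref[:, :, None], axis=0)
    den = np.sum(np.abs(Y) ** 2, axis=0)
    # optimal scale z* = E[y* x] / E[|y|^2], applied to each bin/channel
    return Y * (num / np.maximum(den, 1e-15))[None, :, :]
```

If a separated channel equals the reference up to a complex scale a, the projection recovers the reference exactly, since the optimal scale is then 1/a; this is how the scale ambiguity of BSS is resolved.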
-
pyroomacoustics.bss.common.
sparir
(G, S, weights=np.array([]), gini=0, maxiter=50, tol=10, alpha=10, alphamax=100000.0, alphamin=1e-07)¶ Fast proximal algorithm implementation for sparse approximation of relative impulse responses from incomplete measurements of the corresponding relative transfer function based on
Z. Koldovsky, J. Malek, and S. Gannot, “Spatial Source Subtraction based on Incomplete Measurements of Relative Transfer Function”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, TASLP 2015.- The original Matlab implementation can be found at
- http://itakura.ite.tul.cz/zbynek/dwnld/SpaRIR.m
and it is referenced in
Z. Koldovsky, F. Nesta, P. Tichavsky, and N. Ono, Frequency-domain blind speech separation using incomplete de-mixing transform, EUSIPCO 2016.
- G: ndarray (nfrequencies, 1)
- Frequency representation of the (incomplete) transfer function
- S: ndarray (kfrequencies)
- Indexes of active frequency bins for sparse AuxIVA
- weights: ndarray (kfrequencies) or int, optional
- The higher the value of weights(i), the higher the probability that g(i) is zero; if scalar, all weights are the same; if empty, default value is used
- gini: ndarray (nfrequencies)
- Initialization for the computation of g
- maxiter: int
- Maximum number of iterations before achieving convergence (default 50)
- tol: float
- Minimum convergence criteria based on the gradient difference between adjacent updates (default 10)
- alpha: float
- Inverse of the decreasing speed of the gradient at each iteration. This parameter is updated at every iteration (default 10)
- alphamax: float
- Upper bound for alpha (default 1e5)
- alphamin: float
- Lower bound for alpha (default 1e-7)
Returns the sparse approximation of the impulse response in the time-domain (real-valued) as an (nfrequencies) array.
Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)¶
FastMNMF¶
Blind Source Separation based on Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)
-
pyroomacoustics.bss.fastmnmf.
fastmnmf
(X, n_src=None, n_iter=30, W0=None, n_components=4, callback=None, mic_index=0, interval_update_Q=1, interval_normalize=10, initialize_ilrma=False)¶ Implementation of FastMNMF algorithm presented in
K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019. [arXiv]
The code for FastMNMF with GPU support, and for FastMNMF-DP, which integrates a DNN-based source model into FastMNMF, is available at https://github.com/sekiguchi92/SpeechEnhancement
Parameters: - X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal
- n_src (int, optional) – The number of sound sources (if n_src is None, it is set to the number of microphones)
- n_iter (int, optional) – The number of iterations
- W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for the diagonalizer Q. A demixing matrix can be used as the initial value.
- n_components (int, optional) – Number of components in the non-negative spectrum
- callback (func, optional) – A callback function called every 10 iterations, allows to monitor convergence
- mic_index (int, optional) – The index of microphone of which you want to get the source image
- interval_update_Q (int, optional) – The interval of updating Q
- interval_normalize (int, optional) – The interval of parameter normalization
- initialize_ilrma (boolean, optional) – Initialize diagonalizer Q by using ILRMA
Returns: separated spectrogram – An (nframes, nfrequencies, nsources) array.
Return type: numpy.ndarray
Direction of Arrival¶
Module contents¶
Direction of Arrival Finding¶
This sub-package provides implementations of popular direction of arrival finding algorithms.
- MUSIC
- Multiple Signal Classification [1]
- SRP-PHAT
- Steered Response Power – Phase Transform [2]
- CSSM
- Coherent Signal Subspace Method [3]
- WAVES
- Weighted Average of Signal Subspaces [4]
- TOPS
- Test of Orthogonality of Projected Subspaces [5]
- FRIDA
- Finite Rate of Innovation Direction of Arrival [6]
All these classes derive from the abstract base class
pyroomacoustics.doa.doa.DOA
that offers generic methods for finding
and visualizing the locations of acoustic sources.
The constructor can be called once to build the DOA finding object. Then, the method pyroomacoustics.doa.doa.DOA.locate_sources performs DOA finding based on the time-frequency representation passed to it as an argument. Extra arguments can be supplied to indicate which frequency bands should be used for localization.
How to use the DOA module¶
Here, R is a 2xQ ndarray that contains the locations of the Q microphones in its columns, fs is the sampling frequency of the input signal, and nfft is the length of the FFT used.
The STFT snapshots are passed to the localization methods in the ndarray X of shape Q x (nfft // 2 + 1) x n_snapshots, where n_snapshots is the number of STFT frames to use for the localization. The option freq_bins can be provided to specify which frequency bins to use for the localization.
>>> doa = pyroomacoustics.doa.MUSIC(R, fs, nfft)
>>> doa.locate_sources(X, freq_bins=np.arange(20, 40))
Other Available Subpackages¶
pyroomacoustics.doa.grid
- this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.doa.plotters
- a few methods to plot functions and points on circles or spheres
pyroomacoustics.doa.detect_peaks
- 1D peak detection routine from Marcos Duarte
pyroomacoustics.doa.tools_frid_doa_plane
- routines implementing FRIDA algorithm
Utilities¶
pyroomacoustics.doa.algorithms
- a dictionary containing all the DOA object subclasses available, indexed by
keys
['MUSIC', 'SRP', 'CSSM', 'WAVES', 'TOPS', 'FRIDA']
References
[1] | R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag., Vol. 34, Num. 3, pp 276–280, 1986 |
[2] | J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays, PHD Thesis, Brown University, 2000 |
[3] | H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985 |
[4] | E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001 |
[5] | Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006 |
[6] | H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017 |
Algorithms¶
CSSM¶
-
class
pyroomacoustics.doa.cssm.
CSSM
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply the Coherent Signal-Subspace method [CSSM] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the CSSM algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- num_iter (int) – Number of iterations for CSSM. Default: 5
References
[CSSM] H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
FRIDA¶
-
class
pyroomacoustics.doa.frida.
FRIDA
(L, fs, nfft, max_four=None, c=343.0, num_src=1, G_iter=None, max_ini=5, n_rot=1, max_iter=50, noise_level=1e-10, low_rank_cleaning=False, stopping='max_iter', stft_noise_floor=0.0, stft_noise_margin=1.5, signal_type='visibility', use_lu=True, verbose=False, symb=True, use_cache=False, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Implements the FRI-based direction of arrival finding algorithm [FRIDA].
Note
Run locate_sources() to apply the FRIDA algorithm.
Parameters: - L (ndarray) – Contains the locations of the microphones in the columns
- fs (int or float) – Sampling frequency
- nfft (int) – FFT size
- max_four (int) – Maximum order of the Fourier or spherical harmonics expansion
- c (float, optional) – Speed of sound
- num_src (int, optional) – The number of sources to recover (default 1)
- G_iter (int) – Number of mapping matrix refinement iterations in recovery algorithm (default 1)
- max_ini (int, optional) – Number of random initializations to use in recovery algorithm (default 5)
- n_rot (int, optional) – Number of random rotations to apply before recovery algorithm (default 10)
- noise_level (float, optional) – Noise level in the visibility measurements, if available (default 1e-10)
- stopping (str, optional) – Stopping criteria for the recovery algorithm. Can be max iterations or noise level (default max_iter)
- stft_noise_floor (float) – The noise floor in the STFT measurements, if available (default 0)
- stft_noise_margin (float) – When this, along with stft_noise_floor is set, we only pick frames with at least stft_noise_floor * stft_noise_margin power
- signal_type (str) –
Which type of measurements to use:
- ’visibility’: Cross correlation measurements
- ’raw’: Microphone signals
- use_lu (bool, optional) – Whether to use LU decomposition for efficiency
- verbose (bool, optional) – Whether to output intermediate result for debugging purposes
- symb (bool, optional) – Whether to enforce the symmetry on the reconstructed uniform samples of the sinusoids b
References
[FRIDA] H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017
MUSIC¶
-
class
pyroomacoustics.doa.music.
MUSIC
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Class to apply MUltiple SIgnal Classification (MUSIC) direction-of-arrival (DoA) for a particular microphone array.
Note
Run locate_sources() to apply the MUSIC algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
-
plot_individual_spectrum
()¶ Plot the steered response for each frequency.
SRP-PHAT¶
-
class
pyroomacoustics.doa.srp.
SRP
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.doa.DOA
Class to apply Steered Response Power (SRP) direction-of-arrival (DoA) estimation for a particular microphone array.
Note
Run locate_sources() to apply the SRP-PHAT algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
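The core of SRP-PHAT (phase-transform weighting followed by steered power summation over frequency) can be sketched for a single frame as follows. All names are illustrative, not the package API:

```python
import numpy as np

def srp_phat_map(X, mic_pos, freqs, c=343.0, azimuths=None):
    """Frequency-domain SRP-PHAT map over candidate azimuths (illustrative sketch).

    X: (n_mics, n_freqs) STFT of one frame; freqs: bin frequencies in Hz.
    mic_pos: (2, n_mics) microphone coordinates in meters.
    """
    if azimuths is None:
        azimuths = np.linspace(-np.pi, np.pi, 360)
    # PHAT weighting: discard magnitude, keep phase only
    Xw = X / np.maximum(np.abs(X), 1e-12)
    d = np.stack([np.cos(azimuths), np.sin(azimuths)])  # unit directions
    tau = mic_pos.T @ d / c                             # (n_mics, n_angles) delays
    P = np.zeros(len(azimuths))
    for i, f in enumerate(freqs):
        steer = np.exp(2j * np.pi * f * tau)            # compensate propagation delays
        # coherent sum across mics, power summed over frequency
        P += np.abs(np.sum(Xw[:, i, None] * steer, axis=0)) ** 2
    return azimuths, P
```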
TOPS¶
-
class
pyroomacoustics.doa.tops.
TOPS
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply Test of Orthogonality of Projected Subspaces [TOPS] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the TOPS algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
References
[TOPS] Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
WAVES¶
-
class
pyroomacoustics.doa.waves.
WAVES
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶ Bases:
pyroomacoustics.doa.music.MUSIC
Class to apply Weighted Average of Signal Subspaces [WAVES] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the WAVES algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- num_iter (int) – Number of iterations for CSSM. Default: 5
References
[WAVES] E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001
Tools and Helpers¶
DOA (Base)¶
-
class
pyroomacoustics.doa.doa.
DOA
(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, n_grid=None, dim=2, *args, **kwargs)¶ Bases:
object
Abstract parent class for Direction of Arrival (DoA) algorithms. After creating an object (SRP, MUSIC, CSSM, WAVES, or TOPS), run locate_sources() to apply the corresponding algorithm.
Parameters: - L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
- fs (float) – Sampling frequency.
- nfft (int) – FFT length.
- c (float) – Speed of sound. Default: 343 m/s
- num_src (int) – Number of sources to detect. Default: 1
- mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
- r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
- azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
- colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
- n_grid (int) – If azimuth and colatitude are not specified, a grid with this many points is created. Default is 360.
- dim (int) – The dimension of the problem. Set dim=2 to find sources on the circle (x-y plane). Set dim=3 to search on the whole sphere.
-
locate_sources
(X, num_src=None, freq_range=[500.0, 4000.0], freq_bins=None, freq_hz=None)¶ Locate source(s) using corresponding algorithm.
Parameters: - X (numpy array) – Set of signals in the frequency (RFFT) domain for current frame. Size should be M x F x S, where M should correspond to the number of microphones, F to nfft/2+1, and S to the number of snapshots (user-defined). It is recommended to have S >> M.
- num_src (int) – Number of sources to detect. Default is value given to object constructor.
- freq_range (list of floats, length 2) – Frequency range on which to run DoA: [fmin, fmax].
- freq_bins (list of int) – List of individual frequency bins on which to run DoA. If defined by the user, freq_range and freq_hz are ignored.
- freq_hz (list of floats) – List of individual frequencies on which to run DoA. If defined by the user, freq_range is ignored.
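The M x F x S input expected here can be assembled from time-domain microphone signals with a short-time Fourier transform. A minimal sketch using unwindowed RFFT frames (a real application would apply an analysis window, e.g. via the package's STFT tools):

```python
import numpy as np

def stft_snapshots(signals, nfft, hop=None):
    """Stack RFFT frames into the M x F x S array expected by locate_sources.

    signals: (n_mics, n_samples) time-domain microphone signals.
    Returns an array of shape (n_mics, nfft // 2 + 1, n_frames).
    """
    if hop is None:
        hop = nfft // 2
    n_mics, n_samples = signals.shape
    n_frames = (n_samples - nfft) // hop + 1
    X = np.empty((n_mics, nfft // 2 + 1, n_frames), dtype=complex)
    for s in range(n_frames):
        frame = signals[:, s * hop : s * hop + nfft]
        X[:, :, s] = np.fft.rfft(frame, axis=1)  # F = nfft/2 + 1 bins per frame
    return X
```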
-
polar_plt_dirac
(azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶ Generate polar plot of DoA results.
Parameters: - azimuth_ref (numpy array) – True direction of sources (in radians).
- alpha_ref (numpy array) – Estimated amplitude of sources.
- save_fig (bool) – Whether or not to save figure as pdf.
- file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
- plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
-
class
pyroomacoustics.doa.doa.
ModeVector
(L, fs, nfft, c, grid, mode='far', precompute=False)¶ Bases:
object
This is a class for look-up tables of mode vectors. The look-up table is an outer product of three vectors running along candidate locations, time, and frequency. When the grid becomes large, the look-up table might be too large to store in memory. In that case, this class computes the outer product elements only when needed, keeping just the three vectors in memory. When the table is small, the precompute option can be set to True to compute the whole table in advance.
Tools for FRIDA (azimuth only)¶
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri
(coef_ri, K, D, L)¶ Split T matrix in real/imaginary representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri_half
(coef_half, K, D, L, D_coef)¶ Split T matrix in real/imaginary conjugate symmetric representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Rmtx_ri_half_out_half
(coef_half, K, D, L, D_coef, mtx_shrink)¶ If both b and the annihilating filter coefficients are Hermitian symmetric, then the output is also Hermitian symmetric, so the effective output is half the size
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri
(b_ri, K, D, L)¶ build convolution matrix associated with b_ri
Parameters: - b_ri – a real-valued vector
- K – number of Diracs
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri_half
(b_ri, K, D, L, D_coef)¶ Split T matrix in conjugate symmetric representation
-
pyroomacoustics.doa.tools_fri_doa_plane.
Tmtx_ri_half_out_half
(b_ri, K, D, L, D_coef, mtx_shrink)¶ If both b and the annihilating filter coefficients are Hermitian symmetric, then the output is also Hermitian symmetric, so the effective output is half the size
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_amp
(phi_k, p_mic_x, p_mic_y)¶ the matrix that maps Diracs’ amplitudes to the visibility
Parameters: - phi_k – Diracs’ location (azimuth)
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_amp_ri
(p_mic_x, p_mic_y, phi_k)¶ builds real/imaginary amplitude matrix
-
pyroomacoustics.doa.tools_fri_doa_plane.
build_mtx_raw_amp
(p_mic_x, p_mic_y, phi_k)¶ the matrix that maps Diracs’ amplitudes to the visibility
Parameters: - phi_k – Diracs’ location (azimuth)
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
coef_expan_mtx
(K)¶ expansion matrix for an annihilating filter of size K + 1
Parameters: K – number of Diracs. The filter size is K + 1
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_b
(G_lst, GtG_lst, beta_lst, Rc0, num_bands, a_ri, use_lu=False, GtG_inv_lst=None)¶ compute the uniform sinusoidal samples b from the updated annihilating filter coefficients.
Parameters: - GtG_lst – list of G^H G for different subbands
- beta_lst – list of beta-s for different subbands
- Rc0 – right-dual matrix, here it is the convolution matrix associated with c
- num_bands – number of bands
- a_ri – a 2D numpy array. each column corresponds to the measurements within a subband
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_mtx_obj
(GtG_lst, Tbeta_lst, Rc0, num_bands, K)¶ compute the matrix (M) in the objective function: min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block, hence not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
compute_obj_val
(GtG_inv_lst, Tbeta_lst, Rc0, c_ri_half, num_bands, K)¶ compute the fitting error. CAUTION: Here we assume use_lu = True
-
pyroomacoustics.doa.tools_fri_doa_plane.
cov_mtx_est
(y_mic)¶ estimate covariance matrix
Parameters: y_mic – received signal (complex based band representation) at microphones
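The sample covariance described here can be sketched in one line of NumPy. This sketch assumes rows index microphones and columns index snapshots; the layout used by the library itself may differ:

```python
import numpy as np

def sample_cov(y_mic):
    """Sample covariance R = E[y y^H] of complex baseband mic signals.

    y_mic: (n_mics, n_snapshots), rows = microphones (assumed layout).
    """
    # Average the outer products y y^H over the snapshot dimension
    return y_mic @ y_mic.conj().T / y_mic.shape[1]
```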
-
pyroomacoustics.doa.tools_fri_doa_plane.
cpx_mtx2real
(mtx)¶ extend a complex-valued matrix to an equivalent matrix of real values only
Parameters: mtx – input complex valued matrix
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri
(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half
(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G (param) – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband
(G_lst, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband_lu
(G_lst, GtG_lst, GtG_inv_lst, a_ri, K, M, max_ini=100, max_iter=50)¶ Here we use LU decomposition to precompute a few entries. Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
Parameters: - G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_multiband_parallel
(G, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_half_parallel
(G, a_ri, K, M, max_ini=100)¶ Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’
Parameters: - G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
- a_ri – the visibility measurements
- K – number of Diracs
- M – the Fourier series expansion is between -M and M
- noise_level – level of noise (ell_2 norm) in the measurements
- max_ini – maximum number of initialisations
- stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_inner
(c_ri_half, a_ri, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, max_iter)¶ inner loop of the dirac_recon_ri_half_parallel function
-
pyroomacoustics.doa.tools_fri_doa_plane.
dirac_recon_ri_multiband_inner
(c_ri_half, a_ri, num_bands, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, GtG, max_iter)¶ Inner loop of the dirac_recon_ri_multiband function
-
pyroomacoustics.doa.tools_fri_doa_plane.
extract_off_diag
(mtx)¶ extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner.
Parameters: mtx – input matrix to extract the off diagonal entries
-
pyroomacoustics.doa.tools_fri_doa_plane.
hermitian_expan
(half_vec_len)¶ expand a real-valued vector to a Hermitian symmetric vector. The input vector is a concatenation of the real parts with NON-POSITIVE indices and the imaginary parts with STRICTLY-NEGATIVE indices.
Parameters: half_vec_len – length of the first half vector
-
pyroomacoustics.doa.tools_fri_doa_plane.
lu_compute_mtx_obj
(Tbeta_lst, num_bands, K, lu_R_GtGinv_Rt_lst)¶ compute the matrix (M) in the objective function: min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block, hence not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
lu_compute_mtx_obj_initial
(GtG_inv_lst, Tbeta_lst, Rc0, num_bands, K)¶ compute the matrix (M) in the objective function: min c^H M c s.t. c0^H c = 1
Parameters: - GtG_lst – list of G^H * G
- Tbeta_lst – list of Toeplitz matrices for beta-s
- Rc0 – right dual matrix for the annihilating filter (same for each block, hence not a list)
-
pyroomacoustics.doa.tools_fri_doa_plane.
make_G
(p_mic_x, p_mic_y, omega_bands, sound_speed, M, signal_type='visibility')¶ build the matrices that map the measurements to the uniformly sampled sinusoids, for multiple subbands.
Parameters: - p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
- sound_speed – speed of sound
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
Returns: The list of mapping matrices from measurements to sinusoids
-
pyroomacoustics.doa.tools_fri_doa_plane.
make_GtG_and_inv
(G_lst)¶
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_freq2raw
(M, p_mic_x, p_mic_y)¶ build the matrix that maps the Fourier series to the raw microphone signals
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_freq2visi
(M, p_mic_x, p_mic_y)¶ build the matrix that maps the Fourier series to the visibility
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_fri2signal_ri
(M, p_mic_x, p_mic_y, D1, D2, signal='visibility')¶ build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only. (matrix size double)
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x – a vector that contains microphones x coordinates
- p_mic_y – a vector that contains microphones y coordinates
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
- signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_fri2signal_ri_multiband
(M, p_mic_x_all, p_mic_y_all, D1, D2, aslist=False, signal='visibility')¶ build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only. (matrix size double)
Parameters: - M – the Fourier series expansion is limited from -M to M
- p_mic_x_all – a matrix that contains microphones x coordinates
- p_mic_y_all – a matrix that contains microphones y coordinates
- D1 – expansion matrix for the real-part
- D2 – expansion matrix for the imaginary-part
- aslist – whether the linear mapping for each subband is returned as a list or a block diagonal matrix
- signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G
(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_recon – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M to M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- mtx_freq2visi – the linear mapping from Fourier series to visibilities
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G_multiband
(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri, num_bands)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_recon – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M to M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- mtx_fri2visi – the linear mapping from Fourier series to visibilities
-
pyroomacoustics.doa.tools_fri_doa_plane.
mtx_updated_G_multiband_new
(phi_opt, M, p_x, p_y, G0_lst, num_bands)¶ Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
Parameters: - phi_opt – the reconstructed Dirac locations (azimuths)
- M – the Fourier series expansion is between -M to M
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- G0_lst – the original linear mapping from Fourier series to visibilities
- num_bands – number of subbands
-
pyroomacoustics.doa.tools_fri_doa_plane.
multiband_cov_mtx_est
(y_mic)¶ estimate covariance matrix based on the received signals at microphones
Parameters: y_mic – received signal (complex base-band representation) at microphones
-
pyroomacoustics.doa.tools_fri_doa_plane.
multiband_extract_off_diag
(mtx)¶ extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner
Parameters: mtx (input matrix to extract the off-diagonal entries) –
-
pyroomacoustics.doa.tools_fri_doa_plane.
output_shrink
(K, L)¶ shrink the convolution output to half the size. Used when both the annihilating filter and the uniform samples of sinusoids are Hermitian symmetric.
Parameters: - K – the annihilating filter size: K + 1
- L – length of the (complex-valued) b vector
-
pyroomacoustics.doa.tools_fri_doa_plane.
polar2cart
(rho, phi)¶ convert from polar to cartesian coordinates
Parameters: - rho – radius
- phi – azimuth
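A minimal sketch of this conversion (the return format is an assumption, not necessarily the library's):

```python
import numpy as np

def polar2cart(rho, phi):
    """Convert polar coordinates (radius rho, azimuth phi) to cartesian (x, y)."""
    return rho * np.cos(phi), rho * np.sin(phi)
```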
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon
(a, p_mic_x, p_mic_y, omega_band, sound_speed, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, verbose=False, signal_type='visibility', **kwargs)¶ reconstruct point sources on the circle from the visibility measurements
Parameters: - a – the measured visibilities
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_band – mid-band (ANGULAR) frequency [radian/sec]
- sound_speed – speed of sound
- K – number of point sources
- M – the Fourier series expansion is between -M to M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- stop_cri – either ‘mse’ or ‘max_iter’
- update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
- verbose – whether output intermediate results for debugging or not
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs (possible optional input: G_iter: number of iterations for the G updates) –
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon_multiband
(a, p_mic_x, p_mic_y, omega_bands, sound_speed, K, M, noise_level, max_ini=50, update_G=False, verbose=False, signal_type='visibility', max_iter=50, G_lst=None, GtG_lst=None, GtG_inv_lst=None, **kwargs)¶ reconstruct point sources on the circle from the visibility measurements from multi-bands.
Parameters: - a – the measured visibilities in a matrix form, where the second dimension corresponds to different subbands
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
- sound_speed – speed of sound
- K – number of point sources
- M – the Fourier series expansion is between -M to M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
- verbose – whether output intermediate results for debugging or not
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs – possible optional input: G_iter: number of iterations for the G updates
-
pyroomacoustics.doa.tools_fri_doa_plane.
pt_src_recon_rotate
(a, p_mic_x, p_mic_y, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, num_rotation=1, verbose=False, signal_type='visibility', **kwargs)¶ reconstruct point sources on the circle from the visibility measurements. Here we apply random rotations to the coordinates.
Parameters: - a – the measured visibilities
- p_mic_x – a vector that contains microphones’ x-coordinates
- p_mic_y – a vector that contains microphones’ y-coordinates
- K – number of point sources
- M – the Fourier series expansion is between -M to M
- noise_level – noise level in the measured visibilities
- max_ini – maximum number of random initialisation used
- stop_cri – either ‘mse’ or ‘max_iter’
- update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
- num_rotation – number of random rotations
- verbose – whether output intermediate results for debugging or not
- signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- kwargs – possible optional input: G_iter: number of iterations for the G updates
Grid Objects¶
Routines to perform grid search on the sphere
-
class
pyroomacoustics.doa.grid.
Grid
(n_points)¶ Bases:
object
This is an abstract class with attributes and methods for grids
Parameters: n_points (int) – the number of points on the grid -
apply
(func, spherical=False)¶
-
find_peaks
(k=1)¶
-
set_values
(vals)¶
-
-
class
pyroomacoustics.doa.grid.
GridCircle
(n_points=360, azimuth=None)¶ Bases:
pyroomacoustics.doa.grid.Grid
Creates a grid on the circle.
Parameters: - n_points (int, optional) – The number of uniformly spaced points in the grid.
- azimuth (ndarray, optional) – An array of azimuth (in radians) to use for grid locations. Overrides n_points.
-
apply
(func, spherical=False)¶
-
find_peaks
(k=1)¶
-
plot
(mark_peaks=0)¶
-
class
pyroomacoustics.doa.grid.
GridSphere
(n_points=1000, spherical_points=None)¶ Bases:
pyroomacoustics.doa.grid.Grid
This class computes nearly equidistant points on the sphere using the Fibonacci method
Parameters: - n_points (int) – The number of points to sample
- spherical_points (ndarray, optional) – A 2 x n_points array of spherical coordinates with azimuth in the top row and colatitude in the second row. Overrides n_points.
References
http://lgdv.cs.fau.de/uploads/publications/spherical_fibonacci_mapping.pdf
http://stackoverflow.com/questions/9600801/evenly-distributing-n-points-on-a-sphere
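The Fibonacci mapping referenced above can be sketched as follows; this illustrates the construction (points uniform in height, azimuths advanced by the golden ratio), not necessarily the exact spiral used by the class:

```python
import numpy as np

def fibonacci_sphere(n_points):
    """Nearly equidistant points on the unit sphere via the Fibonacci mapping.

    Returns a (3, n_points) array of cartesian coordinates.
    """
    golden = (1 + np.sqrt(5)) / 2
    i = np.arange(n_points)
    azimuth = 2 * np.pi * i / golden           # spiral around the sphere
    z = 1 - (2 * i + 1) / n_points             # uniform in height => uniform in area
    colatitude = np.arccos(z)
    x = np.sin(colatitude) * np.cos(azimuth)
    y = np.sin(colatitude) * np.sin(azimuth)
    return np.stack([x, y, z])
```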
-
apply
(func, spherical=False)¶ Apply a function to every grid point
-
find_peaks
(k=1)¶ Find the largest peaks on the grid
-
min_max_distance
()¶ Compute some statistics on the distribution of the points
-
plot
(colatitude_ref=None, azimuth_ref=None, colatitude_recon=None, azimuth_recon=None, plotly=True, projection=True, points_only=False)¶
-
plot_old
(plot_points=False, mark_peaks=0)¶ Plot the points on the sphere with their values
-
regrid
()¶ Regrid the non-uniform data on a regular mesh
Plot Helpers¶
A collection of functions to plot maps and points on circles and spheres.
-
pyroomacoustics.doa.plotters.
polar_plt_dirac
(self, azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶ Generate polar plot of DoA results.
Parameters: - azimuth_ref (numpy array) – True direction of sources (in radians).
- alpha_ref (numpy array) – Estimated amplitude of sources.
- save_fig (bool) – Whether or not to save figure as pdf.
- file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
- plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
-
pyroomacoustics.doa.plotters.
sph_plot_diracs
(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, colatitude_grid=None, azimuth_grid=None, file_name='sph_recon_2d_dirac.pdf', **kwargs)¶ This function plots the dirty image with source locations on a flat projection of the sphere
Parameters: - colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
- azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
- colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
- azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
- dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
- azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
- colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
-
pyroomacoustics.doa.plotters.
sph_plot_diracs_plotly
(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, azimuth_grid=None, colatitude_grid=None, surface_base=1, surface_height=0.0)¶ Plots a 2D map on a sphere as well as a collection of Diracs using the plotly library
Parameters: - colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
- azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
- colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
- azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
- dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
- azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
- colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
- surface_base – radius corresponding to lowest height on the map
- surface_height – radius difference between the lowest and highest point on the map
Peak Detection¶
Detect peaks in data based on their amplitude and other features.
Author: Marcos Duarte, https://github.com/demotu/BMC Version: 1.0.4 License: MIT
-
pyroomacoustics.doa.detect_peaks.
detect_peaks
(x, mph=None, mpd=1, threshold=0, edge='rising', kpsh=False, valley=False, show=False, ax=None)¶ Detect peaks in data based on their amplitude and other features.
Parameters: - x (1D array_like) – data.
- mph ({None, number}, optional (default = None)) – detect peaks that are greater than minimum peak height.
- mpd (positive integer, optional (default = 1)) – detect peaks that are at least separated by minimum peak distance (in number of data).
- threshold (positive number, optional (default = 0)) – detect peaks (valleys) that are greater (smaller) than threshold in relation to their immediate neighbors.
- edge ({None, 'rising', 'falling', 'both'}, optional (default = 'rising')) – for a flat peak, keep only the rising edge (‘rising’), only the falling edge (‘falling’), both edges (‘both’), or don’t detect a flat peak (None).
- kpsh (bool, optional (default = False)) – keep peaks with same height even if they are closer than mpd.
- valley (bool, optional (default = False)) – if True (1), detect valleys (local minima) instead of peaks.
- show (bool, optional (default = False)) – if True (1), plot data in matplotlib figure.
- ax (a matplotlib.axes.Axes instance, optional (default = None)) –
Returns: ind – indices of the peaks in x.
Return type: 1D array_like
Notes
The detection of valleys instead of peaks is performed internally by simply negating the data: ind_valleys = detect_peaks(-x)
The function can handle NaN’s
See this IPython Notebook [1].
References
[1] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb
Examples
>>> import numpy as np
>>> from detect_peaks import detect_peaks
>>> x = np.random.randn(100)
>>> x[60:81] = np.nan
>>> # detect all peaks and plot data
>>> ind = detect_peaks(x, show=True)
>>> print(ind)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # set minimum peak height = 0 and minimum peak distance = 20
>>> detect_peaks(x, mph=0, mpd=20, show=True)
>>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
>>> # set minimum peak distance = 2
>>> detect_peaks(x, mpd=2, show=True)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # detection of valleys instead of peaks
>>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)
>>> x = [0, 1, 1, 0, 1, 1, 0]
>>> # detect both edges
>>> detect_peaks(x, edge='both', show=True)
>>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
>>> # set threshold = 2
>>> detect_peaks(x, threshold=2, show=True)
DOA Utilities¶
This module contains useful functions to compute distances and errors on circles and spheres.
-
pyroomacoustics.doa.utils.
circ_dist
(azimuth1, azimuth2, r=1.0)¶ Returns the shortest distance between two points on a circle
Parameters: - azimuth1 – azimuth of point 1
- azimuth2 – azimuth of point 2
- r (optional) – radius of the circle (Default 1)
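The behavior described above can be sketched in a few lines of NumPy (a minimal sketch of the documented semantics, not the library source):

```python
import numpy as np

def circ_dist(azimuth1, azimuth2, r=1.0):
    # wrap the angular difference to [0, 2*pi), then take the shorter arc
    d = np.abs(azimuth1 - azimuth2) % (2 * np.pi)
    return r * np.minimum(d, 2 * np.pi - d)
```

With the default radius of 1, the result is simply the angular distance in radians.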
-
pyroomacoustics.doa.utils.
great_circ_dist
(r, colatitude1, azimuth1, colatitude2, azimuth2)¶ Calculate the great-circle distance between two points located on a sphere
Parameters: - r (radius of the sphere) –
- colatitude1 (colatitude of point 1) –
- azimuth1 (azimuth of point 1) –
- colatitude2 (colatitude of point 2) –
- azimuth2 (azimuth of point 2) –
Returns: great-circle distance
Return type: float or ndarray
-
pyroomacoustics.doa.utils.
polar_distance
(x1, x2)¶ Given two arrays of numbers x1 and x2, pairs the cells that are the closest and provides the pairing matrix index: x1(index(1,:)) should be as close as possible to x2(index(2,:)). The function outputs the average of the absolute value of the differences abs(x1(index(1,:))-x2(index(2,:))).
Parameters: - x1 – vector 1
- x2 – vector 2
Returns: - d – the average absolute difference between the matched entries of x1 and x2
- index – the permutation matrix
-
pyroomacoustics.doa.utils.
spher2cart
(r, azimuth, colatitude)¶ Convert a spherical point to cartesian coordinates.
Parameters: - r – radius
- azimuth – azimuth
- colatitude – colatitude
Returns: An ndarray containing the Cartesian coordinates of the points in its columns
Return type: ndarray
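With this convention (colatitude measured down from the z-axis, azimuth in the x-y plane), the conversion can be sketched as:

```python
import numpy as np

def spher2cart(r, azimuth, colatitude):
    # standard spherical-to-Cartesian mapping:
    # colatitude is the angle from the positive z-axis
    return np.array([
        r * np.cos(azimuth) * np.sin(colatitude),  # x
        r * np.sin(azimuth) * np.sin(colatitude),  # y
        r * np.cos(colatitude),                    # z
    ])
```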
Single Channel Denoising¶
Module contents¶
Single Channel Noise Reduction¶
Collection of single channel noise reduction (SCNR) algorithms for speech.
A deep learning approach in Python can be found at this repository.
References
[1] | M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1979, pp. 208-211. |
[2] | Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995. |
[3] | J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210. |
Algorithms¶
pyroomacoustics.denoise.spectral_subtraction module¶
-
class
pyroomacoustics.denoise.spectral_subtraction.
SpectralSub
(nfft, db_reduc, lookback, beta, alpha=1)¶ Bases:
object
Here we have a class for performing single channel noise reduction via spectral subtraction. The instantaneous signal energy and noise floor are estimated at each time instant (for each frequency bin) and used to compute a gain filter with which to perform spectral subtraction.
For a given frame n, the gain for frequency bin k is given by:
\[G[k, n] = \max \left \{ \left ( \dfrac{P[k, n]-\beta P_N[k, n]}{P[k, n]} \right )^\alpha, G_{min} \right \},\]where \(G_{min} = 10^{-(db\_reduc/20)}\) and \(db\_reduc\) is the maximum reduction (in dB) that we are willing to perform for each bin (a high value can actually be detrimental, see below). The instantaneous energy \(P[k,n]\) is computed by simply squaring the frequency amplitude at the bin k. The time-frequency decomposition of the input signal is typically done with the STFT and overlapping frames. The noise estimate \(P_N[k, n]\) for frequency bin k is given by looking back a certain number of frames \(L\) and selecting the bin with the lowest energy:
\[P_N[k, n] = \min_{[n-L, n]} P[k, n]\]This approach works best when the SNR is positive and the noise is rather stationary. An alternative approach for the noise estimate (also in the case of stationary noise) would be to apply a lowpass filter for each frequency bin.
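The gain and noise-floor formulas above translate directly into NumPy. Below is a minimal sketch (the function name is illustrative, not the class implementation, which processes frames in a streaming fashion):

```python
import numpy as np

def spectral_sub_gain(P, P_hist, db_reduc=10.0, beta=5.0, alpha=1.0):
    # P: energies of the current frame (one value per frequency bin)
    # P_hist: (L, n_bins) energies of the last L frames (lookback window)
    G_min = 10.0 ** (-db_reduc / 20.0)   # floor: maximum attenuation
    P_N = np.min(P_hist, axis=0)         # noise estimate: minimum over window
    # clip the numerator at zero so fractional alpha stays well defined
    ratio = np.maximum(P - beta * P_N, 0.0) / np.maximum(P, 1e-12)
    return np.maximum(ratio ** alpha, G_min)
```

Bins dominated by noise fall back to the G_min floor, which is exactly where the db_reduc trade-off discussed below comes into play.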
With a large suppression, i.e. large values for \(db\_reduc\), we can observe a typical artefact of such spectral subtraction approaches, namely “musical noise”. Here is a nice article about noise reduction and musical noise.
Adjusting the constants \(\beta\) and \(\alpha\) also presents a trade-off between suppression and undesirable artefacts, i.e. more noticeable musical noise.
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.
# initialize STFT and SpectralSub objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2, analysis_window=pra.hann(nfft))
scnr = pra.denoise.SpectralSub(nfft, db_reduc=10, lookback=5, beta=20, alpha=3)

# apply block-by-block
for n in range(num_blocks):
    # go to frequency domain for noise reduction
    stft.analysis(mono_noisy)
    gain_filt = scnr.compute_gain_filter(stft.X)
    # back to time domain
    mono_denoised = stft.synthesis(gain_filt*stft.X)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_spectral_sub(noisy_signal, nfft=512, db_reduc=10,
                                     lookback=5, beta=20, alpha=3)
Parameters: - nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
- db_reduc (float) – Maximum reduction in dB for each bin.
- lookback (int) – How many frames to look back for the noise estimate.
- beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
- alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
-
compute_gain_filter
(X)¶ Parameters: X (numpy array) – Complex spectrum of length nfft//2+1.
Returns: Gain filter to multiply the given spectrum with.
Return type: numpy array
-
pyroomacoustics.denoise.spectral_subtraction.
apply_spectral_sub
(noisy_signal, nfft=512, db_reduc=25, lookback=12, beta=30, alpha=1)¶ One-shot function to apply spectral subtraction approach.
Parameters: - noisy_signal (numpy array) – Real signal in time domain.
- nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
- db_reduc (float) – Maximum reduction in dB for each bin.
- lookback (int) – How many frames to look back for the noise estimate.
- beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
- alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
Returns: Enhanced/denoised signal.
Return type: numpy array
pyroomacoustics.denoise.subspace module¶
-
class
pyroomacoustics.denoise.subspace.
Subspace
(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')¶ Bases:
object
A class for performing single channel noise reduction in the time domain via the subspace approach. This implementation is based on the approach presented in:
Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.Moreover, an adaptation of the subspace approach is implemented here, as presented in:
Y. Hu and P. C. Loizou, A subspace approach for enhancing speech corrupted by colored noise, IEEE Signal Processing Letters, vol. 9, no. 7, pp. 204-206, Jul 2002.Namely, an eigendecomposition is performed on the matrix:
\[\Sigma = R_n^{-1} R_y - I,\]where \(R_n\) is the noise covariance matrix, \(R_y\) is the covariance matrix of the input noisy signal, and \(I\) is the identity matrix. The covariance matrices are estimated from past samples; the number of past samples/frames used for estimation can be set with the parameters lookback and skip. A simple energy threshold (thresh parameter) is used to identify noisy frames.
The eigenvectors corresponding to the positive eigenvalues of \(\Sigma\) are used to create a linear operator \(H_{opt}\) in order to enhance the noisy input signal \(\mathbf{y}\):
\[\mathbf{\hat{x}} = H_{opt} \cdot \mathbf{y}.\]The length of \(\mathbf{y}\) is specified by the parameter frame_len; \(50\%\) overlap and add with a Hanning window is used to reconstruct the output. A great summary of the approach can be found in the paper by Y. Hu and P. C. Loizou under Section III, B.
Adjusting the factor \(\mu\) presents a trade-off between noise suppression and distortion.
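The core linear algebra can be sketched as follows. This follows the Hu and Loizou formulation referenced above; the gain \(\lambda/(\lambda+\mu)\) applied to the positive eigenvalues is one common choice and may differ in detail from the class implementation:

```python
import numpy as np

def subspace_operator(R_y, R_n, mu=10.0):
    # Sigma = R_n^{-1} R_y - I, the whitened signal-plus-noise covariance
    dim = R_y.shape[0]
    sigma = np.linalg.inv(R_n) @ R_y - np.eye(dim)
    eigvals, V = np.linalg.eig(sigma)
    # keep only the signal subspace (positive eigenvalues), shrunk by mu
    gains = np.where(eigvals.real > 0.0,
                     eigvals.real / (eigvals.real + mu), 0.0)
    return (V @ np.diag(gains) @ np.linalg.inv(V)).real
```

The enhanced frame is then obtained as `x_hat = H_opt @ y`, with overlap-add across frames.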
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here. Depending on your choice for frame_len, lookback, and skip, the approach may not be suitable for real-time processing.
# initialize Subspace object
scnr = Subspace(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01)

# apply block-by-block
for n in range(num_blocks):
    denoised_signal = scnr.apply(noisy_input)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_subspace(noisy_signal, frame_len=256, mu=10,
                                 lookback=10, skip=2, thresh=0.01)
Parameters: - frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive.
- mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.
- lookback (int) – How many frames to look back for covariance matrix estimation.
- skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.
- thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
- data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.
-
apply
(new_samples)¶ Parameters: new_samples (numpy array) – New array of samples of length self.hop in the time domain. Returns: Denoised samples. Return type: numpy array
-
compute_signal_projection
()¶
-
update_cov_matrices
(new_samples)¶
-
pyroomacoustics.denoise.subspace.
apply_subspace
(noisy_signal, frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')¶ One-shot function to apply subspace denoising approach.
Parameters: - noisy_signal (numpy array) – Real signal in time domain.
- frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive. 50% overlap is used with hanning window.
- mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.
- lookback (int) – How many frames to look back for covariance matrix estimation.
- skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.
- thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise and might even remove desired signal!
- data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.
Returns: Enhanced/denoised signal.
Return type: numpy array
pyroomacoustics.denoise.iterative_wiener module¶
-
class
pyroomacoustics.denoise.iterative_wiener.
IterativeWiener
(frame_len, lpc_order, iterations, alpha=0.8, thresh=0.01)¶ Bases:
object
A class for performing single channel noise reduction in the frequency domain with a Wiener filter that is iteratively computed. This implementation is based on the approach presented in:
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.For each frame, a Wiener filter of the following form is computed and applied to the noisy samples in the frequency domain:
\[H(\omega) = \dfrac{P_S(\omega)}{P_S(\omega) + \sigma_d^2},\]where \(P_S(\omega)\) is the speech power spectral density and \(\sigma_d^2\) is the noise variance.
The following assumptions are made in order to arrive at the above filter as the optimal solution:
The noisy samples \(y[n]\) can be written as:
\[y[n] = s[n] + d[n],\]where \(s[n]\) is the desired signal and \(d[n]\) is the background noise.
The signal and noise are uncorrelated.
The noise is white Gaussian, i.e. it has a flat power spectrum with amplitude \(\sigma_d^2\).
Under these assumptions, the above Wiener filter minimizes the mean-square error between the true samples \(s[0:N-1]\) and the estimate \(\hat{s}[0:N-1]\) obtained by filtering \(y[0:N-1]\) with the above filter (with \(N\) being the frame length).
The fundamental part of this approach is correctly (or as well as possible) estimating the speech power spectral density \(P_S(\omega)\) and the noise variance \(\sigma_d^2\). For this, we need a voice activity detector in order to determine when we have incoming speech. In this implementation, we use a simple energy threshold on the input frame, which is set with the thresh input parameter.
When no speech is identified, the input frame is used to update the noise variance \(\sigma_d^2\). We could simply set \(\sigma_d^2\) to the energy of the input frame. However, we employ a simple IIR filter in order to avoid abrupt changes in the noise level (thus adding an assumption of stationarity):
\[\sigma_d^2[k] = \alpha \cdot \sigma_d^2[k-1] + (1-\alpha) \cdot \sigma_y^2,\]where \(\alpha\) is the smoothing parameter and \(\sigma_y^2\) is the energy of the input frame. A high value of \(\alpha\) will update the noise level very slowly, while a low value will make it very sensitive to changes at the input. The value for \(\alpha\) can be set with the alpha parameter.
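One update step of this recursion is a one-liner. A sketch (whether “energy” means the sum or the mean of the squared samples is a convention choice here; the mean is used below):

```python
import numpy as np

def update_noise_variance(prev_var, frame, alpha=0.8):
    # exponential smoothing of the noise-variance estimate
    frame_energy = np.mean(np.asarray(frame, dtype=float) ** 2)
    return alpha * prev_var + (1.0 - alpha) * frame_energy
```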
When speech is identified in the input frame, an iterative procedure is employed in order to estimate \(P_S(\omega)\) (and therefore the Wiener filter \(H\) as well). This procedure consists of computing \(p\) linear predictive coding (LPC) coefficients of the input frame. The number of LPC coefficients is set with the parameter lpc_order. These LPC coefficients form an all-pole filter that models the vocal tract as described in the above paper (Eq. 1). With these coefficients, we can then obtain an estimate of the speech power spectral density (Eq. 41b) and thus the corresponding Wiener filter (Eq. 41a). This Wiener filter is used to denoise the input frame. Moreover, with this denoised frame, we can compute new LPC coefficients and therefore a new Wiener filter. The idea behind this approach is that by iteratively computing the LPC coefficients as such, we can obtain a better estimate of the speech power spectral density. The number of iterations can be set with the iterations parameter.
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.
# initialize STFT and IterativeWiener objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2, analysis_window=pra.hann(nfft))
scnr = IterativeWiener(frame_len=nfft, lpc_order=20, iterations=2,
                       alpha=0.8, thresh=0.01)

# apply block-by-block
for n in range(num_blocks):
    # go to frequency domain, 50% overlap
    stft.analysis(mono_noisy)
    # compute wiener output
    X = scnr.compute_filtered_output(
        current_frame=stft.fft_in_buffer,
        frame_dft=stft.X)
    # back to time domain
    mono_denoised = stft.synthesis(X)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_iterative_wiener(noisy_signal, frame_len=512,
                                         lpc_order=20, iterations=2,
                                         alpha=0.8, thresh=0.01)
Parameters: - frame_len (int) – Frame length in samples.
- lpc_order (int) – Number of LPC coefficients to compute
- iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.
- alpha (float) – Smoothing factor within [0,1] for updating the noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level, but if a speech frame is incorrectly identified as noise, desired speech may be removed.
- thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
-
compute_filtered_output
(current_frame, frame_dft=None)¶ Compute Wiener filter in the frequency domain.
Parameters: - current_frame (numpy array) – Noisy samples.
- frame_dft (numpy array) – DFT of input samples. If not provided, it will be computed.
Returns: Output of denoising in the frequency domain.
Return type: numpy array
-
pyroomacoustics.denoise.iterative_wiener.
apply_iterative_wiener
(noisy_signal, frame_len=512, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)¶ One-shot function to apply iterative Wiener filtering for denoising.
Parameters: - noisy_signal (numpy array) – Real signal in time domain.
- frame_len (int) – Frame length in samples. 50% overlap is used with hanning window.
- lpc_order (int) – Number of LPC coefficients to compute
- iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.
- alpha (float) – Smoothing factor within [0,1] for updating the noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level, but if a speech frame is incorrectly identified as noise, desired speech may be removed.
- thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
Returns: Enhanced/denoised signal.
Return type: numpy array
-
pyroomacoustics.denoise.iterative_wiener.
compute_speech_psd
(a, g2, nfft)¶ Compute power spectral density of speech as specified in equation (41b) of
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.Namely:
\[P_S(\omega) = \dfrac{g^2}{\left \| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right \|^2},\]where \(p\) is the LPC order, \(a_k\) are the LPC coefficients, and \(g\) is an estimated gain factor.
The power spectral density is computed at the frequencies corresponding to a DFT of length nfft.
Parameters: - a (numpy array) – LPC coefficients.
- g2 (float) – Squared gain.
- nfft (int) – FFT length.
Returns: Power spectral density from LPC coefficients.
Return type: numpy array
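Equation (41b), together with the Wiener filter equation given earlier, can be sketched in a few lines of NumPy (illustrative re-implementations, not the module's code):

```python
import numpy as np

def speech_psd_from_lpc(a, g2, nfft):
    # A(w) = 1 - sum_k a_k e^{-jkw}, evaluated at the nfft DFT frequencies
    A = np.fft.fft(np.concatenate(([1.0], -np.asarray(a, dtype=float))), nfft)
    return g2 / np.abs(A) ** 2

def wiener_from_psd(speech_psd, noise_psd):
    # H(w) = P_S(w) / (P_S(w) + sigma_d^2)
    return speech_psd / (speech_psd + noise_psd)
```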
-
pyroomacoustics.denoise.iterative_wiener.
compute_squared_gain
(a, noise_psd, y)¶ Estimate the squared gain of the speech power spectral density as done on p. 204 of
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.Namely solving for \(g^2\) such that the following expression is satisfied:
\[\dfrac{N}{2\pi} \int_{-\pi}^{\pi} \dfrac{g^2}{\left \| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right \|^2} d\omega = \sum_{n=0}^{N-1} y^2(n) - N\cdot\sigma_d^2,\]where \(N\) is the number of noisy samples \(y\), \(a_k\) are the \(p\) LPC coefficients, and \(\sigma_d^2\) is the noise variance.
Parameters: - a (numpy array) – LPC coefficients.
- noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.
- y (numpy array) – Noisy time domain samples.
Returns: Squared gain.
Return type: float
-
pyroomacoustics.denoise.iterative_wiener.
compute_wiener_filter
(speech_psd, noise_psd)¶ Compute Wiener filter in the frequency domain.
Parameters: - speech_psd (numpy array) – Speech power spectral density.
- noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.
Returns: Frequency domain filter, computed at the same frequency values as speech_psd.
Return type: numpy array
Phase Processing¶
Submodules¶
Griffin-Lim Phase Reconstruction¶
Implementation of the classic phase reconstruction from Griffin and Lim [1]. The input to the algorithm is the magnitude from STFT measurements.
The algorithm starts by assigning a (possibly random) initial phase to the measurements, and then iteratively:
- Reconstruct the time-domain signal
- Re-apply STFT
- Enforce the known magnitude of the measurements
The implementation supports different types of initialization via the keyword argument ini.
- If omitted, the initial phase is uniformly zero
- If ini="random", a random phase is used
- If ini=A for a numpy.ndarray A of the same shape as the input magnitude, A / numpy.abs(A) is used for initialization
Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import pyroomacoustics as pra
# We open a speech sample
filename = "examples/input_samples/cmu_arctic_us_axb_a0004.wav"
fs, audio = wavfile.read(filename)
# These are the parameters of the STFT
fft_size = 512
hop = fft_size // 4
win_a = np.hamming(fft_size)
win_s = pra.transform.compute_synthesis_window(win_a, hop)
n_iter = 200
engine = pra.transform.STFT(
fft_size, hop=hop, analysis_window=win_a, synthesis_window=win_s
)
X = engine.analysis(audio)
X_mag = np.abs(X)
X_mag_norm = np.linalg.norm(X_mag) ** 2
# monitor convergence
errors = []
# the callback to track the spectral distance convergence
def cb(epoch, Y, y):
    # we measure convergence via spectral distance
    Y_2 = engine.analysis(y)
    sd = np.linalg.norm(X_mag - np.abs(Y_2)) ** 2 / X_mag_norm
    # save in the list every 10 iterations
    if epoch % 10 == 0:
        errors.append(sd)
pra.phase.griffin_lim(X_mag, hop, win_a, n_iter=n_iter, callback=cb)
plt.semilogy(np.arange(len(errors)) * 10, errors)
plt.show()
References
[1] | D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984. |
-
pyroomacoustics.phase.gl.
griffin_lim
(X, hop, analysis_window, fft_size=None, stft_kwargs={}, n_iter=100, ini=None, callback=None)¶ Implementation of the Griffin-Lim phase reconstruction algorithm from STFT magnitude measurements.
Parameters: - X (array_like, shape (n_frames, n_freq)) – The STFT magnitude measurements
- hop (int) – The frame shift of the STFT
- analysis_window (array_like, shape (fft_size,)) – The window used for the STFT analysis
- fft_size (int, optional) – The FFT size for the STFT; if omitted, it is computed from the dimension of X
- stft_kwargs (dict, optional) – Dictionary of extra parameters for the STFT
- n_iter (int, optional) – The number of iterations
- ini (str or array_like, np.complex, shape (n_frames, n_freq), optional) – The initial value of the phase estimate. If "random", uses a random guess. If None, uses 0 phase.
- callback (func, optional) – A callable taking as arguments an int and the reconstructed STFT and time-domain signals
Module contents¶
Phase Processing¶
This sub-package contains algorithms related to phase-specific processing, such as phase reconstruction.
- Griffin-Lim
- Phase reconstruction by fixed-point iterations [1]
References
[1] | D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984. |
Pyroomacoustics API¶
Subpackages¶
pyroomacoustics.experimental package¶
Submodules¶
pyroomacoustics.experimental.deconvolution module¶
-
pyroomacoustics.experimental.deconvolution.
deconvolve
(y, s, length=None, thresh=0.0)¶ Deconvolve an excitation signal from an impulse response
Parameters: - y (ndarray) – The recording
- s (ndarray) – The excitation signal
- length (int, optional) – the length of the impulse response to deconvolve
- thresh (float, optional) – ignore frequency bins with power lower than this
-
pyroomacoustics.experimental.deconvolution.
wiener_deconvolve
(y, x, length=None, noise_variance=1.0, let_n_points=15, let_div_base=2)¶ Deconvolve an excitation signal from an impulse response
We use Wiener filter
Parameters: - y (ndarray) – The recording
- x (ndarray) – The excitation signal
- length (int, optional) – the length of the impulse response to deconvolve
- noise_variance (float, optional) – estimate of the noise variance
- let_n_points (int) – number of points to use in the LET approximation
- let_div_base (float) – the divider used for the LET grid
pyroomacoustics.experimental.delay_calibration module¶
-
class
pyroomacoustics.experimental.delay_calibration.
DelayCalibration
(fs, pad_time=0.0, mls_bits=16, repeat=1, temperature=25.0, humidity=50.0, pressure=1000.0)¶ Bases:
object
-
run
(distance=0.0, ch_in=None, ch_out=None, oversampling=1)¶ Run the calibration. Plays a maximum length sequence and cross-correlates the signals to find the time delay.
Parameters: - distance (float, optional) – Distance between the speaker and microphone
- ch_in (int, optional) – The input channel to use. If not specified, all channels are calibrated
- ch_out (int, optional) – The output channel to use. If not specified, all channels are calibrated
-
pyroomacoustics.experimental.localization module¶
-
pyroomacoustics.experimental.localization.
edm_line_search
(R, tdoa, bounds, steps)¶ We have a number of points of known locations and the TDOA measurements from an unknown location to the known points. We perform an EDM line search to find the unknown offset that turns TDOA into TOA.
Parameters: - R (ndarray) – An ndarray of 3xN where each column is the location of a point
- tdoa (ndarray) – A length N vector containing the tdoa measurements from the unknown location to the known ones
- bounds (ndarray) – Bounds for the line search
- steps (float) – Step size for the line search
-
pyroomacoustics.experimental.localization.
tdoa
(x1, x2, interp=1, fs=1, phat=True)¶ This function computes the time difference of arrival (TDOA) of the signal at the two microphones. This in turns is used to infer the direction of arrival (DOA) of the signal.
Specifically if s(k) is the signal at the reference microphone and s_2(k) at the second microphone, then for signal arriving with DOA theta we have
s_2(k) = s(k - tau)
with
tau = fs*d*sin(theta)/c
where d is the distance between the two microphones and c the speed of sound.
We recover tau using the Generalized Cross Correlation - Phase Transform (GCC-PHAT) method. The reference is
Knapp, C., & Carter, G. C. (1976). The generalized correlation method for estimation of time delay.
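A bare-bones GCC-PHAT delay estimator can be sketched as follows (this is the standard method, not a copy of the library's tdoa, which additionally converts the delay to an angle and returns the correlation magnitude):

```python
import numpy as np

def gcc_phat_delay(x1, x2, fs=1):
    # cross-power spectrum with PHAT weighting (keep only the phase)
    n = len(x1) + len(x2)
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    R = X1 * np.conj(X2)
    R /= np.maximum(np.abs(R), 1e-12)
    cc = np.fft.irfft(R, n=n)
    shift = int(np.argmax(np.abs(cc)))
    if shift > n // 2:
        shift -= n                 # map wrapped indices to negative lags
    return -shift / fs             # tau such that x2[k] ~ x1[k - tau]
```

The angle then follows from the relation above: theta = arcsin(c * tau / d).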
Parameters: - x1 (nd-array) – The signal of the reference microphone
- x2 (nd-array) – The signal of the second microphone
- interp (int, optional (default 1)) – The interpolation value for the cross-correlation, it can improve the time resolution (and hence DOA resolution)
- fs (int, optional (default 1)) – The sampling frequency of the input signals
Returns: - theta (float) – the angle of arrival (in radians)
- pwr (float) – the magnitude of the maximum cross correlation coefficient
- delay (float) – the delay between the two microphones (in seconds)
-
pyroomacoustics.experimental.localization.
tdoa_loc
(R, tdoa, c, x0=None)¶ TDOA based localization
Parameters: - R (ndarray) – A 3xN array of 3D points
- tdoa (ndarray) – A length N array of tdoa
- c (float) – The speed of sound
References
Steven Li, TDOA localization
pyroomacoustics.experimental.measure_ir module¶
-
pyroomacoustics.experimental.measure_ir.
measure_ir
(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True) Measures an impulse response by playing a sweep and recording it using the sounddevice package.
Parameters: - sweep_length (float, optional) – length of the sweep in seconds
- sweep_type (SweepType, optional) – type of sweep to use linear or exponential (default)
- fs (int, optional) – sampling frequency (default 48 kHz)
- f_lo (float, optional) – lowest frequency in the sweep
- f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
- volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
- pre_delay (float, optional) – delay in second before playing sweep
- post_delay (float, optional) – delay in second before stopping recording after playing the sweep
- fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
- dev_in (int, optional) – input device number
- dev_out (int, optional) – output device number
- channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
- channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
- ascending (bool, optional) – whether the sweep is from high to low (default) or low to high frequencies
- deconvolution (bool, optional) – if True (default), apply deconvolution to the recorded signal to remove the sweep
- plot (bool, optional) – plot the resulting signal
Returns: The impulse response if deconvolution == True, and the recorded signal otherwise.
pyroomacoustics.experimental.physics module¶
-
pyroomacoustics.experimental.physics.
calculate_speed_of_sound
(t, h, p)¶ Compute the speed of sound as a function of temperature, humidity and pressure
Parameters: - t (temperature [Celsius]) –
- h (relative humidity [%]) –
- p (atmospheric pressure [kPa]) –
Returns: Return type: Speed of sound in [m/s]
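For reference, a widely used linear approximation looks like this (an assumption: the package may use a more elaborate formula; in this simple version the pressure has a negligible effect and is ignored):

```python
def speed_of_sound(t, h, p=1000.0):
    # c ~ 331.4 + 0.6*t + 0.0124*h  [m/s]
    # t: temperature [Celsius], h: relative humidity [%], p: pressure [kPa]
    return 331.4 + 0.6 * t + 0.0124 * h
```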
pyroomacoustics.experimental.point_cloud module¶
Contains PointCloud class.
Given a number of points and their relative distances, this class aims at reconstructing their relative coordinates.
-
class
pyroomacoustics.experimental.point_cloud.
PointCloud
(m=1, dim=3, diameter=0.0, X=None, labels=None, EDM=None)¶ Bases:
object
-
EDM
()¶ Computes the EDM corresponding to the marker set
-
align
(marker, axis)¶ Rotate the marker set around the given axis until it is aligned onto the given marker
Parameters: - marker (int or str) – the index or label of the marker onto which to align the set
- axis (int) – the axis around which the rotation happens
-
center
(marker)¶ Translate the marker set so that the argument is the origin.
-
classical_mds
(D)¶ Classical multidimensional scaling
Parameters: D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
-
copy
()¶ Return a deep copy of this marker set object
-
correct
(corr_dic)¶ correct a marker location by a given vector
-
doa
(receiver, source)¶ Computes the direction of arrival wrt a source and receiver
-
flatten
(ind)¶ Transform the set of points so that the subset of markers given as argument is as close to flat (wrt the z-axis) as possible.
Parameters: ind (list of bools) – Lists of marker indices that should be all in the same subspace
-
fromEDM
(D, labels=None, method='mds')¶ Compute the position of markers from their Euclidean Distance Matrix
Parameters: - D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
- labels (list, optional) – A list of human friendly labels for the markers (e.g. ‘east’, ‘west’, etc)
- method (str, optional) – The method to use * ‘mds’ for multidimensional scaling (default) * ‘tri’ for trilateration
-
key2ind
(ref)¶ Get the index location from a label
-
normalize
(refs=None)¶ Reposition points such that x0 is at the origin, x1 lies on the x-axis and x2 lies above the x-axis, keeping their relative positions to each other. The z-axis is defined according to the right hand rule by default.
Parameters: - refs (list of 3 ints or str) – The index or label of three markers used to define (origin, x-axis, y-axis)
- left_hand (bool, optional (default False)) – Normally the z-axis is defined using right-hand rule, this flag allows to override this behavior
-
plot
(axes=None, show_labels=True, **kwargs)¶
-
trilateration
(D)¶ Find the location of points based on their distance matrix using trilateration
Parameters: D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
-
trilateration_single_point
(c, Dx, Dy)¶ Given x at origin (0,0) and y at (0,c) the distances from a point at unknown location Dx, Dy to x, y, respectively, finds the position of the point.
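The geometry behind trilateration_single_point can be sketched directly. This is an illustrative derivation, not the library's code: with x at the origin and y at (0, c), subtracting the two circle equations isolates the y-coordinate of the unknown point.

```python
import numpy as np

def trilaterate(c, Dx, Dy):
    # x is at (0, 0) and y is at (0, c); Dx and Dy are the distances
    # from the unknown point to x and y. Subtracting the circle equation
    # around y from the one around x gives the y-coordinate; the circle
    # around x then gives the x-coordinate.
    py = (Dx**2 + c**2 - Dy**2) / (2.0 * c)
    px = np.sqrt(max(Dx**2 - py**2, 0.0))  # sign ambiguity: take px >= 0
    return px, py

# The point (3, 4) is at distance 5 from (0, 0) and sqrt(13) from (0, 2)
px, py = trilaterate(2.0, 5.0, np.sqrt(13.0))
```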
-
RT60 Measurement Routine¶
Automatically determines the reverberation time of an impulse response using the Schroeder method [1].
References
[1] | M. R. Schroeder, “New Method of Measuring Reverberation Time,” J. Acoust. Soc. Am., vol. 37, no. 3, pp. 409-412, 1965. |
-
pyroomacoustics.experimental.rt60.
measure_rt60
(h, fs=1, decay_db=60, plot=False, rt60_tgt=None)¶ Analyze the RT60 of an impulse response. Optionally plots some useful information.
Parameters: - h (array_like) – The impulse response.
- fs (float or int, optional) – The sampling frequency of h (default to 1, i.e., samples).
- decay_db (float or int, optional) – The decay in decibels for which we actually estimate the time. Although we would like to estimate the RT60, it might not be practical. Instead, we measure the RT20 or RT30 and extrapolate to RT60.
- plot (bool, optional) – If set to True, the power decay and different estimated values will be plotted (default False).
- rt60_tgt (float) – This parameter can be used to indicate a target RT60 to which we want to compare the estimated value.
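The Schroeder method behind this routine can be sketched with plain numpy. The following is an illustrative implementation, not the library's exact code: backward-integrate the squared impulse response to get the energy decay curve, measure a partial decay (here 20 dB, i.e. an RT20), and extrapolate to 60 dB.

```python
import numpy as np

def measure_rt60_sketch(h, fs, decay_db=20.0):
    # Schroeder backward integration gives the energy decay curve (EDC)
    power = np.asarray(h, dtype=float) ** 2
    edc = np.cumsum(power[::-1])[::-1]
    edc_db = 10.0 * np.log10(edc / edc[0])
    # Time to decay from -5 dB to -(5 + decay_db) dB ...
    i_start = np.argmax(edc_db <= -5.0)
    i_stop = np.argmax(edc_db <= -5.0 - decay_db)
    # ... linearly extrapolated to a full 60 dB decay
    return 60.0 / decay_db * (i_stop - i_start) / fs

# A synthetic exponential decay with a known RT60 of 0.5 s
fs, rt60 = 8000, 0.5
t = np.arange(int(2 * rt60 * fs)) / fs
h = np.exp(-3.0 * np.log(10.0) / rt60 * t)  # 60 dB energy drop over rt60
est = measure_rt60_sketch(h, fs)
```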
pyroomacoustics.experimental.signals module¶
A few test signals like sweeps and stuff.
-
pyroomacoustics.experimental.signals.
exponential_sweep
(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶ Exponential sine sweep
Parameters: - T (float) – length in seconds
- fs – sampling frequency
- f_lo (float) – lowest frequency in fraction of fs (default 0)
- f_hi (float) – highest frequency in fraction of fs (default 1)
- fade (float, optional) – length of fade in and out in seconds (default 0)
- ascending (bool, optional) –
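For intuition, an exponential sine sweep can be generated in a few lines. This sketch takes frequencies in Hz rather than fractions of fs, and is illustrative rather than the library's exact implementation:

```python
import numpy as np

def exponential_sweep_sketch(T, fs, f_lo=20.0, f_hi=None):
    # Exponential sine sweep: the instantaneous frequency grows
    # exponentially from f_lo to f_hi over T seconds.
    if f_hi is None:
        f_hi = fs / 2
    t = np.arange(int(T * fs)) / fs
    R = np.log(f_hi / f_lo)  # log frequency ratio
    return np.sin(2.0 * np.pi * f_lo * T / R * (np.exp(t * R / T) - 1.0))

# One second sweep from 20 Hz to Nyquist at 8 kHz
sweep = exponential_sweep_sketch(1.0, 8000)
```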
-
pyroomacoustics.experimental.signals.
linear_sweep
(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶ Linear sine sweep
Parameters: - T (float) – length in seconds
- fs – sampling frequency
- f_lo (float) – lowest frequency in fraction of fs (default 0)
- f_hi (float) – highest frequency in fraction of fs (default 1)
- fade (float, optional) – length of fade in and out in seconds (default 0)
- ascending (bool, optional) –
-
pyroomacoustics.experimental.signals.
window
(signal, n_win)¶ window the signal at beginning and end with window of size n_win/2
Module contents¶
Experimental¶
A bunch of routines useful when doing measurements and experiments.
-
pyroomacoustics.experimental.
measure_ir
(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)¶ Measures an impulse response by playing a sweep and recording it using the sounddevice package.
Parameters: - sweep_length (float, optional) – length of the sweep in seconds
- sweep_type (SweepType, optional) – type of sweep to use: linear or exponential (default)
- fs (int, optional) – sampling frequency (default 48 kHz)
- f_lo (float, optional) – lowest frequency in the sweep
- f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
- volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
- pre_delay (float, optional) – delay in seconds before playing the sweep
- post_delay (float, optional) – delay in seconds before stopping the recording after playing the sweep
- fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
- dev_in (int, optional) – input device number
- dev_out (int, optional) – output device number
- channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
- channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
- ascending (bool, optional) – whether the sweep is from high to low (default) or low to high frequencies
- deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)
- plot (bool, optional) – plot the resulting signal
Returns: Return type: The impulse response if deconvolution == True, the recorded signal otherwise
Submodules¶
pyroomacoustics.acoustics module¶
-
class
pyroomacoustics.acoustics.
OctaveBandsFactory
(base_frequency=125.0, fs=16000, n_fft=512)¶ Bases:
object
A class to process uniformly all properties that are defined on octave bands.
Each property is stored for an octave band.
-
base_freq
¶ The center frequency of the first octave band
Type: float
-
fs
¶ The target sampling frequency
Type: float
-
n_bands
¶ The number of octave bands needed to cover from base_freq to fs / 2 (i.e. floor(log2(fs / base_freq)))
Type: int
-
bands
¶ The list of bin boundaries for the octave bands
Type: list of tuple
-
centers
¶ The list of band centers
-
all_materials
¶ The list of all Material objects created by the factory
Type: list of Material
Parameters: - base_frequency (float, optional) – The center frequency of the first octave band (default: 125 Hz)
- fs (float, optional) – The sampling frequency used (default: 16000 Hz)
- third_octave (bool, optional) – Use third octave bands if True (default: False)
-
analysis
(x, band=None)¶ Process a signal x through the filter bank
Parameters: x (ndarray (n_samples)) – The input signal Returns: The input signal filtered through all the bands Return type: ndarray (n_samples, n_bands)
-
get_bw
()¶ Returns the bandwidth of the bands
-
-
pyroomacoustics.acoustics.
bandpass_filterbank
(bands, fs=1.0, order=8, output='sos')¶ Create a bank of Butterworth bandpass filters
Parameters: - bands (array_like, shape == (n, 2)) – The list of bands
[[flo1, fup1], [flo2, fup2], ...]
- fs (float, optional) – Sampling frequency (default 1.)
- order (int, optional) – The order of the IIR filters (default: 8)
- output ({'ba', 'zpk', 'sos'}) – Type of output: numerator/denominator (‘ba’), pole-zero (‘zpk’), or second-order sections (‘sos’). Default is ‘sos’.
Returns: - b, a (ndarray, ndarray) – Numerator (b) and denominator (a) polynomials of the IIR filter. Only returned if output=’ba’.
- z, p, k (ndarray, ndarray, float) – Zeros, poles, and system gain of the IIR filter transfer function. Only returned if output=’zpk’.
- sos (ndarray) – Second-order sections representation of the IIR filter. Only returned if output==’sos’.
-
pyroomacoustics.acoustics.
bands_hz2s
(bands_hz, Fs, N, transform='dft')¶ Converts bands given in Hertz to samples with respect to a given sampling frequency Fs and a transform size N. An optional transform type is used to handle the DCT case.
-
pyroomacoustics.acoustics.
binning
(S, bands)¶ This function computes the sum of all columns of S in the subbands enumerated in bands
-
pyroomacoustics.acoustics.
critical_bands
()¶ Compute the Critical bands as defined in the book: Psychoacoustics by Zwicker and Fastl. Table 6.1 p. 159
-
pyroomacoustics.acoustics.
inverse_sabine
(rt60, room_dim, c=None)¶ Given the desired reverberation time (RT60, i.e. the time for the energy to drop by 60 dB), the dimensions of a rectangular room (shoebox), and the speed of sound, computes the energy absorption coefficient and maximum image source order needed. If c is not provided, the package wide default speed of sound (in constants) is used.
Parameters: - rt60 (float) – desired RT60 (time it takes to go from full amplitude to 60 dB decay) in seconds
- room_dim (list of floats) – list of length 2 or 3 of the room side lengths
- c (float) – speed of sound
Returns: - absorption (float) – the energy absorption coefficient to be passed to room constructor
- max_order (int) – the maximum image source order necessary to achieve the desired RT60
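A minimal sketch of the inverse Sabine computation for a 3D shoebox, assuming the Sabine formula rt60 = 24 ln(10) V / (c S a) and solving for the absorption coefficient a. This is illustrative, not the library's exact code (which also derives the maximum image source order):

```python
import numpy as np

def inverse_sabine_sketch(rt60, room_dim, c=343.0):
    # Shoebox geometry: volume and total wall surface
    L = np.asarray(room_dim, dtype=float)
    V = np.prod(L)
    S = 2.0 * (L[0] * L[1] + L[1] * L[2] + L[0] * L[2])
    # Sabine: rt60 = 24 ln(10) V / (c S a), solved for a
    return 24.0 * np.log(10.0) * V / (c * S * rt60)

# Absorption needed for RT60 = 0.5 s in a 6 x 4 x 3 m room
a = inverse_sabine_sketch(0.5, [6.0, 4.0, 3.0])
```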
-
pyroomacoustics.acoustics.
invmelscale
(b)¶ Converts from melscale to frequency in Hertz according to Huang-Acero-Hon (6.143)
-
pyroomacoustics.acoustics.
melfilterbank
(M, N, fs=1, fl=0.0, fh=0.5)¶ Returns a filter bank of triangular filters spaced according to mel scale
We follow Huang-Acero-Hon 6.5.2
Parameters: - M ((int)) – The number of filters in the bank
- N ((int)) – The length of the DFT
- fs ((float) optional) – The sampling frequency (default 1)
- fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
- fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns: Return type: An M times int(N/2)+1 ndarray that contains one filter per row
-
pyroomacoustics.acoustics.
melscale
(f)¶ Converts f (in Hertz) to the melscale defined according to Huang-Acero-Hon (2.6)
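Both conversions follow closed-form expressions. Below is a sketch using the common 2595 log10(1 + f/700) form of the mel scale; the library may use an equivalent natural-log variant, so treat this as illustrative:

```python
import numpy as np

def melscale(f):
    # mel = 2595 log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def invmelscale(b):
    # Exact inverse of the mapping above
    return 700.0 * (10.0 ** (b / 2595.0) - 1.0)

# 1000 Hz maps close to 1000 mel, and the mapping round-trips
m = melscale(1000.0)
f = invmelscale(melscale(440.0))
```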
-
pyroomacoustics.acoustics.
mfcc
(x, L=128, hop=64, M=14, fs=8000, fl=0.0, fh=0.5)¶ Computes the Mel-Frequency Cepstrum Coefficients (MFCC) according to the description by Huang-Acero-Hon 6.5.2 (2001). The MFCC are features mimicking human perception, usually used for some learning task.
This function will first split the signal into frames, overlapping or not, and then compute the MFCC for each frame.
Parameters: - x ((nd-array)) – Input signal
- L ((int)) – Frame size (default 128)
- hop ((int)) – Number of samples to skip between two frames (default 64)
- M ((int)) – Number of mel-frequency filters (default 14)
- fs ((int)) – Sampling frequency (default 8000)
- fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
- fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
Returns: Return type: The MFCC of the input signal
-
pyroomacoustics.acoustics.
octave_bands
(fc=1000, third=False, start=0.0, n=8)¶ Create a bank of octave bands
Parameters: - fc (float, optional) – The center frequency
- third (bool, optional) – Use third octave bands (default False)
- start (float, optional) – Starting frequency for octave bands in Hz (default 0.)
- n (int, optional) – Number of frequency bands (default 8)
-
pyroomacoustics.acoustics.
rt60_eyring
(S, V, a, m, c)¶ This is the Eyring formula for estimation of the reverberation time.
Parameters: - S – the total surface of the room walls in m^2
- V – the volume of the room in m^3
- a (float) – the equivalent absorption coefficient
sum(a_w * S_w) / S
wherea_w
andS_w
are the absorption and surface of wallw
, respectively. - m (float) – attenuation constant of air
- c (float) – speed of sound in m/s
Returns: The estimated reverberation time (RT60)
Return type: float
-
pyroomacoustics.acoustics.
rt60_sabine
(S, V, a, m, c)¶ This is the Sabine formula for estimation of the reverberation time.
Parameters: - S – the total surface of the room walls in m^2
- V – the volume of the room in m^3
- a (float) – the equivalent absorption coefficient
sum(a_w * S_w) / S
wherea_w
andS_w
are the absorption and surface of wallw
, respectively. - m (float) – attenuation constant of air
- c (float) – speed of sound in m/s
Returns: The estimated reverberation time (RT60)
Return type: float
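The two estimators differ only in how wall absorption enters the denominator. A sketch of both formulas (illustrative, not the library's exact code; the air absorption term vanishes when m = 0):

```python
import numpy as np

def rt60_sabine_sketch(S, V, a, m=0.0, c=343.0):
    # Sabine: total absorption is S*a plus the air absorption term 4mV
    return 24.0 * np.log(10.0) / c * V / (S * a + 4.0 * m * V)

def rt60_eyring_sketch(S, V, a, m=0.0, c=343.0):
    # Eyring replaces S*a by -S*ln(1 - a), more accurate in absorptive rooms
    return 24.0 * np.log(10.0) / c * V / (-S * np.log(1.0 - a) + 4.0 * m * V)

S, V, a = 108.0, 72.0, 0.3
t_sabine = rt60_sabine_sketch(S, V, a)
t_eyring = rt60_eyring_sketch(S, V, a)
```

Since -ln(1 - a) > a for 0 < a < 1, Eyring always predicts a shorter reverberation time than Sabine for the same coefficients.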
pyroomacoustics.beamforming module¶
-
class
pyroomacoustics.beamforming.
Beamformer
(R, fs, N=1024, Lg=None, hop=None, zpf=0, zpb=0)¶ Bases:
pyroomacoustics.beamforming.MicrophoneArray
At some point, in some nice way, the design methods should also go here. Probably with generic arguments.
Parameters: - R (numpy.ndarray) – Mics positions
- fs (int) – Sampling frequency
- N (int, optional) – Length of FFT, i.e. number of FD beamforming weights, equally spaced. Defaults to 1024.
- Lg (int, optional) – Length of time-domain filters. Defaults to N.
- hop (int, optional) – Hop length for frequency domain processing. Defaults to N/2.
- zpf (int, optional) – Front zero padding length for frequency domain processing. Default is 0.
- zpb (int, optional) – Zero padding length for frequency domain processing. Default is 0.
-
far_field_weights
(phi)¶ This method computes the weights for a far field source at infinity
phi: direction of the beam
-
filters_from_weights
(non_causal=0.0)¶ Compute time-domain filters from frequency domain weights.
Parameters: non_causal (float, optional) – ratio of filter coefficients used for non-causal part
-
plot
(sum_ir=False, FD=True)¶
-
plot_beam_response
()¶
-
plot_response_from_point
(x, legend=None)¶
-
process
(FD=False)¶
-
rake_delay_and_sum_weights
(source, interferer=None, R_n=None, attn=True, ff=False)¶
-
rake_distortionless_filters
(source, interferer, R_n, delay=0.03, epsilon=0.005)¶ Compute time-domain filters of a beamformer minimizing noise and interference while forcing a distortionless response towards the source.
-
rake_max_sinr_filters
(source, interferer, R_n, epsilon=0.005, delay=0.0)¶ Compute the time-domain filters of SINR maximizing beamformer.
-
rake_max_sinr_weights
(source, interferer=None, R_n=None, rcond=0.0, ff=False, attn=True)¶ This method computes a beamformer focusing on a number of specific sources and ignoring a number of interferers.
- INPUTS
- source : source locations
- interferer : interferer locations
-
rake_max_udr_filters
(source, interferer=None, R_n=None, delay=0.03, epsilon=0.005)¶ Compute directly the time-domain filters maximizing the Useful-to-Detrimental Ratio (UDR).
This beamformer is not practical. It maximizes the UDR ratio in the time domain directly without imposing flat response towards the source of interest. This results in severe distortion of the desired signal.
Parameters: - source (pyroomacoustics.SoundSource) – the desired source
- interferer (pyroomacoustics.SoundSource, optional) – the interfering source
- R_n (ndarray, optional) – the noise covariance matrix, it should be (M * Lg)x(M * Lg) where M is the number of sensors and Lg the filter length
- delay (float, optional) – the signal delay introduced by the beamformer (default 0.03 s)
- epsilon (float) –
-
rake_max_udr_weights
(source, interferer=None, R_n=None, ff=False, attn=True)¶
-
rake_mvdr_filters
(source, interferer, R_n, delay=0.03, epsilon=0.005)¶ Compute the time-domain filters of the minimum variance distortionless response beamformer.
-
rake_one_forcing_filters
(sources, interferers, R_n, epsilon=0.005)¶ Compute the time-domain filters of a beamformer with unit response towards multiple sources.
-
rake_one_forcing_weights
(source, interferer=None, R_n=None, ff=False, attn=True)¶
-
rake_perceptual_filters
(source, interferer=None, R_n=None, delay=0.03, d_relax=0.035, epsilon=0.005)¶ Compute directly the time-domain filters for a perceptually motivated beamformer. The beamformer minimizes noise and interference, but relaxes the response of the filter within the 30 ms following the delay.
-
response
(phi_list, frequency)¶
-
response_from_point
(x, frequency)¶
-
snr
(source, interferer, f, R_n=None, dB=False)¶
-
steering_vector_2D
(frequency, phi, dist, attn=False)¶
-
steering_vector_2D_from_point
(frequency, source, attn=True, ff=False)¶ Creates a steering vector for a particular frequency and source
Parameters: - frequency –
- source – location in cartesian coordinates
- attn – include attenuation factor if True
- ff – uses far-field distance if true
Returns: A 2x1 ndarray containing the steering vector.
-
udr
(source, interferer, f, R_n=None, dB=False)¶
-
weights_from_filters
()¶
-
pyroomacoustics.beamforming.
H
(A, **kwargs)¶ Returns the conjugate (Hermitian) transpose of a matrix.
-
class
pyroomacoustics.beamforming.
MicrophoneArray
(R, fs)¶ Bases:
object
Microphone array class.
-
M
¶
-
append
(locs)¶ Add some microphones to the array
Parameters: locs (numpy.ndarray (2 or 3, n_mics)) – Adds n_mics microphones to the array. The coordinates are passed as a numpy.ndarray with each column containing the coordinates of a microphone.
-
record
(signals, fs)¶ This simulates the recording of the signals by the microphones. In particular, if the microphones and the room simulation do not use the same sampling frequency, down/up-sampling is done here.
Parameters: - signals – An ndarray with as many lines as there are microphones.
- fs – the sampling frequency of the signals.
-
to_wav
(filename, mono=False, norm=False, bitdepth=np.float)¶ Save all the signals to wav files.
Parameters: - filename (str) – the name of the file
- mono (bool, optional) – if true, records only the center channel floor(M / 2) (default False)
- norm (bool, optional) – if true, normalize the signal to fit in the dynamic range (default False)
- bitdepth (int, optional) – the format of output samples [np.int8/16/32/64 or np.float (default)]
-
-
pyroomacoustics.beamforming.
circular_2D_array
(center, M, phi0, radius)¶ Creates an array of uniformly spaced circular points in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- phi0 (float) – The counterclockwise rotation of the first element in the array (from the x-axis)
- radius (float) – The radius of the array
Returns: The array of points
Return type: ndarray (2, M)
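The construction is straightforward: M angles uniformly spread over the circle, offset by phi0. An illustrative numpy sketch (not the library's exact code):

```python
import numpy as np

def circular_2D_array_sketch(center, M, phi0, radius):
    # M angles uniformly spaced on the circle, starting at phi0
    phi = phi0 + 2.0 * np.pi * np.arange(M) / M
    offsets = radius * np.vstack([np.cos(phi), np.sin(phi)])
    return np.asarray(center, dtype=float)[:, None] + offsets

# An 8-microphone circular array of radius 10 cm centered at (1, 2)
R = circular_2D_array_sketch([1.0, 2.0], 8, 0.0, 0.1)
```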
-
pyroomacoustics.beamforming.
distance
(x, y)¶ Computes the distance matrix E.
E[i,j] = sqrt(sum((x[:,i]-y[:,j])**2)). x and y are D x N ndarrays containing N D-dimensional column vectors.
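A vectorized sketch of this distance computation using broadcasting (illustrative, not the library's exact code):

```python
import numpy as np

def distance_matrix(x, y):
    # x: (D, N), y: (D, M) -> E: (N, M), E[i, j] = ||x[:, i] - y[:, j]||
    diff = x[:, :, None] - y[:, None, :]  # (D, N, M) via broadcasting
    return np.sqrt(np.sum(diff**2, axis=0))

x = np.array([[0.0, 1.0], [0.0, 0.0]])  # columns: (0, 0) and (1, 0)
y = np.array([[0.0], [1.0]])            # column: (0, 1)
E = distance_matrix(x, y)
```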
-
pyroomacoustics.beamforming.
fir_approximation_ls
(weights, T, n1, n2)¶
-
pyroomacoustics.beamforming.
linear_2D_array
(center, M, phi, d)¶ Creates an array of uniformly spaced linear points in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- phi (float) – The counterclockwise rotation of the array (from the x-axis)
- d (float) – The distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
mdot
(*args)¶ Left-to-right associative matrix multiplication of multiple 2D ndarrays.
-
pyroomacoustics.beamforming.
poisson_2D_array
(center, M, d)¶ Create array of 2D positions drawn from Poisson process.
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points
- d (float) – The average distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
spiral_2D_array
(center, M, radius=1.0, divi=3, angle=None)¶ Generate an array of points placed on a spiral
Parameters: - center (array_like) – location of the center of the array
- M (int) – number of microphones
- radius (float) – microphones are contained within a circle of this radius (default 1)
- divi (int) – number of rotations of the spiral (default 3)
- angle (float) – the angle offset of the spiral (default random)
Returns: The array of points
Return type: ndarray (2, M)
-
pyroomacoustics.beamforming.
square_2D_array
(center, M, N, phi, d)¶ Creates an array of uniformly spaced grid points in 2D
Parameters: - center (array_like) – The center of the array
- M (int) – The number of points in the first dimension
- N (int) – The number of points in the second dimension
- phi (float) – The counterclockwise rotation of the array (from the x-axis)
- d (float) – The distance between neighboring points
Returns: The array of points
Return type: ndarray (2, M * N)
-
pyroomacoustics.beamforming.
sumcols
(A)¶ Sums the columns of a matrix (np.array).
The output is a 2D np.array of dimensions M x 1.
-
pyroomacoustics.beamforming.
unit_vec2D
(phi)¶
pyroomacoustics.build_rir module¶
pyroomacoustics.geometry module¶
pyroomacoustics.metrics module¶
-
pyroomacoustics.metrics.
itakura_saito
(x1, x2, sigma2_n, stft_L=128, stft_hop=128)¶
-
pyroomacoustics.metrics.
median
(x, alpha=None, axis=-1, keepdims=False)¶ Computes the median of the data, and optionally a confidence interval for it.
Parameters: - x (array_like) – the data array
- alpha (float, optional) – the confidence level of the interval, confidence intervals are only computed when this argument is provided
- axis (int, optional) – the axis of the data on which to operate, by default the last axis
Returns: This function returns
(m, [le, ue])
and the confidence interval is[m-le, m+ue]
.Return type: tuple
(float, [float, float])
-
pyroomacoustics.metrics.
mse
(x1, x2)¶ A short hand to compute the mean-squared error of two signals.
\[MSE = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - y_i)^2\]Parameters: - x1 – (ndarray)
- x2 – (ndarray)
Returns: (float) The mean of the squared differences of x1 and x2.
-
pyroomacoustics.metrics.
pesq
(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')¶ Computes the perceptual evaluation of speech quality (PESQ) metric of degraded files with respect to a reference file. Uses the utility obtained from ITU P.862 http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
Parameters: - ref_file – The filename of the reference file.
- deg_files – A list of degraded sound files names.
- Fs – Sample rate of the sound files [8 kHz or 16 kHz, default 8 kHz].
- swap – Swap byte orders [default: False].
- wb – Use wideband algorithm [default: False].
- bin – Location of pesq executable [default: ./bin/pesq].
Returns: (ndarray size 2xN) ndarray containing Raw MOS and MOS LQO in rows 0 and 1, respectively, and has one column per degraded file name in deg_files.
-
pyroomacoustics.metrics.
snr
(ref, deg)¶
pyroomacoustics.multirate module¶
-
pyroomacoustics.multirate.
frac_delay
(delta, N, w_max=0.9, C=4)¶ Compute optimal fractional delay filter according to
Design of Fractional Delay Filters Using Convex Optimization William Putnam and Julius Smith
Parameters: - delta – delay of filter in (fractional) samples
- N – number of taps
- w_max – Bandwidth of the filter (in fraction of pi) (default 0.9)
- C – sets the number of constraints to C*N (default 4)
-
pyroomacoustics.multirate.
low_pass
(numtaps, B, epsilon=0.1)¶
-
pyroomacoustics.multirate.
resample
(x, p, q)¶
pyroomacoustics.parameters module¶
- This file defines the main physical constants of the system:
- Speed of sound
- Absorption of materials
- Scattering coefficients
- Air absorption
-
class
pyroomacoustics.parameters.
Constants
¶ Bases:
object
A class to provide easy access package wide to user settable constants.
-
get
(name)¶
-
set
(name, val)¶
-
-
class
pyroomacoustics.parameters.
Material
(energy_absorption, scattering=None)¶ Bases:
object
A class that describes the energy absorption and scattering properties of walls.
-
energy_absorption
¶ A dictionary containing keys
description
,coeffs
, andcenter_freqs
.Type: dict
-
scattering
¶ A dictionary containing keys
description
,coeffs
, andcenter_freqs
.Type: dict
Parameters: - energy_absorption (float, str, or dict) –
- float: The material created will be equally absorbing at all frequencies (i.e. flat).
- str: The absorption values will be obtained from the database.
- dict: A dictionary containing keys description, coeffs, and center_freqs.
- scattering (float, str, or dict) –
- float: The material created will be equally scattering at all frequencies (i.e. flat).
- str: The scattering values will be obtained from the database.
- dict: A dictionary containing keys description, coeffs, and center_freqs.
-
absorption_coeffs
¶ shorthand to the energy absorption coefficients
-
classmethod
all_flat
(materials)¶ Checks if all materials in a list are frequency flat
Parameters: materials (list or dict of Material objects) – The list of materials to check Returns: Return type: True
if all materials have a single parameter, elseFalse
-
is_freq_flat
()¶ Returns
True
if the material has flat characteristics over frequency,False
otherwise.
-
resample
(octave_bands)¶ resample at given octave bands
-
scattering_coeffs
¶ shorthand to the scattering coefficients
-
-
class
pyroomacoustics.parameters.
Physics
(temperature=None, humidity=None)¶ Bases:
object
A Physics object allows one to compute the physical properties of the room depending on temperature and humidity.
Parameters: - temperature (float, optional) – The room temperature
- humidity (float in range (0, 100), optional) – The room relative humidity in %. Default is 0.
-
classmethod
from_speed
(c)¶ Choose a temperature and humidity matching a desired speed of sound
-
get_air_absorption
()¶ Returns: (air_absorption, center_freqs) where air_absorption is a list of absorption coefficients corresponding to the center frequencies in center_freqs
-
get_sound_speed
()¶ Returns: Return type: the speed of sound
-
pyroomacoustics.parameters.
calculate_speed_of_sound
(t, h, p)¶ Compute the speed of sound as a function of temperature, humidity and pressure
Parameters: - t (float) – temperature [Celsius]
- h (float) – relative humidity [%]
- p (float) – atmospheric pressure [kpa]
Returns: Return type: Speed of sound in [m/s]
-
pyroomacoustics.parameters.
make_materials
(*args, **kwargs)¶ Helper method to conveniently create multiple materials.
Each positional and keyword argument should be a valid input for the Material class. Then, for each of the arguments, a Material will be created by calling the constructor.
If at least one positional argument is provided, a list of Material objects constructed using the provided positional arguments is returned.
If at least one keyword argument is provided, a dict with keys corresponding to the keywords and containing Material objects constructed with the keyword values is returned.
If only positional arguments are provided, only the list is returned. If only keyword arguments are provided, only the dict is returned. If both are provided, both are returned. If no argument is provided, an empty list is returned.
pyroomacoustics.recognition module¶
-
class
pyroomacoustics.recognition.
CircularGaussianEmission
(nstates, odim=1, examples=None)¶ Bases:
object
-
get_pdfs
()¶ Return the pdf of all the emission probabilities
-
prob_x_given_state
(examples)¶ Recompute the probability of the observation given the state of the latent variables
-
update_parameters
(examples, gamma)¶
-
-
class
pyroomacoustics.recognition.
GaussianEmission
(nstates, odim=1, examples=None)¶ Bases:
object
-
get_pdfs
()¶ Return the pdf of all the emission probabilities
-
prob_x_given_state
(examples)¶ Recompute the probability of the observation given the state of the latent variables
-
update_parameters
(examples, gamma)¶
-
-
class
pyroomacoustics.recognition.
HMM
(nstates, emission, model='full', leftright_jump_max=3)¶ Bases:
object
Hidden Markov Model with Gaussian emissions
-
K
¶ Number of states in the model
Type: int
-
O
¶ Number of dimensions of the Gaussian emission distribution
Type: int
-
A
¶ KxK transition matrix of the Markov chain
Type: ndarray
-
pi
¶ K dim vector of the initial probabilities of the Markov chain
Type: ndarray
-
emission
¶ An instance of emission_class
Type: (GaussianEmission or CircularGaussianEmission)
-
model
¶ The model used for the chain, can be ‘full’ or ‘left-right’
Type: string, optional
-
leftright_jum_max
¶ The number of non-zero upper diagonals in a ‘left-right’ model
Type: int, optional
-
backward
(X, p_x_given_z, c)¶ The backward recursion for HMM as described in Bishop Ch. 13
-
fit
(examples, tol=0.1, max_iter=10, verbose=False)¶ Training of the HMM using the EM algorithm
Parameters: - examples ((list)) – A list of examples used to train the model. Each example is an array of feature vectors, each row is a feature vector, the sequence runs on axis 0
- tol ((float)) – The training stops when the progress between two steps is less than this number (default 0.1)
- max_iter ((int)) – Alternatively the algorithm stops when a maximum number of iterations is reached (default 10)
- verbose (bool, optional) – When True, prints extra information about convergence
-
forward
(X, p_x_given_z)¶ The forward recursion for HMM as described in Bishop Ch. 13
-
generate
(N)¶ Generate a random sample of length N using the model
-
loglikelihood
(X)¶ Compute the log-likelihood of a sample vector using the sum-product algorithm
-
update_parameters
(examples, gamma, xhi)¶ Update the parameters of the Markov Chain
-
viterbi
()¶
-
pyroomacoustics.soundsource module¶
-
class
pyroomacoustics.soundsource.
SoundSource
(position, images=None, damping=None, generators=None, walls=None, orders=None, signal=None, delay=0)¶ Bases:
object
A class to represent sound sources.
This object represents a sound source in a room by a list containing the original source position as well as all the image sources, up to some maximum order.
It also keeps track of the sequence of generated images and the index of the walls (in the original room) that generated the reflection.
-
add_signal
(signal)¶ Sets
SoundSource.signal
attribute
-
distance
(ref_point)¶
-
get_damping
(max_order=None)¶
-
get_images
(max_order=None, max_distance=None, n_nearest=None, ref_point=None)¶ Kept for compatibility. Now replaced by the bracket operator and the set_ordering function.
-
get_rir
(mic, visibility, Fs, t0=0.0, t_max=None)¶ Compute the room impulse response between the source and the microphone whose position is given as an argument.
-
set_ordering
(ordering, ref_point=None)¶ Set the order in which we retrieve image sources. Can be: ‘nearest’, ‘strongest’, ‘order’. Optional argument: ref_point
-
wall_sequence
(i)¶ Print the wall sequence for the image source indexed by i
-
-
pyroomacoustics.soundsource.
build_rir_matrix
(mics, sources, Lg, Fs, epsilon=0.005, unit_damping=False)¶ A function to build the channel matrix for many sources and microphones
Parameters: - mics (ndarray) – a dim-by-M ndarray where each column is the position of a microphone
- sources (list of pyroomacoustics.SoundSource) – list of sound sources for which we want to build the matrix
- Lg (int) – the length of the beamforming filters
- Fs (int) – the sampling frequency
- epsilon (float, optional) – minimum decay of the sinc before truncation. Defaults to epsilon=5e-3
- unit_damping (bool, optional) – determines if the wall damping parameters are used or not. Default to false.
Returns: The RIR matrix H = [[H_11, H_12, ...], ...] where H_ij is the channel matrix between microphone i and source j. H has shape (M*Lg) x ((Lg+Lh-1)*S), where Lh is the channel length (determined by epsilon) and M, S are the numbers of microphones and sources, respectively.
pyroomacoustics.stft module¶
Class for real-time STFT analysis and processing.
pyroomacoustics.sync module¶
-
pyroomacoustics.sync.
correlate
(x1, x2, interp=1, phat=False)¶ Compute the cross-correlation between x1 and x2
Parameters: - x1,x2 (array_like) – The data arrays
- interp (int, optional) – The interpolation factor for the output array, default 1.
- phat (bool, optional) – Apply the PHAT weighting (default False)
Returns: Return type: The cross-correlation between the two arrays
-
pyroomacoustics.sync.
delay_estimation
(x1, x2, L)¶ Estimate the delay between x1 and x2. L is the block length used for phat
-
pyroomacoustics.sync.
tdoa
(signal, reference, interp=1, phat=False, fs=1, t_max=None)¶ Estimates the shift of array signal with respect to reference using generalized cross-correlation
Parameters: - signal (array_like) – The array whose tdoa is measured
- reference (array_like) – The reference array
- interp (int, optional) – The interpolation factor for the output array, default 1.
- phat (bool, optional) – Apply the PHAT weighting (default False)
- fs (int or float, optional) – The sampling frequency of the input arrays, default=1
Returns: Return type: The estimated delay between the two arrays
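The generalized cross-correlation with PHAT weighting that underlies this estimator can be sketched in a few lines of numpy. This is an illustrative reimplementation (the function name `gcc_phat` and the epsilon regularizer are choices made here, not the library's code): the cross-power spectrum is normalized to unit magnitude so that only phase, i.e. delay, information remains.

```python
import numpy as np

def gcc_phat(signal, reference, fs=1):
    """Minimal GCC-PHAT sketch: estimate the delay of `signal` w.r.t. `reference`."""
    n = len(signal) + len(reference) - 1   # length for linear (not circular) correlation
    S = np.fft.rfft(signal, n=n)
    R = np.fft.rfft(reference, n=n)
    G = S * np.conj(R)
    G /= np.abs(G) + 1e-12                 # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(G, n=n)
    shift = np.argmax(np.abs(cc))
    if shift > n // 2:                     # wrap indices past the middle to negative lags
        shift -= n
    return shift / fs

# A pulse delayed by 5 samples relative to the reference
ref = np.zeros(64); ref[10] = 1.0
sig = np.zeros(64); sig[15] = 1.0
print(gcc_phat(sig, ref))  # -> 5.0
```

Swapping the arguments estimates the opposite delay (-5.0), since the roles of signal and reference are exchanged.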
-
pyroomacoustics.sync.
time_align
(ref, deg, L=4096)¶ Return a copy of deg, time-aligned with and of the same length as ref. L is the block length used for correlations.
pyroomacoustics.utilities module¶
-
pyroomacoustics.utilities.
angle_from_points
(x1, x2)¶
-
pyroomacoustics.utilities.
autocorr
(x, p, biased=True, method='numpy')¶ Compute the autocorrelation for real signal x up to lag p.
Parameters: - x (numpy array) – Real signal in time domain.
- p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.
- biased (bool) – Whether to return biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.
- method (‘numpy’, ‘fft’, ‘time’, ‘pra’) – Method for computing the autocorrelation: in the frequency domain with ‘fft’ or ‘pra’ (np.fft.rfft is used, so only real signals are supported), in the time domain with ‘time’, or with numpy’s built-in function np.correlate (default ‘numpy’). For p < log2(len(x)), the time domain approach may be more efficient.
Returns: Autocorrelation for x up to lag p.
Return type: numpy array
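The biased estimate divides every lag by the full signal length, which is what makes it statistically stable at large lags. A minimal time-domain sketch of that computation (the helper name `autocorr_biased` is chosen here for illustration):

```python
import numpy as np

def autocorr_biased(x, p):
    """Biased autocorrelation of real x up to lag p (divide by len(x) at every lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])

x = np.array([1.0, 2.0, 3.0, 4.0])
print(autocorr_biased(x, 2))  # lags 0..2: 7.5, 5.0, 2.75
```

The result matches the non-negative lags of np.correlate(x, x, 'full') divided by len(x).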
-
pyroomacoustics.utilities.
clip
(signal, high, low)¶ Clip a signal from above at high and from below at low.
-
pyroomacoustics.utilities.
compare_plot
(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None)¶
-
pyroomacoustics.utilities.
convmtx
(x, n)¶ Create a convolution matrix H for the vector x, with n columns, such that for a vector v of length n, np.dot(H, v) is the same as np.convolve(x, v).
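A convolution matrix can be sketched by stacking shifted copies of x as columns; note that for the np.convolve equivalence to hold, H must have len(x)+n-1 rows. This is an illustrative construction, not the library's implementation:

```python
import numpy as np

def convmtx(x, n):
    """Convolution matrix H of shape (len(x)+n-1, n), so that H @ v == np.convolve(x, v)."""
    x = np.asarray(x, dtype=float)
    H = np.zeros((len(x) + n - 1, n))
    for j in range(n):
        H[j:j + len(x), j] = x   # column j is x shifted down by j samples
    return H

x = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0])
H = convmtx(x, 2)
print(H @ v, np.convolve(x, v))  # both: [ 4. 13. 22. 15.]
```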
-
pyroomacoustics.utilities.
create_noisy_signal
(signal_fp, snr, noise_fp=None, offset=None)¶ Create a noisy signal of a specified SNR.
Parameters: - signal_fp (string) – File path to clean input.
- snr (float) – SNR in dB.
- noise_fp (string, optional) – File path to noise. Default is to use randomly generated white noise.
- offset (float, optional) – Offset in seconds before starting the signal.
Returns: - numpy array – Noisy signal with specified SNR, between [-1,1] and zero mean.
- numpy array – Clean signal, untouched from WAV file.
- numpy array – Added noise such that specified SNR is met.
- int – Sampling rate in Hz.
-
pyroomacoustics.utilities.
dB
(signal, power=False)¶
-
pyroomacoustics.utilities.
fractional_delay
(t0)¶ Creates a fractional delay filter using a windowed sinc function. The length of the filter is fixed by the module wide constant frac_delay_length (default 81).
Parameters: t0 (float) – The delay in fractional samples, typically between -1 and 1. Returns: A fractional delay filter with the specified delay. Return type: numpy array
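A windowed-sinc fractional delay can be sketched as a sinc function shifted by t0 and tapered by a window. This is an illustrative version (the Hann taper and the default length of 81 taps are assumptions for this sketch; the library fixes the length via a module-wide constant):

```python
import numpy as np

def fractional_delay(t0, length=81):
    """Windowed-sinc fractional delay filter, centered on the middle tap."""
    center = (length - 1) // 2
    n = np.arange(length)
    return np.hanning(length) * np.sinc(n - center - t0)

h = fractional_delay(0.0)
print(np.argmax(h))  # -> 40: zero delay puts the peak on the center tap
```

For t0 = 0 the sinc samples to a unit impulse at the center tap; a fractional t0 spreads the energy over neighboring taps, implementing a sub-sample shift.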
-
pyroomacoustics.utilities.
fractional_delay_filter_bank
(delays)¶ Creates a fractional delay filter bank of windowed sinc filters
Parameters: delays (1d narray) – The delays corresponding to each filter in fractional samples Returns: An ndarray where the ith row contains the fractional delay filter corresponding to the ith delay. The number of columns of the matrix is proportional to the maximum delay. Return type: numpy array
-
pyroomacoustics.utilities.
goertzel
(x, k)¶ Goertzel algorithm to compute DFT coefficients
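The Goertzel algorithm evaluates a single DFT bin with a second-order real recursion, which is cheaper than a full FFT when only a few bins are needed. A minimal sketch of the recursion (illustrative, not the library's implementation):

```python
import numpy as np

def goertzel(x, k):
    """Goertzel recursion for the single DFT coefficient X[k] of the sequence x."""
    N = len(x)
    w = 2.0 * np.pi * k / N
    coeff = 2.0 * np.cos(w)
    s1 = s2 = 0.0
    for sample in x:
        # s[n] = x[n] + 2 cos(w) s[n-1] - s[n-2]
        s1, s2 = sample + coeff * s1 - s2, s1
    # X[k] = e^{jw} s[N-1] - s[N-2]
    return np.exp(1j * w) * s1 - s2

x = np.random.default_rng(0).standard_normal(16)
print(np.allclose(goertzel(x, 3), np.fft.fft(x)[3]))  # True
```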
-
pyroomacoustics.utilities.
highpass
(signal, Fs, fc=None, plot=False)¶ Filter out very low frequencies; the default cutoff is 50 Hz.
-
pyroomacoustics.utilities.
levinson
(r, b)¶ Solve a system of the form Rx=b, where R is a Hermitian Toeplitz matrix and b is any vector, using the generalized Levinson recursion as described in M.H. Hayes, Statistical Signal Processing and Modelling, p. 268.
Parameters: - r – First column of R, toeplitz hermitian matrix.
- b – The right-hand argument. If b is a matrix, the system is solved for every column vector in b.
Returns: The solution of the linear system Rx = b.
Return type: numpy array
-
pyroomacoustics.utilities.
low_pass_dirac
(t0, alpha, Fs, N)¶ Creates a vector containing a lowpass Dirac of duration T sampled at Fs with delay t0 and attenuation alpha.
If t0 and alpha are 2D column vectors of the same size, then the function returns a matrix with each row corresponding to a pair of t0/alpha values.
-
pyroomacoustics.utilities.
lpc
(x, p, biased=True)¶ Compute p LPC coefficients for a speech segment x.
Parameters: - x (numpy array) – Real signal in time domain.
- p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.
- biased (bool) – Whether to use biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.
Returns: p LPC coefficients.
Return type: numpy array
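LPC coefficients follow from the autocorrelation method: compute the biased autocorrelation up to lag p and solve the resulting Toeplitz normal equations. The sketch below uses a plain np.linalg.solve instead of the Levinson recursion for brevity, and assumes the convention where the coefficients predict x[n] from a positive-sign combination of past samples (conventions differ between implementations):

```python
import numpy as np

def lpc(x, p):
    """Order-p LPC via biased autocorrelation and a direct Toeplitz solve."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    # p-by-p autocorrelation (Toeplitz) matrix; solve R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])

# AR(1) test signal: x[n] = 0.9 x[n-1] + e[n]; order-1 LPC should recover ~0.9
rng = np.random.default_rng(0)
e = rng.standard_normal(20000)
x = np.zeros_like(e)
for i in range(1, len(e)):
    x[i] = 0.9 * x[i - 1] + e[i]
print(lpc(x, 1))  # ~ [0.9]
```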
-
pyroomacoustics.utilities.
normalize
(signal, bits=None)¶ Normalize the signal to lie in a given range. The default is to normalize the maximum amplitude to one. An optional argument allows normalizing the signal to the range of a signed integer representation with the given number of bits.
-
pyroomacoustics.utilities.
normalize_pwr
(sig1, sig2)¶ Normalize sig1 to have the same power as sig2.
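Power normalization amounts to scaling by the square root of the power ratio. A minimal sketch (illustrative reimplementation, using mean squared amplitude as the power measure):

```python
import numpy as np

def normalize_pwr(sig1, sig2):
    """Scale sig1 so its average power matches that of sig2."""
    p1 = np.mean(sig1 ** 2)
    p2 = np.mean(sig2 ** 2)
    return sig1 * np.sqrt(p2 / p1)

a = np.array([1.0, -1.0, 1.0, -1.0])   # power 1
b = np.array([2.0, 2.0, -2.0, -2.0])   # power 4
out = normalize_pwr(a, b)
print(np.mean(out ** 2))  # -> 4.0
```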
-
pyroomacoustics.utilities.
prony
(x, p, q)¶ Prony’s Method from Monson H. Hayes’ Statistical Signal Processing, p. 154
Parameters: - x – signal to model
- p – order of denominator
- q – order of numerator
Returns: - a – numerator coefficients
- b – denominator coefficients
- err – the squared error of approximation
-
pyroomacoustics.utilities.
real_spectrum
(signal, axis=-1, **kwargs)¶
-
pyroomacoustics.utilities.
rms
(data)¶ Compute root mean square of input.
Parameters: data (numpy array) – Real signal in time domain. Returns: Root mean square. Return type: float
-
pyroomacoustics.utilities.
shanks
(x, p, q)¶ Shanks’ method from Monson H. Hayes’ Statistical Signal Processing, p. 154
Parameters: - x – signal to model
- p – order of denominator
- q – order of numerator
Returns: - a – numerator coefficients
- b – denominator coefficients
- err – the squared error of approximation
-
pyroomacoustics.utilities.
spectrum
(signal, Fs, N)¶
-
pyroomacoustics.utilities.
time_dB
(signal, Fs, bits=16)¶ Compute the signed dB amplitude of the oscillating signal, normalized with respect to the number of bits used for the signal.
-
pyroomacoustics.utilities.
to_16b
(signal)¶ Converts a 32-bit float signal (-1 to 1) to a signed 16-bit representation. No clipping is performed; you are responsible for ensuring the signal is within the correct interval.
-
pyroomacoustics.utilities.
to_float32
(data)¶ Cast data (typically from WAV) to float32.
Parameters: data (numpy array) – Real signal in time domain, typically obtained from WAV file. Returns: data as float32. Return type: numpy array
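The float/int16 conversion pair is essentially a scale-and-cast; a sketch of the round trip (the scale factor 32768 is an assumption of this sketch, since implementations vary between 32767 and 32768):

```python
import numpy as np

def to_16b(signal):
    """Float in [-1, 1) to int16. No clipping: values at or beyond 1.0 would overflow."""
    return (signal * 32768).astype(np.int16)

def to_float32(data):
    """int16 samples (e.g. from a WAV file) back to float32 in [-1, 1)."""
    return data.astype(np.float32) / 32768

x = np.array([0.0, 0.5, -0.25])
y = to_float32(to_16b(x))
print(y)  # recovers [0.0, 0.5, -0.25] exactly
```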
pyroomacoustics.wall module¶
pyroomacoustics.windows module¶
Window Functions¶
This is a collection of many popular window functions used in signal processing.
A few options are provided to correctly construct the required window function.
The flag
keyword argument can take the following values.
asymmetric
- Many of the windows then sum to one when their left half is added to their right half, which is useful for overlapped transforms such as the STFT.
symmetric
- With this flag, the window is perfectly symmetric. This might be more suitable for analysis tasks.
mdct
- Available for only some of the windows. The window is modified to satisfy the perfect reconstruction condition of the MDCT transform.
Often, we would like to get the full window function, but on some occasions, it is useful
to get only the left (or right) part. This can be indicated via the keyword argument
length
that can take values full
(default), left
, or right
.
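The overlap-add property of the asymmetric flag can be checked directly: an asymmetric Hann window of even length N, shifted by N/2, sums to one with itself. A short numpy sketch using the Hann formula from this module:

```python
import numpy as np

# Asymmetric Hann window of even length N: w[n] = 0.5 * (1 - cos(2*pi*n/N))
N = 8
n = np.arange(N)
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))

# Left half plus right half equals one at every sample, which is the
# perfect overlap-add (COLA) condition used by the STFT at 50% overlap.
print(w[:N // 2] + w[N // 2:])  # -> [1. 1. 1. 1.]
```

With the symmetric flag the window endpoints are instead placed symmetrically, which breaks this exact sum but is preferable for pure analysis tasks.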
-
pyroomacoustics.windows.
bart
(N, flag='asymmetric', length='full')¶ The Bartlett window function
\[w[n] = 2 / (M-1) ((M-1)/2 - |n - (M-1)/2|) , n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
bart_hann
(N, flag='asymmetric', length='full')¶ The modified Bartlett–Hann window function
\[w[n] = 0.62 - 0.48|(n/M-0.5)| + 0.38 \cos(2\pi(n/M-0.5)), n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
blackman
(N, flag='asymmetric', length='full')¶ The Blackman window function
\[w[n] = 0.42 - 0.5\cos(2\pi n/(M-1)) + 0.08\cos(4\pi n/(M-1)), n = 0, \ldots, M-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
blackman_harris
(N, flag='asymmetric', length='full')¶ The Blackman–Harris window function
\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) - a_3 \cos(6\pi n/M), n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
bohman
(N, flag='asymmetric', length='full')¶ The Bohman window function
\[w[n] = (1-|x|) \cos(\pi |x|) + \frac{1}{\pi} \sin(\pi |x|), \quad -1\leq x\leq 1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
cosine
(N, flag='asymmetric', length='full')¶ The cosine window function
\[w[n] = \cos(\pi (n/M - 0.5))^2\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
flattop
(N, flag='asymmetric', length='full')¶ The flat top weighted window function
\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) - a_3 \cos(6\pi n/M) + a_4 \cos(8\pi n/M), n=0,\ldots,N-1\]where
\[a_0 = 0.21557895,\quad a_1 = 0.41663158,\quad a_2 = 0.277263158,\quad a_3 = 0.083578947,\quad a_4 = 0.006947368\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
gaussian
(N, std, flag='asymmetric', length='full')¶ The Gaussian window function
\[w[n] = e^{ -\frac{1}{2}\left(\frac{n}{\sigma}\right)^2 }\]Parameters: - N (int) – the window length
- std (float) – the standard deviation
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
hamming
(N, flag='asymmetric', length='full')¶ The Hamming window function
\[w[n] = 0.54 - 0.46 \cos(2 \pi n / M), n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
hann
(N, flag='asymmetric', length='full')¶ The Hann window function
\[w[n] = 0.5 (1 - \cos(2 \pi n / M)), n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
kaiser
(N, beta, flag='asymmetric', length='full')¶ The Kaiser window function
\[w[n] = I_0\left( \beta \sqrt{1-\frac{4n^2}{(M-1)^2}} \right)/I_0(\beta)\]with
\[\quad -\frac{M-1}{2} \leq n \leq \frac{M-1}{2},\]where \(I_0\) is the zeroth-order modified Bessel function of the first kind.
Parameters: - N (int) – the window length
- beta (float) – Shape parameter, determines trade-off between main-lobe width and side lobe level. As beta gets large, the window narrows.
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
-
pyroomacoustics.windows.
rect
(N)¶ The rectangular window
\[w[n] = 1, n=0,\ldots,N-1\]Parameters: N (int) – the window length
-
pyroomacoustics.windows.
triang
(N, flag='asymmetric', length='full')¶ The triangular window function
\[w[n] = 1 - | 2 n / M - 1 |, n=0,\ldots,N-1\]Parameters: - N (int) – the window length
- flag (string, optional) –
Possible values
- asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
- symmetric: the window is symmetric (\(M=N-1\))
- mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
- length (string, optional) –
Possible values
- full: the full length window is computed
- right: the right half of the window is computed
- left: the left half of the window is computed
Module contents¶
pyroomacoustics¶
- Provides
- Room impulse simulations via the image source model
- Simulation of sound propagation using STFT engine
- Reference implementations of popular algorithms for
- beamforming
- direction of arrival
- adaptive filtering
- source separation
- single channel denoising
- etc
How to use the documentation¶
Documentation is available in two forms: docstrings provided with the code, and a loose standing reference guide, available from the pyroomacoustics readthedocs page.
We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities. See below for further instructions.
The docstring examples assume that pyroomacoustics has been imported as pra:
>>> import pyroomacoustics as pra
Code snippets are indicated by three greater-than signs:
>>> x = 42
>>> x = x + 1
Use the built-in help
function to view a function’s docstring:
>>> help(pra.stft.STFT)
...
Available submodules¶
pyroomacoustics.acoustics
- Acoustics and psychoacoustics routines, mel scale, critical bands, etc.
pyroomacoustics.beamforming
- Microphone arrays and beamforming routines.
pyroomacoustics.geometry
- Core geometry routine for the image source model.
pyroomacoustics.metrics
- Performance metrics like mean-squared error, median, Itakura-Saito, etc.
pyroomacoustics.multirate
- Rate conversion routines.
pyroomacoustics.parameters
- Global parameters, i.e. for physics constants.
pyroomacoustics.recognition
- Hidden Markov Model and TIMIT database structure.
pyroomacoustics.room
- Abstraction of room and image source model.
pyroomacoustics.soundsource
- Abstraction for a sound source.
pyroomacoustics.stft
- Deprecated. Replaced by the methods in
pyroomacoustics.transform
pyroomacoustics.sync
- A few routines to help synchronize signals.
pyroomacoustics.utilities
- Miscellaneous helper routines (filtering, normalization, fractional delays, spectra, etc.).
pyroomacoustics.wall
- Abstraction for walls of a room.
pyroomacoustics.windows
- Tapering windows for spectral analysis.
Available subpackages¶
pyroomacoustics.adaptive
- Adaptive filter algorithms
pyroomacoustics.bss
- Blind source separation.
pyroomacoustics.datasets
- Wrappers around a few popular speech datasets
pyroomacoustics.denoise
- Single channel noise reduction methods
pyroomacoustics.doa
- Direction of arrival finding algorithms
pyroomacoustics.phase
- Phase-related processing
pyroomacoustics.transform
- Block frequency domain processing tools
Utilities¶
- __version__
- pyroomacoustics version string