
Summary¶
Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components:
Intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms;
Fast C++ implementation of the image source model and ray tracing for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers;
Reference implementations of popular algorithms for STFT, beamforming, direction finding, adaptive filtering, source separation, and single channel denoising.
Together, these components form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step. Please refer to this notebook for a demonstration of the different components of this package.
Room Acoustics Simulation¶
Consider the following scenario.
Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?
—Schnupp, Nelken, and King, Auditory Neuroscience, 2010
Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!
At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle
Convex and non-convex rooms
2D/3D rooms
The core image source model and ray tracing modules are written in C++ for better performance.
The philosophy of the package is to abstract all necessary elements of an experiment using an object-oriented programming approach. Each of these elements is represented using a class and an experiment can be designed by combining these elements just as one would do in a real experiment.
Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoebox-shaped room that contains only one source of sound. First, we create a room object, to which we add a microphone array object and a sound source object. Then, the room object has methods to compute the RIR between source and receiver. The beamformer object then extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.
The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the input of the microphones composing the beamformer, an STFT (short time Fourier transform) engine allows one to quickly process the signals through the beamformer and evaluate the output.
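As a rough illustration of the bulk STFT interface, here is a minimal sketch; the signal, frame size, hop, and window below are arbitrary choices for the example, not values prescribed by the package.
import numpy as np
import pyroomacoustics as pra
fs = 16000
x = np.random.randn(16384)  # a stand-in signal
L, hop = 512, 256  # frame size and hop
win = pra.hann(L)  # analysis window
# analysis returns frames along the first axis, frequency bins along the second
X = pra.transform.stft.analysis(x, L, hop, win=win)
# process X here (e.g., apply beamforming weights), then return to the time domain
x_rec = pra.transform.stft.synthesis(X, L, hop, win=win)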
Reference Implementations¶
In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for
Short time Fourier transform (block + online)
direction of arrival (DOA) finding
adaptive filtering (NLMS, RLS)
blind source separation (AuxIVA, Trinicon, ILRMA, SparseAuxIVA, FastMNMF, FastMNMF2)
single channel denoising (Spectral Subtraction, Subspace, Iterative Wiener)
We use an object-oriented approach to abstract the details of specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters. We have tried to pre-set values for the tuning parameters so that a run with the default values will in general produce reasonable results.
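For instance, here is a minimal sketch of using the adaptive filtering implementations for system identification; the filter length and step size mu are illustrative choices, not package defaults.
import numpy as np
import pyroomacoustics as pra
# an unknown FIR system to identify from input/output samples
w_true = np.random.randn(16)
x = np.random.randn(4000)  # input signal
d = np.convolve(x, w_true)[: len(x)]  # observed (desired) output
nlms = pra.adaptive.NLMS(length=16, mu=0.5)
for n in range(len(x)):
    nlms.update(x[n], d[n])  # feed one input/desired sample pair at a time
print(np.linalg.norm(nlms.w - w_true))  # small after convergence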
Datasets¶
In an effort to simplify the use of datasets, we provide a few wrappers that allow one to quickly load and sort through some popular speech corpora. At the moment, wrappers for the CMU ARCTIC and TIMIT corpora and Google’s Speech Commands dataset are available.
For more details, see the doc.
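As an illustrative sketch (this assumes an internet connection for the download, and the exact filtering keywords depend on the corpus metadata):
import pyroomacoustics as pra
# download (if necessary) and wrap the CMU ARCTIC corpus, keeping one speaker
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
# datasets behave like collections of samples carrying audio and metadata
sample = corpus[0]
print(sample.meta)  # metadata of the sentence (speaker, text, ...)
signal, fs = sample.data, sample.fs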
Quick Install¶
Install the package with pip:
pip install pyroomacoustics
A cookiecutter is available that generates a working simulation script for a few 2D/3D scenarios:
# if necessary install cookiecutter
pip install cookiecutter
# create the simulation script
cookiecutter gh:fakufaku/cookiecutter-pyroomacoustics-sim
# run the newly created script
python <chosen_script_name>.py
We have also provided a minimal Dockerfile example in order to install and run the package within a Docker container. Note that you should increase the memory available to your containers to around 4 GB; less may also be sufficient, but the extra memory is needed for building the C++ code extension. You can build the container with:
docker build -t pyroom_container .
And enter the container with:
docker run -it pyroom_container:latest /bin/bash
Dependencies¶
The minimal dependencies are:
numpy
scipy>=0.18.0
Cython
pybind11
where Cython is only needed to benefit from the compiled, accelerated simulator. The simulator itself has a pure Python counterpart, so this requirement can be skipped, but the pure Python version is much slower.
On top of that, some functionalities of the package depend on extra packages:
samplerate # for resampling signals
matplotlib # to create graphs and plots
sounddevice # to play sound samples
mir_eval # to evaluate performance of source separation in examples
The requirements.txt file lists all packages necessary to run all of the scripts in the examples folder.
This package is mainly developed under Python 3.6. The last supported version for Python 2.7 is 0.4.3.
Under Linux and Mac OS, the compiled accelerators require a valid compiler to be installed, typically GCC. When no compiler is present, the package will still install but default to the pure Python implementation, which is much slower. On Windows, we provide pre-compiled wheels for Python 3.5 and 3.6.
Example¶
Here is a quick example of how to create and visualize the response of a beamformer in a room.
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])
# Add a source somewhere in the room
room.add_source([2.5, 4.5])
# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 10 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.1)
room.add_microphone_array(pra.Beamformer(R, room.fs))
# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])
# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()
More examples¶
A couple of detailed demos with illustrations are available.
A comprehensive set of examples covering most of the functionalities
of the package can be found in the examples
folder of the GitHub
repository.
How to contribute¶
If you would like to contribute, please clone the repository and send a pull request.
For more details, see our CONTRIBUTING page.
Academic publications¶
This package was developed to support academic publications. The package contains implementations for DOA algorithms and acoustic beamformers introduced in the following papers.
H. Pan, R. Scheibler, I. Dokmanic, E. Bezzam and M. Vetterli. FRIDA: FRI-based DOA estimation for arbitrary array layout, ICASSP 2017, New Orleans, USA, 2017.
I. Dokmanić, R. Scheibler and M. Vetterli. Raking the Cocktail Party, in IEEE Journal of Selected Topics in Signal Processing, vol. 9, no. 5, pp. 825-836, 2015.
R. Scheibler, I. Dokmanić and M. Vetterli. Raking Echoes in the Time Domain, ICASSP 2015, Brisbane, Australia, 2015.
If you use this package in your own research, please cite our paper describing it.
R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python package for audio room simulations and array processing algorithms, Proc. IEEE ICASSP, Calgary, CA, 2018.
License¶
Copyright (c) 2014-2021 EPFL-LCAV
Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
Contributing¶
If you want to contribute to pyroomacoustics
and make it better,
your help is very welcome. Contributing is also a great way to learn
more about the package itself.
Ways to contribute¶
File bug reports
Improvements to the documentation are always more than welcome. Keeping a good clean documentation is a challenging task and any help is appreciated.
Feature requests
If you implemented an extra DOA/adaptive filter/beamforming algorithm: that’s awesome! We’d love to add it to the package.
Suggestion of improvements to the code base are also welcome.
Coding style¶
We try to stick to PEP8 as much as possible. Variables, functions, modules, and packages should be in lowercase with underscores, and class names in CamelCase.
We use Black to format the code. The format will be automatically checked when doing a pull request so it is recommended to regularly run Black on the code.
Documentation¶
Docstrings should follow the numpydoc style.
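For reference, a docstring in numpydoc style looks as follows (the function itself is a hypothetical example, not part of the package):
def delay_in_samples(distance, fs, c=343.0):
    """
    Convert a propagation distance to a delay in samples.

    Parameters
    ----------
    distance : float
        Propagation distance in meters.
    fs : int
        Sampling frequency in Hz.
    c : float, optional
        Speed of sound in m/s. Default is 343.

    Returns
    -------
    int
        The delay rounded to the nearest sample.
    """
    return int(round(distance / c * fs))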
We recommend the following steps for generating the documentation:
Create a separate environment, e.g. with Anaconda, as such:
conda create -n mkdocs37 python=3.7 sphinx numpydoc mock sphinx_rtd_theme sphinxcontrib-napoleon tabulate
Switch to the environment:
source activate mkdocs37
Navigate to the docs folder and run:
./make_apidoc.sh
The materials database page is generated with the script:
./make_materials_table.py
Build and view the documentation locally with:
make html
Open in your browser:
docs/_build/html/index.html
Develop Locally¶
It can be convenient to develop and run tests locally. In contrast to only using the package, you will then also need to compile the C++ extension. On Mac and Linux, GCC is required, while Visual C++ 14.0 is necessary for Windows.
Get the source code. Use a recursive clone so that Eigen (a sub-module of this repository) is also downloaded.
git clone --recursive git@github.com:LCAV/pyroomacoustics.git
Alternatively, you can clone without the --recursive flag and directly install the Eigen library. For macOS, you can find installation instructions here: https://stackoverflow.com/a/35658421. After installation you can create a symbolic link as such:
ln -s PATH_TO_EIGEN pyroomacoustics/libroom_src/ext/eigen/Eigen
Install a few pre-requisites
pip install numpy Cython pybind11
Compile locally
python setup.py build_ext --inplace
On recent Mac OS (Mojave), it is necessary in some cases to add a higher deployment target
MACOSX_DEPLOYMENT_TARGET=10.9 python setup.py build_ext --inplace
Update $PYTHONPATH so that Python knows where to find the local package:
# Linux/Mac
export PYTHONPATH=<path_to_pyroomacoustics>:$PYTHONPATH
For Windows, see this question on stackoverflow.
Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
Now fire up python or ipython and check that the package can be imported:
import pyroomacoustics as pra
Unit Tests¶
As much as possible, for every new function added to the code base, add a short test script in pyroomacoustics/tests. The names of the script and the functions running the test should be prefixed by test_. The tests are started by running nosetests at the root of the package.
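A minimal test script might look like this (the file name and test are a hypothetical example; get_volume is part of the documented Room API):
# pyroomacoustics/tests/test_shoebox_volume.py
import pyroomacoustics as pra

def test_shoebox_volume():
    room = pra.ShoeBox([4, 6, 3.5])
    assert abs(room.get_volume() - 4 * 6 * 3.5) < 1e-10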
How to make a clean pull request¶
Look for a project’s contribution instructions. If there are any, follow them.
Create a personal fork of the project on Github.
Clone the fork on your local machine. Your remote repo on Github is called origin.
Add the original repository as a remote called upstream.
If you created your fork a while ago, be sure to pull upstream changes into your local repository.
Create a new branch to work on! Branch from develop if it exists, else from master.
Implement/fix your feature, comment your code.
Follow the code style of the project, including indentation.
If the project has tests, run them!
Write or adapt tests as needed.
Add or change the documentation as needed.
Squash your commits into a single commit with git’s interactive rebase. Create a new branch if necessary.
Push your branch to your fork on Github, the remote origin.
From your fork, open a pull request in the correct branch. Target the project’s develop branch if there is one, else go for master!
If the maintainer requests further changes, just push them to your branch. The PR will be updated automatically.
Once the pull request is approved and merged, you can pull the changes from upstream to your local repo and delete your extra branch(es).
And last but not least: Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.
How to deploy a new version to pypi¶
git checkout pypi-release
git merge master
Change the version number in pyroomacoustics/version.py to the new version number vX.Y.Z
Edit CHANGELOG.rst as follows: add a new title X.Y.Z_ - YEAR-MONTH-DAY under Unreleased, add "Nothing yet" in the unreleased section, and edit appropriately the lists of links at the bottom of the file.
git commit
git tag vX.Y.Z
git push origin vX.Y.Z
git push
git checkout master
git merge pypi-release
git push origin master
Reference¶
This guide is based on the nice template by @MarcDiethelm available under MIT License.
Changelog¶
All notable changes to pyroomacoustics will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
Unreleased¶
Nothing yet.
0.7.3 - 2022-12-05¶
Bugfix¶
Fixes issue #293 due to the C++ method Room::next_wall_hit not handling 2D shoebox rooms, which caused a segfault
0.7.2 - 2022-11-15¶
Added¶
Appveyor builds for compiled wheels for win32/win64 x86
Bugfix¶
Fixes missing import statement in room.plot for 3D rooms (PR #286)
On win64, bss.fastmnmf would fail due to some singular matrix. 1) protect solve with try/except and switch to pseudo-inverse if necessary, 2) change eps 1e-7 -> 1e-6
0.7.1 - 2022-11-11¶
Bugfix¶
Fixed pypi upload for windows wheels
0.7.0 - 2022-11-10¶
Added¶
Added the AnechoicRoom class.
Added FastMNMF2 (Fast Multichannel Nonnegative Matrix Factorization 2) to the bss subpackage.
Randomized image source method for removing sweeping echoes in shoebox rooms.
Adds the cart2spher method in pyroomacoustics.doa.utils to convert from cartesian to spherical coordinates.
Example room_complex_wall_materials.py
CI for python 3.10
Changed¶
Cleans up the plot_rir function in Room so that the labels are neater. It also adds an extra option kind that can take values "ir", "tf", or "spec" to plot the impulse responses, transfer functions, or spectrograms of the RIR.
Refactored the implementation of FastMNMF.
Modified the documentation of __init__.py in the doa subpackage.
Removed the deprecated realtime sub-module.
Removed the deprecated functions pyroomacoustics.transform.analysis, pyroomacoustics.transform.synthesis, pyroomacoustics.transform.compute_synthesis_window. They are replaced by the equivalent functions in the pyroomacoustics.transform.stft sub-module.
The minimum required version of numpy was changed to 1.13.0 (use of np.linalg.multi_dot in the doa sub-package, see #271)
Bugfix¶
Fixed most warnings in the tests
Fixed bug in examples/adaptive_filter_stft_domain.py
0.6.0 - 2021-11-29¶
Added¶
New DOA method: MUSIC with pseudo-spectra normalization. Thanks @4bian! Normalizes MUSIC pseudo spectra before averaging across frequency axis.
Bugfix¶
Issue 235: fails when set_ray_tracing is called, but no mic_array is set
Issue 236: general ISM produces the wrong transmission coefficients
Removes an unnecessary warning for some rooms when ray tracing is not needed
Misc¶
Unify code format by using Black
Add code linting in continuous integration
Drop CI support for python 3.5
0.5.0 - 2021-09-06¶
Added¶
Adds tracking of reflection order with respect to x/y/z axis in the shoebox image source model engine. The orders are available in source.orders_xyz after running the image source model
Support for microphone and source directivities in the image source model. Source directivities are available only for shoebox rooms. Available directivities are frequency-independent (cardioid patterns), although the infrastructure is there for frequency-dependent directivities: frequency-dependent usage in Room.compute_rir and abstract Directivity class.
Example scripts and notebook for directivities.
Bugfix¶
Fix wrong bracketing for negative values in is_inside (ShoeBox)
0.4.3 - 2021-02-18¶
Added¶
Support for Python 3.8 and 3.9
Bugfix¶
Fixes typo in a docstring
Update docs to better reflect actual function parameters
Fixes the computation of the cost function of SRP-PHAT doa algorithm (bug reported in #PR197)
Changed¶
Improve the computation of the auxiliary variables in AuxIVA and ILRMA. Unnecessary division operations are reduced.
0.4.2 - 2020-09-24¶
Bugfix¶
Fixes the Dockerfile so that we don’t have to install the build dependencies manually
Change the eps for geometry computations from 1e-4 to 1e-5 in libroom
Added¶
A specialized is_inside routine for ShoeBox rooms
0.4.1 - 2020-07-02¶
Bugfix¶
Issue #162 (crash with max_order>31 on windows), seems fixed by the new C++ simulator
Test for issue #162 added
Fix Binder link
Adds the pyproject.toml file in MANIFEST.in so that it gets picked up for packaging
Added¶
Minimal Dockerfile example.
0.4.0 - 2020-06-03¶
Improved Simulator with Ray Tracing¶
Ray Tracing in the libroom module. The function compute_rir() of the Room object in python can now be executed using a pure ray tracing approach or a hybrid (ISM + RT) approach. For this reason, this function now has several additional default arguments to control ray tracing (number of rays, scattering coefficient, energy and time thresholds, microphone's radius).
Bandpass filterbank construction in pyroomacoustics.acoustics.bandpass_filterbank
Acoustic properties of different materials in pyroomacoustics.materials
Scattering from the walls is handled via the ray tracing method; scattering coefficients are provided in pyroomacoustics.materials.Material objects
Function inverse_sabine allows to compute the absorption and max_order to use with the image source model to achieve a given reverberation time
The method rt60_theory in pyroomacoustics.room.Room allows to compute the theoretical RT60 of the room according to the Eyring or Sabine formula
The method measure_rt60 in pyroomacoustics.room.Room allows to measure the RT60 of the simulated RIRs
Changes in the Room Class¶
Deep refactor of Room class. The constructor arguments have changed
No more sigma2_awgn; noise is now handled in the pyroomacoustics.Room.simulate method
The way absorption is handled has changed. The scalar variable absorption is deprecated in favor of pyroomacoustics.materials.Material
Complete refactor of libroom, the compiled extension module responsible for the room simulation, into C++. The bindings to python are now done using pybind11.
Removes the pure Python room simulator as it was really slow
pyroomacoustics.transform.analysis, pyroomacoustics.transform.synthesis, pyroomacoustics.transform.compute_synthesis_window have been deprecated in favor of pyroomacoustics.transform.stft.analysis, pyroomacoustics.transform.stft.synthesis, pyroomacoustics.transform.stft.compute_synthesis_window.
pyroomacoustics.Room has a new method add that can be used to add either a SoundSource or a MicrophoneArray object. Subsequent calls to the method will always add sources/microphones. There exist also methods add_source and add_microphone that can be used to add a source/microphone via coordinates. The method add_microphone_array can be used to add a MicrophoneArray object, or a 2D array containing the locations of several microphones in its columns. While the add_microphone_array method used to replace the existing array by the argument, the new behavior is to add to the microphones already present.
Bugfix¶
From Issue #150, increase max iterations to check if point is inside room
Issues #117 #163, adds project file pyproject.toml so that pip can know which dependencies are necessary for setup
Fixed some bugs in the documentation
Fixed normalization part in FastMNMF
Added¶
Added room_isinside_max_iter in parameters.py
Default set to 20 rather than 5 as it was in pyroomacoustics.room.Room.isinside
Added Binder link in the README for online interactive demo
Changed¶
Changed while loop to iterate up to room_isinside_max_iter in pyroomacoustics.room.Room.isinside
Changed initialization of FastMNMF to accelerate convergence
Fixed bug in doa/tops (float -> integer division)
Added vectorised functions in MUSIC
Use the vectorised functions in _process of MUSIC
0.3.1 - 2019-11-06¶
Bugfix¶
Fixed a non-unicode character in pyroomacoustics.experimental.rt60 breaking the tests
0.3.0 - 2019-11-06¶
Added¶
The routine pyroomacoustics.experimental.measure_rt60 to automatically measure the reverberation time of impulse responses. This is useful for measured and simulated responses.
Bugfix¶
Fixed docstring and an argument of pyroomacoustics.bss.ilrma
0.2.0 - 2019-09-04¶
Added¶
Added FastMNMF (Fast Multichannel Nonnegative Matrix Factorization) to the bss subpackage.
Griffin-Lim algorithm for phase reconstruction from STFT magnitude measurements.
Changed¶
Removed the superfluous warnings in pyroomacoustics.transform.stft.
Add option in pyroomacoustics.room.Room.plot_rir to set the pair of channels to plot; useful when there are too many impulse responses.
Add some window functions in windows.py and rearranged them in alphabetical order
Fixed various warnings in tests.
Faster implementation of AuxIVA that also includes OverIVA (more mics than sources). It also comes with a slightly changed API, Laplace and time-varying Gauss statistical models, and two possible initialization schemes.
Faster implementation of ILRMA.
SparseAuxIVA has a slightly changed API; f_contrast has been replaced by the model keyword argument.
Bugfix¶
Set rcond=None in all calls to numpy.linalg.lstsq to remove a FutureWarning
Add a lower bound to activations in pyroomacoustics.bss.auxiva to avoid underflow and divide by zero.
Fixed a memory leak in the C engine for polyhedral rooms (issue #116).
Fixed problem caused by dependency of setup.py on Cython (Issue #117)
0.1.23 - 2019-04-17¶
Bugfix¶
Expose the mu parameter for adaptive.subband_lms.SubbandLMS.
Add SSL context to download_uncompress and unit test; error for Python 2.7.
0.1.22 - 2019-04-11¶
Added¶
Added “precision” parameter to “stft” class to choose between ‘single’ (float32/complex64) or ‘double’ (float64/complex128) for processing precision.
Unified unit test file for frequency-domain source separation methods.
New algorithm for blind source separation (BSS): Sparse Independent Vector Analysis (SparseAuxIVA).
Changed¶
Few README improvements
Bugfix¶
Remove np.squeeze in STFT as it caused errors when an axis that shouldn’t be squeezed was equal to 1.
Beamformer.process was using an old (non-existent) STFT function. Changed to using the one-shot function from the transform module.
Fixed a bug in utilities.fractional_delay_filter_bank that would cause the function to crash on some inputs (issue #87).
0.1.21 - 2018-12-20¶
Added¶
Adds several options to pyroomacoustics.room.Room.simulate to finely control the SNR of the microphone signals and also return the microphone signals with individual sources, prior to mix (useful for BSS evaluation)
Add subspace denoising approach in pyroomacoustics.denoise.subspace.
Add iterative Wiener filtering approach for single channel denoising in pyroomacoustics.denoise.iterative_wiener.
Changed¶
Add build instructions for python 3.7 and wheels for Mac OS X in the continuous integration (Travis and Appveyor)
Limits imports of matplotlib to within plotting functions so that the matplotlib backend can still be changed, even after importing pyroomacoustics
Better vectorization of the computations in pyroomacoustics.bss.auxiva
Bugfix¶
Corrects a bug that causes different behavior whether sources are provided to the constructor of Room or to the add_source method
Corrects a typo in pyroomacoustics.SoundSource.add_signal
Corrects a bug in the update of the demixing matrix in pyroomacoustics.bss.auxiva
Corrects invalid memory access in the pyroomacoustics.build_rir cython accelerator and adds a unit test that checks the cython code output is correct
Fix bad handling of 1D b vectors in pyroomacoustics.levinson.
0.1.20 - 2018-10-04¶
Added¶
STFT tutorial and demo notebook.
New algorithm for blind source separation (BSS): Independent Low-Rank Matrix Analysis (ILRMA)
Changed¶
Matplotlib is not a hard requirement anymore. When matplotlib is not installed, only a warning is issued on plotting commands. This is useful to run pyroomacoustics on headless servers that might not have matplotlib installed
Removed dependencies on joblib and requests packages
Apply matplotlib.pyplot.tight_layout in pyroomacoustics.Room.plot_rir
Bugfix¶
Monaural signals are now properly handled in one-shot stft/istft
Corrected check of size of absorption coefficients list in Room.from_corners
0.1.19 - 2018-09-24¶
Added¶
Added noise reduction sub-package denoise with spectral subtraction class and example.
Renamed realtime to transform and added deprecation warning.
Added a cython function to efficiently compute the fractional delays in the room impulse response from time delays and attenuations
notebooks folder.
Demo IPython notebook (with WAV files) of several features of the package.
Wrapper for Google’s Speech Command Dataset and an example usage script in examples.
Lots of new features in the pyroomacoustics.realtime subpackage:
The STFT class can now be used both for frame-by-frame processing or for bulk processing
The functionality will replace the methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc.
The new function pyroomacoustics.realtime.compute_synthesis_window computes the optimal synthesis window given an analysis window and the frame shift
Extensive tests for the pyroomacoustics.realtime module
Convenience functions pyroomacoustics.realtime.analysis and pyroomacoustics.realtime.synthesis with an interface similar to pyroomacoustics.stft and pyroomacoustics.istft (which are now deprecated and will disappear soon)
The ordering of axes in the output from bulk STFT is now (n_frames, n_frequencies, n_channels)
Support for Intel’s mkl_fft package
axis (along which to perform DFT) and bits parameters for the DFT class.
Changed¶
Improved documentation and docstrings
Now using the built-in RIR generator in examples/doa_algorithms.py
Improved the download/uncompress function for large datasets
Dusted off the code for plotting on the sphere in pyroomacoustics.doa.grid.GridSphere
Deprecation Notice¶
The methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc., are now deprecated and will be removed in the near future
0.1.18 - 2018-04-24¶
Added¶
Added AuxIVA (independent vector analysis) to the bss subpackage.
Added BSS IVA example
Changed¶
Moved the Trinicon blind source separation algorithm to the bss subpackage.
Bugfix¶
Correct a bug that causes 1st order sources to be generated for max_order==0 in pure python code
0.1.17 - 2018-03-23¶
Bugfix¶
Fixed issue #22 on github. Added INCREF before returning Py_None in C extension.
0.1.16 - 2018-03-06¶
Added¶
Base classes for Dataset and Sample in pyroomacoustics.datasets
Methods to filter datasets according to the metadata of samples
Deprecation warning for the TimitCorpus interface
Changed¶
Add list of speakers and sentences from CMU ARCTIC
CMUArcticDatabase basedir is now the top directory where CMU_ARCTIC database should be saved. Not the directory above as it previously was.
Libroom C extension is now a proper package. It can be imported.
Libroom C extension now compiles on windows with python>=3.5.
0.1.15 - 2018-02-23¶
Bugfix¶
Added pyroomacoustics.datasets to the list of sub-packages in setup.py
0.1.14 - 2018-02-20¶
Added¶
Changelog
CMU ARCTIC corpus wrapper in pyroomacoustics.datasets
Changed¶
Moved the TIMIT corpus wrapper from the pyroomacoustics.recognition module to the sub-package pyroomacoustics.datasets.timit
Room Simulation¶
Room¶
The three main classes are pyroomacoustics.room.Room, pyroomacoustics.soundsource.SoundSource, and pyroomacoustics.beamforming.MicrophoneArray. On a high level, a simulation scenario is created by first defining a room to which a few sound sources and a microphone array are attached. The actual audio is attached to the source as raw audio samples.
Then, a simulation method is used to create artificial room impulse responses (RIR) between the sources and microphones. The current default method is the image source model, which considers the walls as perfect reflectors. An experimental hybrid simulator based on the image source method (ISM) 1 and ray tracing (RT) 2, 3 is also available. Ray tracing better captures the late reflections and can also model effects such as scattering.
The microphone signals are then created by convolving the audio samples associated with the sources with the appropriate RIR. Since the simulation is done on discrete-time signals, a sampling frequency is specified for the room and the sources it contains. Microphones can optionally operate at a different sampling frequency; a rate conversion is done in this case.
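For example, a minimal sketch of a microphone operating at a higher rate than the room (note that the rate conversion may require the optional samplerate package to be installed):
import pyroomacoustics as pra
# the room and source are simulated at 16 kHz
room = pra.ShoeBox([4, 6], fs=16000)
room.add_source([2.5, 4.5])
# this microphone records at 48 kHz; its signal is resampled accordingly
room.add_microphone([2.0, 1.5], fs=48000)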
Simulating a Shoebox Room with the Image Source Model¶
We will first walk through the steps to simulate a shoebox-shaped room in 3D. The ISM is used to find all image sources up to a maximum specified order, and the room impulse responses (RIR) are generated from their positions.
The code for the full example can be found in examples/room_from_rt60.py.
Create the room¶
So-called shoebox rooms are parallelepipedic rooms with 4 or 6 walls (in 2D and 3D, respectively), all at right angles. They are defined by a single vector that contains the lengths of the walls. They have the advantage of being simple to define and very efficient to simulate. In the following example, we define a 9 m x 7.5 m x 3.5 m room. In addition, we use Sabine’s formula to find the wall energy absorption and the maximum order of the ISM required to achieve a desired reverberation time (RT60, i.e. the time it takes for the RIR to decay by 60 dB).
import pyroomacoustics as pra
# The desired reverberation time and dimensions of the room
rt60 = 0.5 # seconds
room_dim = [9, 7.5, 3.5] # meters
# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)
# Create the room
room = pra.ShoeBox(
room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order
)
The second argument is the sampling frequency at which the RIR will be generated. Note that the default value of fs is 8 kHz.
The third argument is the material of the wall, which itself takes the absorption as a parameter.
The fourth and last argument is the maximum number of reflections allowed in the ISM.
Note
Note that Sabine’s formula is only an approximation and that the actual simulated RT60 may vary by quite a bit.
Warning
Until recently, rooms would take an absorption parameter that was actually not the energy absorption we use now. The absorption parameter is now deprecated and will be removed in the future.
Randomized Image Method¶
In highly symmetric shoebox rooms, the regularity of image sources’ positions leads to a monotonic convergence in the time arrival of far-field image pairs. This causes sweeping echoes. The randomized image method adds a small random displacement to the image source positions, so that they are no longer time-aligned, thus reducing sweeping echoes considerably.
To use the randomized method, set the flag use_rand_ism to True while creating a room. Additionally, the maximum displacement of the image sources can be chosen by setting the parameter max_rand_disp. The default value is 8 cm. For a full example see examples/randomized_image_method.py.
import pyroomacoustics as pra
# The desired reverberation time and dimensions of the room
rt60 = 0.5 # seconds
room_dim = [5, 5, 5] # meters
# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)
# Create the room
room = pra.ShoeBox(
room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order,
use_rand_ism = True, max_rand_disp = 0.05
)
Add sources and microphones¶
Sources are fairly straightforward to create. They take their location as the single mandatory argument, and a signal and start time as optional arguments. Here we create a source located at [2.5, 3.73, 1.76] within the room that will utter the content of the wav file speech.wav starting at 1.3 s into the simulation. The signal keyword argument to the add_source() method should be a one-dimensional numpy.ndarray containing the desired sound signal.
# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
from scipy.io import wavfile
_, audio = wavfile.read('speech.wav')
# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=1.3)
The locations of the microphones in the array should be provided in a numpy nd-array of size (ndim, nmics), that is, each column contains the coordinates of one microphone. Note that the sampling frequency of the microphones can be different from that of the room, in which case resampling will occur. Here, we create an array with two microphones placed at [6.3, 4.87, 1.2] and [6.3, 4.93, 1.2].
# define the locations of the microphones
import numpy as np
mic_locs = np.c_[
[6.3, 4.87, 1.2], # mic 1
[6.3, 4.93, 1.2], # mic 2
]
# finally place the array in the room
room.add_microphone_array(mic_locs)
A number of routines exist to create regular array geometries in 2D.
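For instance, circular_2D_array and linear_2D_array return the (2, n_mics) location arrays directly (a small sketch; the positions and spacings here are arbitrary):
import pyroomacoustics as pra
# 8 microphones on a circle of radius 10 cm centered at (2, 1.5), first mic at angle 0
R_circ = pra.circular_2D_array(center=[2.0, 1.5], M=8, phi0=0.0, radius=0.1)
# 4 microphones on a line through (2, 1.5), spaced 10 cm apart
R_lin = pra.linear_2D_array(center=[2.0, 1.5], M=4, phi=0.0, d=0.1)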
Adding source or microphone directivity¶
The directivity pattern of a source or microphone can be conveniently set through the directivity keyword argument.
First, a pyroomacoustics.directivities.Directivity object needs to be created. As of Sep 6, 2021, only frequency-independent directivities from the cardioid family are supported, namely figure-eight, hypercardioid, cardioid, and subcardioid.
Below is how a pyroomacoustics.directivities.Directivity object can be created, for example a hypercardioid pattern pointing at an azimuth angle of 90 degrees and a colatitude angle of 15 degrees.
# create directivity object
from pyroomacoustics.directivities import (
DirectivityPattern,
DirectionVector,
CardioidFamily,
)
dir_obj = CardioidFamily(
orientation=DirectionVector(azimuth=90, colatitude=15, degrees=True),
pattern_enum=DirectivityPattern.HYPERCARDIOID,
)
After creating a pyroomacoustics.directivities.Directivity object, it is straightforward to set the directivity of a source, microphone, or microphone array, namely by using the directivity keyword argument.
For example, to set a source’s directivity:
# place the source in the room
room.add_source(position=[2.5, 3.73, 1.76], directivity=dir_obj)
To set a single microphone’s directivity:
# place the microphone in the room
room.add_microphone(loc=[2.5, 5, 1.76], directivity=dir_obj)
The same directivity pattern can be used for all microphones in an array:
# place microphone array in the room
import numpy as np
mic_locs = np.c_[
[6.3, 4.87, 1.2], # mic 1
[6.3, 4.93, 1.2], # mic 2
]
room.add_microphone_array(mic_locs, directivity=dir_obj)
Or a different directivity can be used for each microphone by passing a list of
pyroomacoustics.directivities.Directivity
objects:
# place the microphone array in the room
room.add_microphone_array(mic_locs, directivity=[dir_1, dir_2])
Warning
As of Sep 6, 2021, setting directivity patterns for sources and microphones is only supported for the image source method (ISM). Moreover, source directivities are only supported for shoebox-shaped rooms.
Create the Room Impulse Response¶
At this point, the RIRs are simply created by invoking the ISM via image_source_model(). This function will generate all the image sources up to the required order and use them to generate the RIRs, which will be stored in the rir attribute of room.
The attribute rir is a list of lists, with the outer list over microphones and the inner list over sources.
room.compute_rir()
# plot the RIR between mic 1 and source 0
import matplotlib.pyplot as plt
plt.plot(room.rir[1][0])
plt.show()
Warning
The simulator uses a fractional delay filter that introduces a global delay in the RIR. The delay can be obtained as follows.
global_delay = pra.constants.get("frac_delay_length") // 2
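For instance, the delay can be compensated by discarding the first global_delay samples of a computed RIR (a small sketch assuming room.rir has already been computed):
rir_aligned = room.rir[0][0][global_delay:]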
Simulate sound propagation¶
By calling simulate(), a convolution of the signal of each source (if not None) will be performed with the corresponding room impulse response. The output from the convolutions will be summed up at the microphones. The result is stored in the signals attribute of room.mic_array, with each row corresponding to one microphone.
room.simulate()
# plot signal at microphone 1
plt.plot(room.mic_array.signals[1,:])
Full Example¶
This example is partly extracted from ./examples/room_from_rt60.py.
import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
from scipy.io import wavfile
# The desired reverberation time and dimensions of the room
rt60_tgt = 0.3 # seconds
room_dim = [10, 7.5, 3.5] # meters
# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
fs, audio = wavfile.read("examples/samples/guitar_16k.wav")
# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60_tgt, room_dim)
# Create the room
room = pra.ShoeBox(
room_dim, fs=fs, materials=pra.Material(e_absorption), max_order=max_order
)
# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=0.5)
# define the locations of the microphones
mic_locs = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
]
# finally place the array in the room
room.add_microphone_array(mic_locs)
# Run the simulation (this will also build the RIR automatically)
room.simulate()
room.mic_array.to_wav(
    "examples/samples/guitar_16k_reverb.wav",
    norm=True,
    bitdepth=np.int16,
)
# measure the reverberation time
rt60 = room.measure_rt60()
print("The desired RT60 was {}".format(rt60_tgt))
print("The measured RT60 is {}".format(rt60[1, 0]))
# Create a plot
plt.figure()
# plot one of the RIR. both can also be plotted using room.plot_rir()
rir_1_0 = room.rir[1][0]
plt.subplot(2, 1, 1)
plt.plot(np.arange(len(rir_1_0)) / room.fs, rir_1_0)
plt.title("The RIR from source 0 to mic 1")
plt.xlabel("Time [s]")
# plot signal at microphone 1
plt.subplot(2, 1, 2)
plt.plot(np.arange(room.mic_array.signals.shape[1]) / room.fs, room.mic_array.signals[1, :])
plt.title("Microphone 1 signal")
plt.xlabel("Time [s]")
plt.tight_layout()
plt.show()
Hybrid ISM/Ray Tracing Simulator¶
Warning
The hybrid simulator has not been thoroughly tested yet and should be used with
care. The exact implementation and default settings may also change in the future.
Currently, the default behavior of Room
and ShoeBox
has been kept as in previous
versions of the package. Bugs and user experience can be reported on
github.
The hybrid ISM/RT simulator uses ISM to simulate the early reflections in the RIR and RT for the diffuse tail. Our implementation is based on 2 and 3.
The simulator has the following features.
Scattering: Wall scattering can be defined by assigning a scattering coefficient to the walls together with the energy absorption.
Multi-band: The simulation can be carried out with different parameters for different octave bands. The octave bands go from 125 Hz to half the sampling frequency.
Air absorption: The frequency-dependent absorption of the air can be turned on by providing the keyword argument air_absorption=True to the room constructor.
Here is a simple example using the hybrid simulator.
We suggest using max_order=3 with the hybrid simulator.
# Create the room
room = pra.ShoeBox(
room_dim,
fs=16000,
materials=pra.Material(e_absorption),
max_order=3,
ray_tracing=True,
air_absorption=True,
)
# Activate the ray tracing
room.set_ray_tracing()
A few example programs are provided in ./examples.
./examples/ray_tracing.py demonstrates use of ray tracing for rooms of different sizes and with different amounts of reverberation
./examples/room_L_shape_3d_rt.py shows how to simulate a polyhedral room
./examples/room_from_stl.py demonstrates how to import a model from an STL file
Wall Materials¶
The wall materials are handled by Material objects. A material is defined by at least one absorption coefficient that represents the ratio of sound energy absorbed by a wall upon reflection. A material may have multiple absorption coefficients corresponding to different absorptions at different octave bands. When only one coefficient is provided, the absorption is assumed to be uniform at all frequencies. In addition, materials may have one or more scattering coefficients corresponding to the ratio of energy scattered upon reflection.
Materials can be defined by providing the coefficients directly, or by choosing a material from the materials database 2.
import pyroomacoustics as pra
m = pra.Material(energy_absorption="hard_surface")
room = pra.ShoeBox([9, 7.5, 3.5], fs=16000, materials=m, max_order=17)
We can use different materials for different walls. In this case, the materials should be provided in a dictionary. For a shoebox room, this can be done as follows. We use the make_materials() helper function to create a dict of Material objects.
import pyroomacoustics as pra
m = pra.make_materials(
ceiling="hard_surface",
floor="6mm_carpet",
east="brickwork",
west="brickwork",
north="brickwork",
south="brickwork",
)
room = pra.ShoeBox(
[9, 7.5, 3.5], fs=16000, materials=m, max_order=17, air_absorption=True, ray_tracing=True
)
For a more complete example see examples/room_complex_wall_materials.py.
Note
For shoebox rooms, the walls are labelled as follows:
west/east for the walls in the y-z plane with small/large x coordinates, respectively
south/north for the walls in the x-z plane with small/large y coordinates, respectively
floor/ceiling for the walls in the x-y plane with small/large z coordinates, respectively
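These names can be used to retrieve a specific wall, e.g. with the get_wall_by_name method documented below:
wall = room.get_wall_by_name("east")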
Controlling the signal-to-noise ratio¶
It is in general necessary to scale the signals from different sources to
obtain a specific signal-to-noise or signal-to-interference ratio (SNR and SIR,
respectively). This can be done by passing some options to the simulate()
function. Because the relative amplitude of signals will change at different microphones
due to propagation, it is necessary to choose a reference microphone. By default, this
will be the first microphone in the array (index 0). The simplest choice is to choose
the variance of the noise \(\sigma_n^2\) to achieve a desired SNR with respect
to the cumulative signal from all sources. Assuming that the signals from all sources
are scaled to have the same amplitude (e.g., unit amplitude) at the reference microphone,
the SNR is defined as
\(\mathsf{SNR} = 10 \log_{10} \frac{K}{\sigma_n^2}\)
where \(K\) is the number of sources. For example, an SNR of 10 decibels (dB) can be obtained using the following code
room.simulate(reference_mic=0, snr=10)
Sometimes, more challenging normalizations are necessary. In that case,
a custom callback function can be provided to simulate. For example,
we can imagine a scenario where we have n_src
out of which n_tgt
are the targets, the rest being interferers. We will assume all
targets have unit variance, and all interferers have equal
variance \(\sigma_i^2\) (at the reference microphone). In
addition, there is uncorrelated noise \(\sigma_n^2\) at every microphone. We will define the SNR and SIR with respect to a single target source:
\(\mathsf{SNR} = 10 \log_{10} \frac{1}{\sigma_n^2}, \qquad \mathsf{SIR} = 10 \log_{10} \frac{1}{(n_{\mathrm{src}} - n_{\mathrm{tgt}})\,\sigma_i^2}\)
The callback function callback_mix takes as argument an nd-array premix_signals of shape (n_src, n_mics, n_samples) that contains the microphone signals prior to mixing. The signal propagated from the k-th source to the m-th microphone is contained in premix_signals[k,m,:]. It is possible to provide optional arguments to the callback via the callback_mix_kwargs optional argument. Here is the code implementing the example described.
# the extra arguments are given in a dictionary
callback_mix_kwargs = {
    'snr' : 30,  # SNR target is 30 decibels
    'sir' : 10,  # SIR target is 10 decibels
    'n_src' : 6,
    'n_tgt' : 2,
    'ref_mic' : 0,
}
def callback_mix(premix, snr=0, sir=0, ref_mic=0, n_src=None, n_tgt=None):
    # first normalize all separate recordings to have unit power at the reference microphone
    p_mic_ref = np.std(premix[:, ref_mic, :], axis=1)
    premix /= p_mic_ref[:, None, None]
    # now compute the power of interference signal needed to achieve desired SIR
    sigma_i = np.sqrt(10 ** (-sir / 10) / (n_src - n_tgt))
    premix[n_tgt:n_src, :, :] *= sigma_i
    # compute noise variance
    sigma_n = np.sqrt(10 ** (-snr / 10))
    # Mix down the recorded signals
    mix = np.sum(premix[:n_src, :], axis=0) + sigma_n * np.random.randn(*premix.shape[1:])
    return mix
# Run the simulation
room.simulate(
callback_mix=callback_mix,
callback_mix_kwargs=callback_mix_kwargs,
)
mics_signals = room.mic_array.signals
In addition, it is desirable in some cases to obtain the microphone signals
with individual sources, prior to mixing. For example, this is useful to
evaluate the output of blind source separation algorithms. In this case, the return_premix argument should be set to True:
premix = room.simulate(return_premix=True)
Reverberation Time¶
The reverberation time (RT60) is defined as the time needed for the energy of the RIR to decrease by 60 dB. It is a useful measure of the amount of reverberation. We provide the function pyroomacoustics.experimental.measure_rt60() to measure the RT60 of recorded or simulated RIRs.
The method is also directly integrated in the Room object as the method measure_rt60().
# assuming the simulation has already been carried out
rt60 = room.measure_rt60()
for m in range(room.n_mics):
    for s in range(room.n_sources):
        print(
            "RT60 between the {}th mic and {}th source: {:.3f} s".format(m, s, rt60[m, s])
        )
Free-field simulation¶
You can also use this package to simulate free-field sound propagation between
a set of sound sources and a set of microphones, without considering room
effects. To this end, you can use the
pyroomacoustics.room.AnechoicRoom
class, which simply corresponds to
setting the maximum image order of the room simulation to zero. This
allows for early development and testing of various audio-based algorithms,
without worrying about room acoustics at first. Thanks to the modular framework
of pyroomacoustics, room acoustics can easily be added, after this first
testing stage, for more realistic simulations.
Use this if you can neglect room effects (e.g. you operate in an anechoic room or outdoors), or if you simply want to test your algorithm in the best-case scenario. The code below shows how to create and simulate an anechoic room. For a more involved example (testing the DOA algorithm MUSIC in an anechoic room), see ./examples/doa_anechoic_room.py.
# Create anechoic room.
room = pra.AnechoicRoom(fs=16000)
# Place the microphone array around the origin.
mic_locs = np.c_[
[0.1, 0.1, 0],
[-0.1, 0.1, 0],
[-0.1, -0.1, 0],
[0.1, -0.1, 0],
]
room.add_microphone_array(mic_locs)
# Add a source. We use a white noise signal for the source, and
# the source can be arbitrarily far because there are no walls.
x = np.random.randn(2**10)
room.add_source([10.0, 20.0, -20], signal=x)
# run the simulation
room.simulate()
References
1. J. B. Allen and D. A. Berkley, Image method for efficiently simulating small-room acoustics, J. Acoust. Soc. Am., vol. 65, no. 4, p. 943, 1979.
2. M. Vorlaender, Auralization, 1st ed. Berlin: Springer-Verlag, 2008, pp. 1-340.
3. D. Schroeder, Physically based real-time auralization of interactive virtual environments. PhD Thesis, RWTH Aachen University, 2011.
- class pyroomacoustics.room.AnechoicRoom(dim=3, fs=8000, t0=0.0, sigma2_awgn=None, sources=None, mics=None, temperature=None, humidity=None, air_absorption=False)¶
Bases: ShoeBox
This class provides an API for creating an Anechoic “room” in 2D or 3D.
- Parameters
dim (int) – Dimension of the room (2 or 3).
fs (int, optional) – The sampling frequency in Hz. Default is 8000.
t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.
humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.
air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.
- get_bbox()¶
Returns a bounding box for the mics and sources, for plotting.
- is_inside(p)¶
Overloaded function to eliminate testing if objects are inside “room”.
- plot(**kwargs)¶
Overloaded function to issue warning when img_order is given.
- plot_walls(ax)¶
Overloaded function to eliminate wall plotting.
- class pyroomacoustics.room.Room(walls, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False, use_rand_ism=False, max_rand_disp=0.08)¶
Bases: object
A Room object has as attributes a collection of pyroomacoustics.wall.Wall objects, a pyroomacoustics.beamforming.MicrophoneArray array, and a list of pyroomacoustics.soundsource.SoundSource. The room can be two dimensional (2D), in which case the walls are simply line segments. A factory method pyroomacoustics.room.Room.from_corners() can be used to create the room from a polygon. In three dimensions (3D), the walls are two dimensional polygons, namely a collection of points lying on a common plane. Creating rooms in 3D is more tedious and for convenience a method pyroomacoustics.room.Room.extrude() is provided to lift a 2D room into 3D space by adding vertical walls and parallel floor and ceiling.
The Room class is sub-classed by pyroomacoustics.room.ShoeBox which creates a rectangular (2D) or parallelepipedic (3D) room. Such rooms benefit from an efficient algorithm for the image source method.
- Attribute walls
(Wall array) list of walls forming the room
- Attribute fs
(int) sampling frequency
- Attribute max_order
(int) the maximum computed order for images
- Attribute sources
(SoundSource array) list of sound sources
- Attribute mics
(MicrophoneArray) array of microphones
- Attribute corners
(numpy.ndarray 2xN or 3xN, N=number of walls) array containing a point belonging to each wall, used for calculations
- Attribute absorption
(numpy.ndarray size N, N=number of walls) array containing the absorption factor for each wall, used for calculations
- Attribute dim
(int) dimension of the room (2 or 3 meaning 2D or 3D)
- Attribute wallsId
(int dictionary) stores the mapping “wall name -> wall id (in the array walls)”
- Parameters
walls (list of Wall or Wall2D objects) – The walls forming the room.
fs (int, optional) – The sampling frequency in Hz. Default is 8000.
t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.
sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.
humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.
air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.
ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.
use_rand_ism (bool, optional) – If set to True, image source positions will have a small random displacement to prevent sweeping echoes
max_rand_disp (float, optional) – The maximum displacement of the image sources when using the randomized image source method.
- add(obj)¶
Adds a sound source or microphone to a room
- Parameters
obj (SoundSource or Microphone object) – The object to add
- Returns
The room is returned for further tweaking.
- Return type
Room
- add_microphone(loc, fs=None, directivity=None)¶
Adds a single microphone in the room.
- Parameters
loc (array_like or ndarray) – The location of the microphone. The length should be the same as the room dimension.
fs (float, optional) – The sampling frequency of the microphone, if different from that of the room.
- Returns
The room is returned for further tweaking.
- Return type
Room
- add_microphone_array(mic_array, directivity=None)¶
Adds a microphone array (i.e. several microphones) in the room.
- Parameters
mic_array (array_like or ndarray or MicrophoneArray object) – The array can be provided as an array of size (dim, n_mics), where dim is the dimension of the room and n_mics is the number of microphones in the array. As an alternative, a MicrophoneArray object can be provided.
- Returns
The room is returned for further tweaking.
- Return type
Room
- add_soundsource(sndsrc, directivity=None)¶
Adds a pyroomacoustics.soundsource.SoundSource object to the room.
- Parameters
sndsrc (SoundSource object) – The SoundSource object to add to the room
- add_source(position, signal=None, delay=0, directivity=None)¶
Adds a sound source given by its position in the room. Optionally a source signal and a delay can be provided.
- Parameters
position (ndarray, shape: (2,) or (3,)) – The location of the source in the room
signal (ndarray, shape: (n_samples,), optional) – The signal played by the source
delay (float, optional) – A time delay until the source signal starts in the simulation
- Returns
The room is returned for further tweaking.
- Return type
Room
- compute_rir()¶
Compute the room impulse response between every source and microphone.
- direct_snr(x, source=0)¶
Computes the direct Signal-to-Noise Ratio
- extrude(height, v_vec=None, absorption=None, materials=None)¶
Creates a 3D room by extruding a 2D polygon. The polygon is typically the floor of the room and will have z-coordinate zero. The ceiling is created as a translation of the floor along the extrusion direction.
- Parameters
height (float) – The extrusion height
v_vec (array-like 1D length 3, optional) – A unit vector. An orientation for the extrusion direction. The ceiling will be placed as a translation of the floor with respect to this vector (The default is [0,0,1]).
absorption (float or array-like, optional) – Absorption coefficients for all the walls. If a scalar, then all the walls will have the same absorption. If an array is given, it should have as many elements as there will be walls, that is the number of vertices of the polygon plus two. The two last elements are for the floor and the ceiling, respectively. It is recommended to use materials instead of absorption parameter. (Default: 1)
materials (dict) – Absorption coefficients for floor and ceiling. This parameter overrides absorption. (Default: {“floor”: 1, “ceiling”: 1})
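A minimal sketch of the from_corners/extrude workflow (the floor plan coordinates are arbitrary):
import numpy as np
import pyroomacoustics as pra
# an L-shaped floor plan; corners are anti-clockwise, one corner per column
corners = np.array([[0, 0], [6, 0], [6, 4], [3, 4], [3, 8], [0, 8]]).T
room = pra.Room.from_corners(corners)
room.extrude(3.0)  # lift the 2D polygon into a 3D room of height 3 m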
- classmethod from_corners(corners, absorption=None, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, **kwargs)¶
Creates a 2D room by giving an array of corners.
- Parameters
corners ((np.array dim 2xN, N>2)) – List of corners; must be oriented anti-clockwise
absorption (float array or float) – List of absorption factors, one for each wall, or a single value for all walls (deprecated, use materials instead)
fs (int, optional) – The sampling frequency in Hz. Default is 8000.
t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.
sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
kwargs (key, value mappings) – Other keyword arguments accepted by the Room class
- Return type
Instance of a 2D room
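For instance, a minimal sketch of building a 2D polygonal room and extruding it to 3D (the corner coordinates are arbitrary):
import numpy as np
import pyroomacoustics as pra
# corners as a (2, n_corners) array, listed anti-clockwise
corners = np.array([[0, 0], [6, 0], [6, 4], [0, 4]]).T
room = pra.Room.from_corners(corners, fs=16000, max_order=3, materials=pra.Material(0.3))
# turn the 2D polygon into a 3D room of height 2.5 m
room.extrude(2.5)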
- get_bbox()¶
Returns a bounding box for the room
- get_volume()¶
Computes the volume of the room
- Returns
the volume of the room
- Return type
float
- get_wall_by_name(name)¶
Returns the instance of the wall by giving its name.
- Parameters
name (string) – name of the wall
- Returns
instance of the wall with this name
- Return type
Wall
- image_source_model()¶
- is_inside(p, include_borders=True)¶
Checks if the given point is inside the room.
- Parameters
p (array_like, length 2 or 3) – point to be tested
include_borders (bool, optional) – Set to True if a point on a wall should be considered inside the room
- Return type
True if the given point is inside the room, False otherwise.
- property is_multi_band¶
- measure_rt60(decay_db=60, plot=False)¶
Measures the reverberation time (RT60) of the simulated RIR.
- Parameters
decay_db (float) – This is the actual decay of the RIR used for the computation. The default is 60, meaning that the RT60 is exactly what we measure. In some cases, the signal may be too short to measure 60 dB decay. In this case, we can specify a lower value. For example, with 30 dB, the RT60 is twice the time measured.
plot (bool) – Displays a graph of the Schroeder curve and the estimated RT60.
- Returns
An array that contains the measured RT60 for all the RIRs.
- Return type
ndarray (n_mics, n_sources)
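For example, a short sketch measuring a 30 dB decay when the responses are too short for a full 60 dB drop (room geometry below is arbitrary):
import pyroomacoustics as pra
room = pra.ShoeBox([5, 4, 3], materials=pra.Material(0.25), max_order=15)
room.add_source([1.0, 1.0, 1.5])
room.add_microphone([3.5, 2.0, 1.2])
room.compute_rir()
# measure a 30 dB decay and extrapolate to RT60
rt60 = room.measure_rt60(decay_db=30)  # shape (n_mics, n_sources)
print(rt60[0, 0])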
- property n_mics¶
- property n_sources¶
- plot(img_order=None, freq=None, figsize=None, no_axis=False, mic_marker_size=10, plot_directivity=True, ax=None, **kwargs)¶
Plots the room with its walls, microphones, sources and images
- plot_rir(select=None, FD=False, kind=None)¶
Plot room impulse responses. Compute if not done already.
- Parameters
select (list of tuples OR int) – List of RIR pairs (mic, src) to plot, e.g. [(0,0), (0,1)]. Or int to plot RIR from particular microphone to all sources. Note that microphones and sources are zero-indexed. Default is to plot all microphone-source pairs.
FD (bool, optional) – If True, the transfer function is plotted instead of the impulse response. Default is False.
kind (str, optional) – The value can be “ir”, “tf”, or “spec” which will plot impulse response, transfer function, and spectrogram, respectively. If this option is specified, then the value of FD is ignored. Default is “ir”.
- Returns
fig (matplotlib figure) – Figure object for further modifications
axes (matplotlib list of axes objects) – Axes for further modifications
- ray_tracing()¶
- rt60_theory(formula='sabine')¶
Compute the theoretical reverberation time (RT60) for the room.
- Parameters
formula (str) – The formula to use for the calculation, ‘sabine’ (default) or ‘eyring’
- set_air_absorption(coefficients=None)¶
Activates or deactivates air absorption in the simulation.
- Parameters
coefficients (list of float) – List of air absorption coefficients, one per octave band
- set_ray_tracing(n_rays=None, receiver_radius=0.5, energy_thres=1e-07, time_thres=10.0, hist_bin_size=0.004)¶
Activates the ray tracer.
- Parameters
n_rays (int, optional) – The number of rays to shoot in the simulation
receiver_radius (float, optional) – The radius of the sphere around the microphone in which to integrate the energy (default: 0.5 m)
energy_thres (float, optional) – The energy threshold at which rays are stopped (default: 1e-7)
time_thres (float, optional) – The maximum time of flight of rays (default: 10 s)
hist_bin_size (float) – The time granularity of bins in the energy histogram (default: 4 ms)
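A sketch of enabling the hybrid image source/ray tracing simulation (the parameter values below are illustrative only):
import pyroomacoustics as pra
# hybrid simulation: low-order ISM plus ray tracing for the late tail
room = pra.ShoeBox([9, 7, 4], materials=pra.Material(0.1), max_order=3,
                   ray_tracing=True, air_absorption=True)
room.set_ray_tracing(n_rays=10000, receiver_radius=0.5)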
- set_sound_speed(c)¶
Sets the speed of sound unconditionally
- simulate(snr=None, reference_mic=0, callback_mix=None, callback_mix_kwargs={}, return_premix=False, recompute_rir=False)¶
Simulates the microphone signal at every microphone in the array
- Parameters
reference_mic (int, optional) – The index of the reference microphone to use for SNR computations. The default reference microphone is the first one (index 0)
snr (float, optional) – The target signal-to-noise ratio (SNR) in decibels at the reference microphone. When this option is used, the argument pyroomacoustics.room.Room.sigma2_awgn is ignored. The variance of every source at the reference microphone is normalized to one and the variance of the noise \(\sigma_n^2\) is chosen so that

\[\mathsf{SNR} = 10 \log_{10} \frac{K}{\sigma_n^2}\]

The value of pyroomacoustics.room.Room.sigma2_awgn is also set to \(\sigma_n^2\) automatically.
callback_mix (func, optional) – A function that will perform the mix. It takes as first argument an array of shape (n_sources, n_mics, n_samples) that contains the source signals convolved with the room impulse response prior to mixture at the microphone. It should return an array of shape (n_mics, n_samples) containing the mixed microphone signals. If such a function is provided, the snr option is ignored and pyroomacoustics.room.Room.sigma2_awgn is set to None.
callback_mix_kwargs (dict, optional) – A dictionary that contains optional arguments for the callback_mix function
return_premix (bool, optional) – If set to True, the function will return an array of shape (n_sources, n_mics, n_samples) containing the microphone signals with individual sources, convolved with the room impulse response but prior to mixing
recompute_rir (bool, optional) – If set to True, the room impulse responses will be recomputed prior to simulation
- Returns
Depends on the value of the return_premix option
- Return type
Nothing or an array of shape (n_sources, n_mics, n_samples)
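As a usage sketch (the room, signal, and SNR below are arbitrary):
import numpy as np
import pyroomacoustics as pra
fs = 16000
room = pra.ShoeBox([5, 4, 3], fs=fs, materials=pra.Material(0.2), max_order=10)
room.add_source([1.0, 1.0, 1.5], signal=np.random.randn(fs))  # 1 s of noise as source signal
room.add_microphone([3.5, 2.0, 1.2])
# simulate with additive noise at 10 dB SNR and keep the per-source signals
premix = room.simulate(snr=10, return_premix=True)  # (n_sources, n_mics, n_samples)
mix = room.mic_array.signals                        # (n_mics, n_samples)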
- unset_air_absorption()¶
Deactivates air absorption in the simulation
- unset_ray_tracing()¶
Deactivates the ray tracer
- property volume¶
- wall_area(wall)¶
Computes the area of a 3D planar wall.
- Parameters
wall (Wall instance) – the wall object that is defined in 3D space
- class pyroomacoustics.room.ShoeBox(p, fs=8000, t0=0.0, absorption=None, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False, use_rand_ism=False, max_rand_disp=0.08)¶
Bases:
Room
This class provides an API for creating a ShoeBox room in 2D or 3D.
- Parameters
p (array_like) – Length 2 (width, length) or 3 (width, length, height) depending on the desired dimension of the room.
fs (int, optional) – The sampling frequency in Hz. Default is 8000.
t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.
absorption (float) – Average amplitude absorption of walls. Note that this parameter is deprecated; use materials instead!
max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.
sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.
sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.
mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.
materials (Material object or dict of Material objects) – See pyroomacoustics.parameters.Material. If providing a dict, you must provide a Material object for each wall: ‘east’, ‘west’, ‘north’, ‘south’, ‘ceiling’ (3D), ‘floor’ (3D).
temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.
humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.
air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.
ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.
use_rand_ism (bool, optional) – If set to True, image source positions will have a small random displacement to prevent sweeping echoes
max_rand_disp (float, optional) – If using the randomized image source method, the maximum displacement of the image sources (default: 0.08)
- extrude(height)¶
Overload the extrude method from 3D rooms
- get_volume()¶
Computes the volume of a room
- Return type
the volume in cubic units
- is_inside(pos)¶
- Parameters
pos (array_like) – The position to test in an array of size 2 for a 2D room and 3 for a 3D room
- Return type
True if pos is a point in the room, False otherwise.
- pyroomacoustics.room.find_non_convex_walls(walls)¶
Finds the walls that are not in the convex hull
- Parameters
walls (list of Wall objects) – The walls that compose the room
- Returns
The indices of the walls not in the convex hull
- Return type
list of int
- pyroomacoustics.room.sequence_generation(volume, duration, c, fs, max_rate=10000)¶
- pyroomacoustics.room.wall_factory(corners, absorption, scattering, name='')¶
Call the correct method according to wall dimension
Materials Database¶
Absorption Coefficients¶
Massive constructions and hard surfaces¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| hard_surface | Walls, hard surfaces average (brick walls, plaster, hard floors, etc.) | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.05 | 0.05 |
| brickwork | Walls, rendered brickwork | 0.01 | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 |
| rough_concrete | Rough concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.04 | 0.07 | 0.07 |
| unpainted_concrete | Smooth unpainted concrete | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.05 | 0.05 |
| rough_lime_wash | Rough lime wash | 0.02 | 0.03 | 0.04 | 0.05 | 0.04 | 0.03 | 0.02 |
| smooth_brickwork_flush_pointing | Smooth brickwork with flush pointing, painted | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| smooth_brickwork_10mm_pointing | Smooth brickwork, 10 mm deep pointing, pit sand mortar | 0.08 | 0.09 | 0.12 | 0.16 | 0.22 | 0.24 | 0.24 |
| brick_wall_rough | Brick wall, stuccoed with a rough finish | 0.03 | 0.03 | 0.03 | 0.04 | 0.05 | 0.07 | 0.07 |
| ceramic_tiles | Ceramic tiles with a smooth surface | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
| limestone_wall | Limestone walls | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.05 | 0.05 |
| reverb_chamber | Reverberation chamber walls | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.04 | 0.04 |
| concrete_floor | Concrete floor | 0.01 | 0.03 | 0.05 | 0.02 | 0.02 | 0.02 | 0.02 |
| marble_floor | Marble floor | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
Lightweight constructions and linings¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| plasterboard | 2 * 13 mm plasterboard on steel frame, 50 mm mineral wool in cavity, surface painted | 0.15 | 0.1 | 0.06 | 0.04 | 0.04 | 0.05 | 0.05 |
| wooden_lining | Wooden lining, 12 mm fixed on frame | 0.27 | 0.23 | 0.22 | 0.15 | 0.1 | 0.07 | 0.06 |
Glazing¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| glass_3mm | Single pane of glass, 3 mm | 0.08 | 0.04 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
| glass_window | Glass window, 0.68 kg/m2 | 0.1 | 0.05 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 |
| lead_glazing | Lead glazing | 0.3 | 0.2 | 0.14 | 0.1 | 0.05 | 0.05 | |
| double_glazing_30mm | Double glazing, 2–3 mm glass, > 30 mm gap | 0.15 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
| double_glazing_10mm | Double glazing, 2–3 mm glass, 10 mm gap | 0.1 | 0.07 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 |
| double_glazing_inside | Double glazing, lead on the inside | 0.15 | 0.3 | 0.18 | 0.1 | 0.05 | 0.05 | |
Wood¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| wood_1.6cm | Wood, 1.6 cm thick, on 4 cm wooden planks | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
| plywood_thin | Thin plywood panelling | 0.42 | 0.21 | 0.1 | 0.08 | 0.06 | 0.06 | |
| wood_16mm | 16 mm wood on 40 mm studs | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
| audience_floor | Audience floor, 2 layers, 33 mm on sleepers over concrete | 0.09 | 0.06 | 0.05 | 0.05 | 0.05 | 0.04 | |
| stage_floor | Wood, stage floor, 2 layers, 27 mm over airspace | 0.1 | 0.07 | 0.06 | 0.06 | 0.06 | 0.06 | |
| wooden_door | Solid wooden door | 0.14 | 0.1 | 0.06 | 0.08 | 0.1 | 0.1 | 0.10 |
Floor coverings¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| linoleum_on_concrete | Linoleum, asphalt, rubber, or cork tile on concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | |
| carpet_cotton | Cotton carpet | 0.07 | 0.31 | 0.49 | 0.81 | 0.66 | 0.54 | 0.48 |
| carpet_tufted_9.5mm | Loop pile tufted carpet, 1.4 kg/m2, 9.5 mm pile height: On hair pad, 3.0 kg/m2 | 0.1 | 0.4 | 0.62 | 0.7 | 0.63 | 0.88 | |
| carpet_thin | Thin carpet, cemented to concrete | 0.02 | 0.04 | 0.08 | 0.2 | 0.35 | 0.4 | |
| carpet_6mm_closed_cell_foam | 6 mm pile carpet bonded to closed-cell foam underlay | 0.03 | 0.09 | 0.25 | 0.31 | 0.33 | 0.44 | 0.44 |
| carpet_6mm_open_cell_foam | 6 mm pile carpet bonded to open-cell foam underlay | 0.03 | 0.09 | 0.2 | 0.54 | 0.7 | 0.72 | 0.72 |
| carpet_tufted_9m | 9 mm tufted pile carpet on felt underlay | 0.08 | 0.08 | 0.3 | 0.6 | 0.75 | 0.8 | 0.80 |
| felt_5mm | Needle felt 5 mm stuck to concrete | 0.02 | 0.02 | 0.05 | 0.15 | 0.3 | 0.4 | 0.40 |
| carpet_soft_10mm | 10 mm soft carpet on concrete | 0.09 | 0.08 | 0.21 | 0.26 | 0.27 | 0.37 | |
| carpet_hairy | Hairy carpet on 3 mm felt | 0.11 | 0.14 | 0.37 | 0.43 | 0.27 | 0.25 | 0.25 |
| carpet_rubber_5mm | 5 mm rubber carpet on concrete | 0.04 | 0.04 | 0.08 | 0.12 | 0.1 | 0.1 | |
| carpet_1.35_kg_m2 | Carpet 1.35 kg/m2, on hair felt or foam rubber | 0.08 | 0.24 | 0.57 | 0.69 | 0.71 | 0.73 | |
| cocos_fibre_roll_29mm | Cocos fibre roll felt, 29 mm thick (unstressed), reverse side clad with paper, 2.2 kg/m2, 2 Rayl | 0.1 | 0.13 | 0.22 | 0.35 | 0.47 | 0.57 | |
Curtains¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| curtains_cotton_0.5 | Cotton curtains (0.5 kg/m2) draped to 3/4 area approx. 130 mm from wall | 0.3 | 0.45 | 0.65 | 0.56 | 0.59 | 0.71 | 0.71 |
| curtains_0.2 | Curtains (0.2 kg/m2) hung 90 mm from wall | 0.05 | 0.06 | 0.39 | 0.63 | 0.7 | 0.73 | 0.73 |
| curtains_cotton_0.33 | Cotton cloth (0.33 kg/m2) folded to 7/8 area | 0.03 | 0.12 | 0.15 | 0.27 | 0.37 | 0.42 | |
| curtains_densely_woven | Densely woven window curtains 90 mm from wall | 0.06 | 0.1 | 0.38 | 0.63 | 0.7 | 0.73 | |
| blinds_half_open | Vertical blinds, 15 cm from wall, half opened (45°) | 0.03 | 0.09 | 0.24 | 0.46 | 0.79 | 0.76 | |
| blinds_open | Vertical blinds, 15 cm from wall, open (90°) | 0.03 | 0.06 | 0.13 | 0.28 | 0.49 | 0.56 | |
| curtains_velvet | Tight velvet curtains | 0.05 | 0.12 | 0.35 | 0.45 | 0.38 | 0.36 | 0.36 |
| curtains_fabric | Curtain fabric, 15 cm from wall | 0.1 | 0.38 | 0.63 | 0.52 | 0.55 | 0.65 | |
| curtains_fabric_folded | Curtain fabric, folded, 15 cm from wall | 0.12 | 0.6 | 0.98 | 1 | 1 | 1 | 1.00 |
| curtains_glass_mat | Curtains of close-woven glass mat hung 50 mm from wall | 0.03 | 0.03 | 0.15 | 0.4 | 0.5 | 0.5 | 0.50 |
| studio_curtains | Studio curtains, 22 cm from wall | 0.36 | 0.26 | 0.51 | 0.45 | 0.62 | 0.76 | |
Seating (2 seats per m2)¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| chairs_wooden | Wooden chairs without cushion | 0.05 | 0.08 | 0.1 | 0.12 | 0.12 | 0.12 | |
| chairs_unoccupied_plastic | Unoccupied plastic chairs | 0.06 | 0.1 | 0.1 | 0.2 | 0.3 | 0.2 | 0.20 |
| chairs_medium_upholstered | Medium upholstered concert chairs, empty | 0.49 | 0.66 | 0.8 | 0.88 | 0.82 | 0.7 | |
| chairs_heavy_upholstered | Heavily upholstered seats, unoccupied | 0.7 | 0.76 | 0.81 | 0.84 | 0.84 | 0.81 | |
| chairs_upholstered_cloth | Empty chairs, upholstered with cloth cover | 0.44 | 0.6 | 0.77 | 0.89 | 0.82 | 0.7 | 0.70 |
| chairs_upholstered_leather | Empty chairs, upholstered with leather cover | 0.4 | 0.5 | 0.58 | 0.61 | 0.58 | 0.5 | 0.50 |
| chairs_upholstered_moderate | Unoccupied, moderately upholstered chairs (0.90 m × 0.55 m) | 0.44 | 0.56 | 0.67 | 0.74 | 0.83 | 0.87 | |
Audience (2 persons per m2 unless specified explicitly)¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| audience_orchestra_choir | Areas with audience, orchestra or choir including narrow aisles | 0.6 | 0.74 | 0.88 | 0.96 | 0.93 | 0.85 | 0.85 |
| audience_wooden_chairs_1_m2 | Audience on wooden chairs, 1 per m2 | 0.16 | 0.24 | 0.56 | 0.69 | 0.81 | 0.78 | 0.78 |
| audience_wooden_chairs_2_m2 | Audience on wooden chairs, 2 per m2 | 0.24 | 0.4 | 0.78 | 0.98 | 0.96 | 0.87 | 0.87 |
| orchestra_1.5_m2 | Orchestra with instruments on podium, 1.5 m2 per person | 0.27 | 0.53 | 0.67 | 0.93 | 0.87 | 0.8 | 0.80 |
| audience_0.72_m2 | Audience area, 0.72 persons / m2 | 0.1 | 0.21 | 0.41 | 0.65 | 0.75 | 0.71 | |
| audience_1_m2 | Audience area, 1 person / m2 | 0.16 | 0.29 | 0.55 | 0.8 | 0.92 | 0.9 | |
| audience_1.5_m2 | Audience area, 1.5 persons / m2 | 0.22 | 0.38 | 0.71 | 0.95 | 0.99 | 0.99 | |
| audience_2_m2 | Audience area, 2 persons / m2 | 0.26 | 0.46 | 0.87 | 0.99 | 0.99 | 0.99 | |
| audience_upholstered_chairs_1 | Audience in moderately upholstered chairs, 0.85 m × 0.63 m | 0.72 | 0.82 | 0.91 | 0.93 | 0.94 | 0.87 | |
| audience_upholstered_chairs_2 | Audience in moderately upholstered chairs, 0.90 m × 0.55 m | 0.55 | 0.86 | 0.83 | 0.87 | 0.9 | 0.87 | |
Wall absorbers¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| panel_fabric_covered_6pcf | Fabric-covered panel, 6 pcf rockwool core | 0.46 | 0.93 | 1 | 1 | 1 | 1 | 1.00 |
| panel_fabric_covered_8pcf | Fabric-covered panel, 8 pcf rockwool core | 0.21 | 0.66 | 1 | 1 | 0.97 | 0.98 | 0.98 |
| facing_brick | Facing-brick brickwork, open butt joins, brick dimensions 230 × 50 × 55 mm | 0.04 | 0.14 | 0.49 | 0.35 | 0.31 | 0.36 | |
| acoustical_plaster_25mm | Acoustical plaster, approx. 25 mm thick, 3.5 kg/m2/cm | 0.17 | 0.36 | 0.66 | 0.65 | 0.62 | 0.68 | |
| rockwool_50mm_80kgm3 | Rockwool thickness = 50 mm, 80 kg/m3 | 0.22 | 0.6 | 0.92 | 0.9 | 0.88 | 0.88 | 0.88 |
| rockwool_50mm_40kgm3 | Rockwool thickness = 50 mm, 40 kg/m3 | 0.23 | 0.59 | 0.86 | 0.86 | 0.86 | 0.86 | 0.86 |
| mineral_wool_50mm_40kgm3 | 50 mm mineral wool (40 kg/m3), glued to wall, untreated surface | 0.15 | 0.7 | 0.6 | 0.6 | 0.85 | 0.9 | 0.90 |
| mineral_wool_50mm_70kgm3 | 50 mm mineral wool (70 kg/m3) 300 mm in front of wall | 0.7 | 0.45 | 0.65 | 0.6 | 0.75 | 0.65 | 0.65 |
| gypsum_board | Gypsum board, perforation 19.6%, hole diameter 15 mm, backed by fibrous web 12 Rayl, 100 mm cavity filled with mineral fibre mat 1.05 kg/m2, 7.5 Rayl | 0.3 | 0.69 | 1 | 0.81 | 0.66 | 0.62 | |
| perforated_veneered_chipboard | Perforated veneered chipboard, 50 mm, 1 mm holes, 3 mm spacing, 9% hole surface ratio, 150 mm cavity filled with 30 mm mineral wool | 0.41 | 0.67 | 0.58 | 0.59 | 0.68 | 0.35 | |
| fibre_absorber_1 | Fibre absorber, mineral fibre, 20 mm thick, 3.4 kg/m2, 50 mm cavity | 0.2 | 0.56 | 0.82 | 0.87 | 0.7 | 0.53 | |
| fibre_absorber_2 | Fibre absorber, mats of porous flexible fibrous web fabric, self-extinguishing | 0.07 | 0.07 | 0.2 | 0.41 | 0.75 | 0.97 | |
Ceiling absorbers¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| ceiling_plasterboard | Plasterboard ceiling on battens with large air-space above | 0.2 | 0.15 | 0.1 | 0.08 | 0.04 | 0.02 | |
| ceiling_fibre_abosrber | Fibre absorber on perforated sheet metal cartridge, 0.5 mm zinc-plated steel, 1.5 mm hole diameter, 200 mm cavity filled with 20 mm mineral wool (20 kg/m3), inflammable | 0.48 | 0.97 | 1 | 0.97 | 1 | 1 | 1.00 |
| ceiling_fissured_tile | Fissured ceiling tile | 0.49 | 0.53 | 0.53 | 0.75 | 0.92 | 0.99 | |
| ceiling_perforated_gypsum_board | Perforated 27 mm gypsum board (16%), d = 4.5 mm, 300 mm from ceiling | 0.45 | 0.55 | 0.6 | 0.9 | 0.86 | 0.75 | |
| ceiling_melamine_foam | Wedge-shaped, melamine foam, ceiling tile | 0.12 | 0.33 | 0.83 | 0.97 | 0.98 | 0.95 | |
| ceiling_metal_panel | Metal panel ceiling, backed by 20 mm Sillan acoustic tiles, panel width 85 mm, panel spacing 15 mm, cavity 35 cm | 0.59 | 0.8 | 0.82 | 0.65 | 0.27 | 0.23 | |
Special absorbers¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| microperforated_foil_kaefer | Microperforated foil “Microsorber” (Kaefer) | 0.06 | 0.28 | 0.7 | 0.68 | 0.74 | 0.53 | |
| microperforated_glass_sheets | Microperforated glass sheets, 5 mm cavity | 0.1 | 0.45 | 0.85 | 0.3 | 0.1 | 0.05 | |
| hanging_absorber_panels_1 | Hanging absorber panels (foam), 400 mm depth, 400 mm distance | 0.25 | 0.45 | 0.8 | 0.9 | 0.85 | 0.8 | |
| hanging_absorber_panels_2 | Hanging absorber panels (foam), 400 mm depth, 700 mm distance | 0.2 | 0.3 | 0.6 | 0.75 | 0.7 | 0.7 | |
Scattering Coefficients¶
Diffusers¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| rpg_skyline | Diffuser RPG Skyline | 0.01 | 0.08 | 0.45 | 0.82 | 1 | | |
| rpg_qrd | Diffuser RPG QRD | 0.06 | 0.15 | 0.45 | 0.95 | 0.88 | 0.91 | |
Seating and audience¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| theatre_audience | Theatre audience | 0.3 | 0.5 | 0.6 | 0.6 | 0.7 | 0.70 | 0.70 |
| classroom_tables | Rows of classroom tables and persons on chairs | 0.2 | 0.3 | 0.4 | 0.5 | 0.5 | 0.60 | 0.60 |
| amphitheatre_steps | Amphitheatre steps, length 82 cm, height 30 cm (Farnetani 2005) | 0.05 | 0.45 | 0.75 | 0.9 | 0.9 | | |
Round Robin III – wall and ceiling¶
| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| rect_prism_boxes | Rectangular and prism boxes (studio wall), “Round Robin III” (after (Bork 2005a)) | 0.5 | 0.9 | 0.95 | 0.95 | 0.95 | 0.95 | |
| trapezoidal_boxes | Trapezoidal boxes (studio ceiling), “Round Robin III” (after (Bork 2005a)) | 0.13 | 0.56 | 0.95 | 0.95 | 0.95 | 0.95 | |
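As a usage sketch, the keywords in the tables above can be used to construct wall materials. A minimal sketch, assuming the pra.Material and pra.make_materials helpers from pyroomacoustics.parameters:
import pyroomacoustics as pra
# a single material from an absorption keyword, optionally with a scattering keyword
mat = pra.Material(energy_absorption="hard_surface", scattering="rpg_skyline")
# per-wall materials for a shoe box room, built from database keywords
materials = pra.make_materials(
    ceiling="ceiling_plasterboard",
    floor="carpet_cotton",
    east="brickwork",
    west="brickwork",
    north="hard_surface",
    south="hard_surface",
)
room = pra.ShoeBox([6, 4, 3], materials=materials, max_order=10)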
Transforms¶
Module contents¶
Block based processing¶
This subpackage contains routines for continuous realtime-like block processing of signals with the STFT. The routines are written to take advantage of fast FFT libraries like pyfftw or mkl when available.
- DFT
- A class for performing DFT of real signals
- STFT
- A class for continuous STFT processing and frequency domain filtering
Algorithms¶
DFT¶
Class for performing the Discrete Fourier Transform (DFT) and inverse DFT for real signals, including multichannel. It is also possible to specify an analysis or synthesis window.
When available, it is possible to use the pyfftw or mkl_fft packages. Otherwise, the default is to use numpy.fft.rfft/numpy.fft.irfft.
More on pyfftw can be read here: https://pyfftw.readthedocs.io/en/latest/index.html
More on mkl_fft can be read here: https://github.com/IntelPython/mkl_fft
- class pyroomacoustics.transform.dft.DFT(nfft, D=1, analysis_window=None, synthesis_window=None, transform='numpy', axis=0, precision='double', bits=None)¶
Bases:
object
Class for performing the Discrete Fourier Transform (DFT) of real signals.
- X¶
Real DFT computed by analysis.
- Type
numpy array
- x¶
IDFT computed by synthesis.
- Type
numpy array
- nfft¶
FFT size.
- Type
int
- D¶
Number of channels.
- Type
int
- transform¶
Which FFT package will be used.
- Type
str
- analysis_window¶
Window to be applied before DFT.
- Type
numpy array
- synthesis_window¶
Window to be applied after inverse DFT.
- Type
numpy array
- axis¶
Axis over which to compute the FFT.
- Type
int
- precision, bits
How many precision bits to use for the input. Twice the amount will be used for complex spectrum.
- Type
string, np.float32, np.float64, np.complex64, np.complex128
- Parameters
nfft (int) – FFT size.
D (int, optional) – Number of channels. Default is 1.
analysis_window (numpy array, optional) – Window to be applied before DFT. Default is no window.
synthesis_window (numpy array, optional) – Window to be applied after inverse DFT. Default is no window.
transform (str, optional) – which FFT package to use: numpy, pyfftw, or mkl. Default is numpy.
axis (int, optional) – Axis over which to compute the FFT. Default is first axis.
precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).
- analysis(x)¶
Perform frequency analysis of a real input using DFT.
- Parameters
x (numpy array) – Real signal in time domain.
- Returns
DFT of input.
- Return type
numpy array
- synthesis(X=None)¶
Perform time synthesis of frequency domain to real signal using the inverse DFT.
- Parameters
X (numpy array, optional) – Complex signal in frequency domain. Default is to use DFT computed from analysis.
- Returns
IDFT of self.X or input if given.
- Return type
numpy array
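A short round-trip sketch of the analysis/synthesis pair (the signal below is arbitrary):
import numpy as np
import pyroomacoustics as pra
dft = pra.transform.DFT(nfft=512)
x = np.random.randn(512)
X = dft.analysis(x)      # complex spectrum with nfft // 2 + 1 bins
x_r = dft.synthesis(X)   # back to the 512 time-domain samples
print(np.allclose(x, x_r))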
STFT¶
Class for real-time STFT analysis and processing.
- class pyroomacoustics.transform.stft.STFT(N, hop=None, analysis_window=None, synthesis_window=None, channels=1, transform='numpy', streaming=True, precision='double', **kwargs)¶
Bases:
object
A class for STFT processing.
- Parameters
N (int) – number of samples per frame
hop (int) – hop size
analysis_window (numpy array) – window applied to block before analysis
synthesis_window (numpy array) – window applied to the block before synthesis
channels (int) – number of signals
transform (str, optional) – which FFT package to use: ‘numpy’ (default), ‘pyfftw’, or ‘mkl’
streaming (bool, optional) – whether (True, default) or not (False) to “stitch” samples between repeated calls of ‘analysis’ and ‘synthesis’ if we are receiving a continuous stream of samples.
num_frames (int, optional) – Number of frames to be processed. If set, this will be strictly enforced as the STFT block will allocate memory accordingly. If not set, there will be no check on the number of frames sent to analysis/process/synthesis.
- NOTE:
1) num_frames = 0 corresponds to a “real-time” case in which each input block corresponds to [hop] samples. 2) num_frames > 0 requires [(num_frames-1)*hop + N] samples, as the last frame must contain [N] samples.
precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).
- analysis(x)¶
- Parameters
x (2D numpy array, [samples, channels]) – Time-domain signal.
- process(X=None)¶
- Parameters
X (numpy array) – X can take on multiple shapes: 1) (N,) if it is single channel and only one frame 2) (N,D) if it is multi-channel and only one frame 3) (F,N) if it is single channel but multiple frames 4) (F,N,D) if it is multi-channel and multiple frames
- Returns
x_r – Reconstructed time-domain signal.
- Return type
numpy array
- reset()¶
Reset state variables. Necessary after changing or setting the filter or zero padding.
- set_filter(coeff, zb=None, zf=None, freq=False)¶
Set time-domain FIR filter with appropriate zero-padding. Frequency spectrum of the filter is computed and set for the object. There is also a check for sufficient zero-padding.
- Parameters
coeff (numpy array) – Filter in time domain.
zb (int) – Amount of zero-padding added to back/end of frame.
zf (int) – Amount of zero-padding added to front/beginning of frame.
freq (bool) – Whether or not given coefficients (coeff) are in the frequency domain.
- synthesis(X=None)¶
- Parameters
X (numpy array of frequency content) – X can take on multiple shapes: 1) (N,) if it is single channel and only one frame; 2) (N,D) if it is multi-channel and only one frame; 3) (F,N) if it is single channel but multiple frames; 4) (F,N,D) if it is multi-channel and multiple frames. Here F is the number of frames, N is the number of frequency bins, and D is the number of channels.
- Returns
x_r – Reconstructed time-domain signal.
- Return type
numpy array
- zero_pad_back(zb)¶
Set zero-padding at end of frame.
- zero_pad_front(zf)¶
Set zero-padding at beginning of frame.
- pyroomacoustics.transform.stft.analysis(x, L, hop, win=None, zp_back=0, zp_front=0)¶
Convenience function for one-shot STFT
- Parameters
x (array_like, (n_samples) or (n_samples, n_channels)) – input signal
L (int) – frame size
hop (int) – shift size between frames
win (array_like) – the window to apply (default None)
zp_back (int) – zero padding to apply at the end of the frame
zp_front (int) – zero padding to apply at the beginning of the frame
- Returns
X – The STFT of x
- Return type
ndarray, (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)
- pyroomacoustics.transform.stft.compute_synthesis_window(analysis_window, hop)¶
Computes the optimal synthesis window given an analysis window and hop (frame shift). The procedure is described in
D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoustics, Speech, and Signal Process., vol. 32, no. 2, pp. 236-243, 1984.
- Parameters
analysis_window (array_like) – The analysis window
hop (int) – The frame shift
- pyroomacoustics.transform.stft.synthesis(X, L, hop, win=None, zp_back=0, zp_front=0)¶
Convenience function for one-shot inverse STFT
- Parameters
X (array_like (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)) – The data
L (int) – frame size
hop (int) – shift size between frames
win (array_like) – the window to apply (default None)
zp_back (int) – zero padding to apply at the end of the frame
zp_front (int) – zero padding to apply at the beginning of the frame
- Returns
x – The inverse STFT of X
- Return type
ndarray, (n_samples) or (n_samples, n_channels)
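For example, a one-shot analysis/synthesis round trip with a matching synthesis window (frame size, hop, and signal are illustrative):
import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.transform import stft
L, hop = 512, 256
win_a = pra.hann(L)
win_s = stft.compute_synthesis_window(win_a, hop)
x = np.random.randn(16000)
X = stft.analysis(x, L, hop, win=win_a)     # (n_frames, n_frequencies)
x_r = stft.synthesis(X, L, hop, win=win_s)  # reconstructed time-domain signal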
Dataset Wrappers¶
Module contents¶
The datasets sub-package provides wrappers around a few popular audio datasets to make them easier to use.
Two base classes, pyroomacoustics.datasets.base.Dataset and pyroomacoustics.datasets.base.Sample, wrap together the audio samples and their metadata.
The general idea is to create a sample object with an attribute containing all metadata. Dataset objects that hold a collection of samples can then be created and filtered according to the values in the metadata.
Many of the functions with match or filter in their name take an arbitrary number of keyword arguments. The keys should match some metadata in the samples. Then there are three ways that a match occurs between a key/value pair and an attribute sharing the same key:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
Example 1¶
from pyroomacoustics.datasets import Dataset, Sample

# Prepare a few artificial samples
samples = [
{
'data' : 0.99,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'one' },
},
{
'data' : 2.1,
'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'two' },
},
{
'data' : 1.02,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'one' },
},
{
'data' : 2.07,
'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'two' },
},
]
corpus = Dataset()
for s in samples:
new_sample = Sample(s['data'], **s['metadata'])
corpus.add_sample(new_sample)
# Then, it is possible to display summary info about the corpus
print(corpus)
# The number of samples in the corpus is given by ``len``
print('Number of samples:', len(corpus))
# And we can access samples with the slice operator
print('Sample #2:')
print(corpus[2]) # (shortcut for `corpus.samples[2]`)
# We can obtain a new corpus with only male subjects
corpus_male_only = corpus.filter(sex='male')
print(corpus_male_only)
# Only retain speakers above 40 years old
corpus_older = corpus.filter(age=lambda a : a > 40)
print(corpus_older)
Example 2 (CMU ARCTIC)¶
# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)
# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t : keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
print(' *', s)
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Example 3 (Google’s Speech Commands Dataset)¶
# This example involves Google's Speech Commands Dataset available at
# https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
import matplotlib.pyplot as plt
import pyroomacoustics as pra
# The dataset is automatically downloaded if not available and 10 of each word is selected
dataset = pra.datasets.GoogleSpeechCommands(download=True, subset=10, seed=0)
# print dataset info, first 10 entries, and all sounds
print(dataset)
dataset.head(n=10)
print("All sounds in the dataset:")
print(dataset.classes)
# filter by specific word
selected_word = 'yes'
matches = dataset.filter(word=selected_word)
print("Number of '%s' samples : %d" % (selected_word, len(matches)))
# if the sounddevice package is available, we can play the sample
matches[0].play()
# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()
Datasets Available¶
CMU ARCTIC Corpus¶
The CMU ARCTIC Dataset¶
The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English single speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the database and the recording environment is available as Carnegie Mellon University, Language Technologies Institute Tech Report CMU-LTI-03-177 and is also available here.
The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.
The 1132 sentence prompt list is available from cmuarctic.data
The distributions include 16 kHz waveform and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox based labelling scripts. Complete runnable Festival Voices are included with the database distributions as examples, though better voices can be made by improving the labelling, etc.
License: Permissive, attribution required
Price: Free
- class pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus(basedir=None, download=False, build=True, **kwargs)¶
Bases:
Dataset
This class will load the CMU ARCTIC corpus in a structure amenable to processing.
- basedir¶
The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
- Type
str, optional
- info¶
A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.
- Type
dict
- sentences¶
The list of all utterances in the corpus
- Type
list of CMUArcticSentence
- Parameters
basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.
download (bool, optional) – If the corpus does not exist, download it.
speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.
sex (str or list of str, optional) – Can be ‘female’ or ‘male’
lang (str or list of str, optional) – The language, only ‘English’ is available here
accent (str of list of str, optional) – The accent of the speaker
- build_corpus(**kwargs)¶
Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)
- filter(**kwargs)¶
Filter the corpus and select samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
- class pyroomacoustics.datasets.cmu_arctic.CMUArcticSentence(path, **kwargs)¶
Bases:
AudioSample
Create the sentence object
- Parameters
path (str) – the path to the audio file
**kwargs – metadata as a list of keyword arguments
- data¶
The actual audio signal
- Type
array_like
- fs¶
sampling frequency
- Type
int
- plot(**kwargs)¶
Plot the spectrogram
Google Speech Commands¶
Google’s Speech Commands Dataset¶
The Speech Commands Dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.
More info about the dataset can be found at the link below:
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
AIY website for contributing recordings:
https://aiyprojects.withgoogle.com/open_speech_recording
Tutorial on creating a word classifier:
https://www.tensorflow.org/versions/master/tutorials/audio_recognition
- class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)¶
Bases:
AudioSample
Create the sound object.
- Parameters
path (str) – the path to the audio file
**kwargs – metadata as a list of keyword arguments
- data¶
the actual audio signal
- Type
array_like
- fs¶
sampling frequency
- Type
int
- plot(**kwargs)¶
Plot the spectrogram
- class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)¶
Bases:
Dataset
This class will load the Google Speech Commands Dataset in a structure convenient for processing.
- basedir¶
The directory where the Speech Commands Dataset is located/downloaded.
- Type
str
- size_by_samples¶
A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each particular word.
- Type
dict
- subdirs¶
The list of subdirectories in basedir, where each sound type is the name of a subdirectory.
- Type
list
- classes¶
The list of all sounds, same as the keys of size_by_samples.
- Type
list
- Parameters
basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
download (bool, optional) – If the corpus does not exist, download it.
build (bool, optional) – Whether or not to build the dataset. By default, it is.
subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word. By default, the dataset will be built with all samples.
seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
- build_corpus(subset=None, **kwargs)¶
Build the corpus with some filters (speech or not speech, sound type).
- filter(**kwargs)¶
Filter the dataset and select samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
TIMIT Corpus¶
The TIMIT Dataset¶
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).
The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250 depending on your status (academic or otherwise).
Deprecation Warning: The interface of TimitCorpus will change in the near future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus.
- class pyroomacoustics.datasets.timit.Sentence(path)¶
Bases:
object
Create the sentence object
- Parameters
path ((string)) – the path to the particular sample
- speaker¶
Speaker initials
- Type
str
- id¶
a digit to disambiguate identical initials
- Type
str
- sex¶
Speaker gender (M or F)
- Type
str
- dialect¶
Speaker dialect region number:
New England
Northern
North Midland
South Midland
Southern
New York City
Western
Army Brat (moved around)
- Type
str
- fs¶
sampling frequency
- Type
int
- samples¶
the audio track
- Type
array_like (n_samples,)
- text¶
the text of the sentence
- Type
str
- words¶
list of Word objects forming the sentence
- Type
list
- phonems¶
List of phonemes contained in the sentence. Each element is a dictionary containing a ‘bnd’ key with the limits of the phoneme and a ‘name’ key with the phoneme transcription.
- Type
list
- play()¶
Play the sound sample
- plot(L=512, hop=128, zpb=0, phonems=False, **kwargs)¶
- class pyroomacoustics.datasets.timit.TimitCorpus(basedir)¶
Bases:
object
TimitCorpus class
- Parameters
basedir ((string)) – The location of the TIMIT database
directories ((list of strings)) – The subdirectories containing the data ([‘TEST’,’TRAIN’])
sentence_corpus ((dict)) – A dictionary that contains a list of Sentence objects for each sub-directory
word_corpus ((dict)) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus
- build_corpus(sentences=None, dialect_region=None, speakers=None, sex=None)¶
Build the corpus
The TIMIT database structure is encoded in the directory structure:
- basedir
- TEST/TRAIN
- Regional accent index (1 to 8)
- Speakers (one directory per speaker)
- Sentences (one file per sentence)
- Parameters
sentences ((list)) – A list containing the sentences to which we want to restrict the corpus. Example: sentences=[‘SA1’,’SA2’]
dialect_region ((list of int)) – A list to which we restrict the dialect regions. Example: dialect_region=[1, 4, 5]
speakers ((list)) – A list of speaker acronyms to which we want to restrict the corpus. Example: speakers=[‘AKS0’]
sex ((string)) – Restrict to a single sex: ‘F’ for female, ‘M’ for male
- get_word(d, w, index=0)¶
return instance index of word w from group (test or train) d
- class pyroomacoustics.datasets.timit.Word(word, boundaries, data, fs, phonems=None)¶
Bases:
object
A class used for words of the TIMIT corpus
- word¶
The spelling of the word
- Type
str
- boundaries¶
The limits of the word within the sentence
- Type
list
- samples¶
A view on the sentence samples containing the word
- Type
array_like
- fs¶
The sampling frequency
- Type
int
- phonems¶
A list of phones contained in the word
- Type
list
- features¶
A feature array (e.g. MFCC coefficients)
- Type
array_like
- Parameters
word (str) – The spelling of the word
boundaries (list) – The limits of the word within the sentence
data (array_like) – The nd-array that contains all the samples of the sentence
fs (int) – The sampling frequency
phonems (list, optional) – A list of phones contained in the word
- mfcc(frame_length=1024, hop=512)¶
compute the mel-frequency cepstrum coefficients of the word samples
- play()¶
Play the sound sample
- plot()¶
Tools and Helpers¶
Base Class¶
Base class for some data corpus and the samples it contains.
- class pyroomacoustics.datasets.base.AudioSample(data, fs, **kwargs)¶
Bases:
Sample
We add some methods specific to displaying and listening to audio samples. The sampling frequency of the samples is an extra parameter.
For multichannel audio, we assume the same format used by scipy.io.wavfile (https://docs.scipy.org/doc/scipy-0.14.0/reference/io.html#module-scipy.io.wavfile), that is, data is then a 2D array with each column being a channel.
- data¶
The actual data
- Type
array_like
- fs¶
The sampling frequency of the input signal
- Type
int
- meta¶
An object containing the sample metadata. They can be accessed using the dot operator
- Type
pyroomacoustics.datasets.Meta
- play(**kwargs)¶
Play the sound sample. This function uses the sounddevice package for playback.
It takes the same keyword arguments as sounddevice.play.
- plot(NFFT=512, noverlap=384, **kwargs)¶
Plot the spectrogram of the audio sample.
It takes the same keyword arguments as matplotlib.pyplot.specgram.
- class pyroomacoustics.datasets.base.Dataset¶
Bases:
object
The base class for a data corpus. It is essentially a list of samples together with a filter function.
- samples¶
A list of all the Samples in the dataset
- Type
list
- info¶
This dictionary keeps track of all the fields in the metadata. The keys of the dictionary are the metadata field names. The values are again dictionaries, but with the keys being the possible values taken by the metadata and the associated value, the number of samples with this value in the corpus.
- Type
dict
- add_sample(sample)¶
Add a sample to the Dataset and keep track of the metadata.
- add_sample_matching(sample, **kwargs)¶
The sample is added to the corpus only if all the keyword arguments match the metadata of the sample. The match is operated by pyroomacoustics.datasets.Meta.match.
- filter(**kwargs)¶
Filter the corpus and select samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
- head(n=5)¶
Print n samples from the dataset
- class pyroomacoustics.datasets.base.Meta(**attr)¶
Bases:
object
A simple class that will take a dictionary as input and put the values in attributes named after the keys. We use it to store metadata for the samples.
The parameters can be any set of keyword arguments. They will all be transformed into attributes of the object.
- as_dict()¶
Returns all the attribute/value pairs of the object as a dictionary
- match(**kwargs)¶
The key/value pairs given by the keyword arguments are compared to the attribute/value pairs of the object. If the values all match, True is returned. Otherwise False is returned. If a keyword argument has no attribute counterpart, an error is raised. Attributes that do not have a keyword argument counterpart are ignored.
There are three ways to match an attribute with keyword=value:
1. value == attribute
2. value is a list and attribute in value == True
3. value is a callable (a function) and value(attribute) == True
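A small sketch of the three matching modes (the metadata values are arbitrary):
from pyroomacoustics.datasets import Meta
m = Meta(speaker='alice', sex='female', age=37)
print(m.match(speaker='alice'))           # equality match -> True
print(m.match(speaker=['alice', 'bob']))  # list membership -> True
print(m.match(age=lambda a: a > 40))      # callable -> False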
- class pyroomacoustics.datasets.base.Sample(data, **kwargs)¶
Bases:
object
The base class for a dataset sample. The idea is that different corpora will have different attributes for the samples. They should at least have a data attribute.
- data¶
The actual data
- Type
array_like
- meta¶
An object containing the sample metadata. They can be accessed using the dot operator
- Type
pyroomacoustics.datasets.Meta
Dataset Utilities¶
- pyroomacoustics.datasets.utils.download_uncompress(url, path='.', compression=None, context=None)¶
This function downloads and uncompresses on the fly a file of type tar, tar.gz, or tar.bz2.
- Parameters
url (str) – The URL of the file
path (str, optional) – The path where to uncompress the file
compression (str, optional) – The compression type (one of ‘bz2’, ‘gz’, ‘tar’), inferred from url if not provided
context (SSL certification, optional) – Default is to use none.
Adaptive Filtering¶
Module contents¶
Adaptive Filter Algorithms¶
This sub-package provides implementations of popular adaptive filter algorithms.
- RLS
- Recursive Least Squares
- LMS
- Least Mean Squares and Normalized Least Mean Squares
All these classes derive from the base class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter, which offers a generic way of running an adaptive filter.
The above classes are applicable for time domain processing. For frequency domain adaptive filtering, there is the SubbandLMS class. After using a DFT or STFT block, the SubbandLMS class can be used to apply LMS or NLMS to each frequency band. A shorter adaptive filter can be used on each band, as opposed to the longer filter required in the time domain version. Roughly, a filter of M taps applied to each of B bands corresponds to a time domain filter with N = M x B taps.
How to use the adaptive filter module¶
First, an adaptive filter object is created and all the relevant options can be set (step size, regularization, etc). Then, the update function is repeatedly called to provide new samples to the algorithm.
import numpy as np
import pyroomacoustics as pra

# example data: a random input and a noisy filtered reference (arbitrary test signals)
x = np.random.randn(200)
d = np.convolve(x, np.random.randn(30))[:200] + 0.01 * np.random.randn(200)
# initialize the filter
rls = pra.adaptive.RLS(30)
# run the filter on a stream of samples
for i in range(200):
    rls.update(x[i], d[i])
# the reconstructed filter is available
print('Reconstructed filter:', rls.w)
The SubbandLMS class has the same methods as the time domain approaches. However, the signal must be in the frequency domain. This can be done with the STFT block in the transform sub-package of pyroomacoustics.
import numpy as np
import pyroomacoustics as pra

# initialize STFT and SubbandLMS blocks
block_size = 128
stft_x = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
stft_d = pra.transform.STFT(N=block_size,
hop=block_size//2,
analysis_window=pra.hann(block_size))
nlms = pra.adaptive.SubbandLMS(num_taps=6,
num_bands=block_size//2+1, mu=0.5, nlms=True)
# preparing input and reference signals
...
# apply block-by-block
for n in range(num_blocks):
# obtain block
...
# to frequency domain
stft_x.analysis(x_block)
stft_d.analysis(d_block)
nlms.update(stft_x.X, stft_d.X)
# estimating input convolved with unknown response
y_hat = stft_d.synthesis(np.diag(np.dot(nlms.W.conj().T,stft_x.X)))
# AEC output
E = stft_d.X - np.diag(np.dot(nlms.W.conj().T,stft_x.X))
out = stft_d.synthesis(E)
Other Available Subpackages¶
pyroomacoustics.adaptive.data_structures
this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.adaptive.util
a few methods mainly to efficiently manipulate Toeplitz and Hankel matrices
Utilities¶
pyroomacoustics.adaptive.algorithms
a dictionary containing all the adaptive filter object subclasses available, indexed by the keys
['RLS', 'BlockRLS', 'BlockLMS', 'NLMS', 'SubbandLMS']
Algorithms¶
Adaptive Filter (Base)¶
- class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter(length)¶
Bases:
object
The dummy base class of an adaptive filter. This class doesn’t compute anything. It merely stores values in a buffer. It is used as a template for all other algorithms.
- name()¶
- reset()¶
Reset the state of the adaptive filter
- update(x_n, d_n)¶
Updates the adaptive filter with a new sample
- Parameters
x_n (float) – the new input sample
d_n (float) – the new noisy reference signal
Least Mean Squares¶
Least Mean Squares Family¶
Implementations of adaptive filters from the LMS class. These algorithms have a low complexity and reliable behavior with a somewhat slower convergence.
- class pyroomacoustics.adaptive.lms.BlockLMS(length, mu=0.01, L=1, nlms=False)¶
Bases:
NLMS
Implementation of the least mean squares algorithm (LMS) in its block form
- Parameters
length (int) – the length of the filter
mu (float, optional) – the step size (default 0.01)
L (int, optional) – block size (default is 1)
nlms (bool, optional) – whether or not to normalize as in NLMS (default is False)
- reset()¶
Reset the state of the adaptive filter
- update(x_n, d_n)¶
Updates the adaptive filter with a new sample
- Parameters
x_n (float) – the new input sample
d_n (float) – the new noisy reference signal
- class pyroomacoustics.adaptive.lms.NLMS(length, mu=0.5)¶
Bases:
AdaptiveFilter
Implementation of the normalized least mean squares algorithm (NLMS)
- Parameters
length (int) – the length of the filter
mu (float, optional) – the step size (default 0.5)
- update(x_n, d_n)¶
Updates the adaptive filter with a new sample
- Parameters
x_n (float) – the new input sample
d_n (float) – the new noisy reference signal
Recursive Least Squares¶
Recursive Least Squares Family¶
Implementations of adaptive filters from the RLS class. These algorithms typically have a higher computational complexity, but a faster convergence.
- class pyroomacoustics.adaptive.rls.BlockRLS(length, lmbd=0.999, delta=10, dtype=np.float32, L=None)¶
Bases:
RLS
Block implementation of the recursive least-squares (RLS) algorithm. The difference with the vanilla implementation is that chunks of the input signals are processed in batch and some savings can be made there.
- Parameters
length (int) – the length of the filter
lmbd (float, optional) – the exponential forgetting factor (default 0.999)
delta (float, optional) – the regularization term (default 10)
dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
L (int, optional) – the block size (default to length)
- reset()¶
Reset the state of the adaptive filter
- update(x_n, d_n)¶
Updates the adaptive filter with a new sample
- Parameters
x_n (float) – the new input sample
d_n (float) – the new noisy reference signal
- class pyroomacoustics.adaptive.rls.RLS(length, lmbd=0.999, delta=10, dtype=np.float32)¶
Bases:
AdaptiveFilter
Implementation of the exponentially weighted Recursive Least Squares (RLS) adaptive filter algorithm.
- Parameters
length (int) – the length of the filter
lmbd (float, optional) – the exponential forgetting factor (default 0.999)
delta (float, optional) – the regularization term (default 10)
dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)
- reset()¶
Reset the state of the adaptive filter
- update(x_n, d_n)¶
Updates the adaptive filter with a new sample
- Parameters
x_n (float) – the new input sample
d_n (float) – the new noisy reference signal
Subband LMS¶
- class pyroomacoustics.adaptive.subband_lms.SubbandLMS(num_taps, num_bands, mu=0.5, nlms=True)¶
Bases:
object
Frequency domain implementation of LMS. Adaptive filter for each subband.
- Parameters
num_taps (int) – length of the filter
num_bands (int) – number of frequency bands, i.e. number of filters
mu (float, optional) – step size for each subband (default 0.5)
nlms (bool, optional) – whether or not to normalize as in NLMS (default is True)
- reset()¶
- update(X_n, D_n)¶
Updates the adaptive filters for each subband with the new block of input data.
- Parameters
X_n (numpy array, float) – new input signal (to unknown system) in frequency domain
D_n (numpy array, float) – new noisy reference signal in frequency domain
- pyroomacoustics.adaptive.subband_lms.hermitian(X)¶
Compute and return Hermitian transpose
Tools and Helpers¶
Data Structures¶
- class pyroomacoustics.adaptive.data_structures.Buffer(length=20, dtype=np.float64)¶
Bases:
object
A simple buffer class with amortized cost
- Parameters
length (int) – buffer length
dtype (numpy.type) – data type
- flush(n)¶
Removes the n oldest elements in the buffer
- push(val)¶
Add one element at the front of the buffer
- size()¶
Returns the number of elements in the buffer
- top(n)¶
Returns the n elements at the front of the buffer from newest to oldest
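A quick usage sketch of the Buffer API documented above:
import numpy as np
from pyroomacoustics.adaptive.data_structures import Buffer

buf = Buffer(length=8, dtype=np.float64)
for v in [1.0, 2.0, 3.0]:
    buf.push(v)          # newest element goes to the front

print(buf.size())        # 3
print(buf.top(2))        # the two newest elements, from newest to oldest
buf.flush(1)             # remove the oldest element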
- class pyroomacoustics.adaptive.data_structures.CoinFlipper(p, length=10000)¶
Bases:
object
This class efficiently generates large numbers of coin flips. Because each call to numpy.random.rand is a little costly, it is more efficient to generate many values at once. This class does this and stores the values in advance, generating fresh ones when needed.
- Parameters
p (float, 0 < p < 1) – probability to output a 1
length (int) – the number of flips to precompute
- flip(n)¶
Get n random binary values from the buffer
- flip_all()¶
Regenerates all the used up values
- fresh_flips(n)¶
Generates n binary random values now
- class pyroomacoustics.adaptive.data_structures.Powers(a, length=20, dtype=np.float32)¶
Bases:
object
This class stores all the integer powers of a small number and gives access to them ‘a la numpy’ with the bracket operator. The table is automatically extended when new values are requested.
- Parameters
a (float) – the number
length (int) – the number of integer powers
dtype (numpy.type, optional) – the data type (typically np.float32 or np.float64)
Example
>>> an = Powers(0.5)
>>> print(an[4])
0.0625
Utilities¶
- pyroomacoustics.adaptive.util.autocorr(x)¶
Fast autocorrelation computation using the FFT
- pyroomacoustics.adaptive.util.hankel_multiplication(c, r, A, mkl=True, **kwargs)¶
Compute numpy.dot(scipy.linalg.hankel(c,r=r), A) using the FFT.
- Parameters
c (ndarray) – the first column of the Hankel matrix
r (ndarray) – the last row of the Hankel matrix
A (ndarray) – the matrix to multiply on the right
mkl (bool, optional) – if True, use the mkl_fft package if available
- pyroomacoustics.adaptive.util.hankel_stride_trick(x, shape)¶
Make a Hankel matrix from a vector using stride tricks
- Parameters
x (ndarray) – a vector that contains the concatenation of the first column and first row of the Hankel matrix to build without repetition of the lower left corner value of the matrix
shape (tuple) – the shape of the Hankel matrix to build, it must satisfy
x.shape[0] == shape[0] + shape[1] - 1
- pyroomacoustics.adaptive.util.mkl_toeplitz_multiplication(c, r, A, A_padded=False, out=None, fft_len=None)¶
Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT from the mkl_fft package.
- Parameters
c (ndarray) – the first column of the Toeplitz matrix
r (ndarray) – the first row of the Toeplitz matrix
A (ndarray) – the matrix to multiply on the right
A_padded (bool, optional) – the A matrix can be pre-padded with zeros by the user, if this is the case set to True
out (ndarray, optional) – an ndarray to store the output of the multiplication
fft_len (int, optional) – specify the length of the FFT to use
- pyroomacoustics.adaptive.util.naive_toeplitz_multiplication(c, r, A)¶
Compute numpy.dot(scipy.linalg.toeplitz(c,r), A)
- Parameters
c (ndarray) – the first column of the Toeplitz matrix
r (ndarray) – the first row of the Toeplitz matrix
A (ndarray) – the matrix to multiply on the right
- pyroomacoustics.adaptive.util.toeplitz_multiplication(c, r, A, **kwargs)¶
Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT.
- Parameters
c (ndarray) – the first column of the Toeplitz matrix
r (ndarray) – the first row of the Toeplitz matrix
A (ndarray) – the matrix to multiply on the right
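The FFT-based products can be verified against their dense scipy counterparts. A small sketch (note that scipy's constructors expect r[0] to agree with c[0] for Toeplitz, and with c[-1] for Hankel):
import numpy as np
from scipy.linalg import hankel, toeplitz
from pyroomacoustics.adaptive.util import hankel_multiplication, toeplitz_multiplication

c = np.random.randn(8)          # first column
r = np.random.randn(6)
r[0] = c[0]                     # consistency for the Toeplitz constructor
A = np.random.randn(6, 3)
print(np.allclose(toeplitz(c, r) @ A, toeplitz_multiplication(c, r, A)))

r2 = np.random.randn(6)
r2[0] = c[-1]                   # consistency for the Hankel constructor
print(np.allclose(hankel(c, r=r2) @ A, hankel_multiplication(c, r2, A)))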
- pyroomacoustics.adaptive.util.toeplitz_opt_circ_approx(r, matrix=False)¶
Optimal circulant approximation of a symmetric Toeplitz matrix by Tony F. Chan
- Parameters
r (ndarray) – the first row of the symmetric Toeplitz matrix
matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned, otherwise, only the first column
- pyroomacoustics.adaptive.util.toeplitz_strang_circ_approx(r, matrix=False)¶
Circulant approximation to a symmetric Toeplitz matrix by Gil Strang
- Parameters
r (ndarray) – the first row of the symmetric Toeplitz matrix
matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned, otherwise, only the first column
Blind Source Separation¶
Module contents¶
Blind Source Separation¶
Implementations of a few blind source separation (BSS) algorithms.
- AuxIVA
- Independent Vector Analysis 1
- Trinicon
- Time-domain BSS 2
- ILRMA
- Independent Low-Rank Matrix Analysis 3
- SparseAuxIVA
- Sparse Independent Vector Analysis 4
- FastMNMF
- Fast Multichannel Nonnegative Matrix Factorization 5
- FastMNMF2
- Fast Multichannel Nonnegative Matrix Factorization 2 6
A few commonly used functions, such as projection back, can be found in pyroomacoustics.bss.common.
References
- 1
N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011.
- 2
R. Aichner, H. Buchner, F. Yan, and W. Kellermann A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006.
- 3
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016
- 4
J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016.
- 5
K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019.
- 6
K. Sekiguchi, Y. Bando, A. A. Nugraha, K. Yoshii, T. Kawahara, Fast Multichannel Nonnegative Matrix Factorization with Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation, IEEE/ACM Trans. ASLP, vol. 28, pp. 2610-2625, 2020.
Algorithms¶
Independent Vector Analysis (AuxIVA)¶
AuxIVA¶
Blind Source Separation using independent vector analysis based on auxiliary function. This function will separate the input signal into statistically independent sources without using any prior information.
The algorithm in the determined case, i.e., when the number of sources is equal to the number of microphones, is AuxIVA 1. When there are more microphones (the overdetermined case), a computationally cheaper variant (OverIVA) is used 2.
Example
from scipy.io import wavfile
import pyroomacoustics as pra
# read multichannel wav file
# audio.shape == (nsamples, nchannels)
fs, audio = wavfile.read("my_multichannel_audio.wav")
# STFT analysis parameters
fft_size = 4096 # `fft_size / fs` should be ~RT60
hop = fft_size // 2 # half-overlap
win_a = pra.hann(fft_size) # analysis window
# optimal synthesis window
win_s = pra.transform.compute_synthesis_window(win_a, hop)
# STFT
# X.shape == (nframes, nfrequencies, nchannels)
X = pra.transform.analysis(audio, fft_size, hop, win=win_a)
# Separation
Y = pra.bss.auxiva(X, n_iter=20)
# iSTFT (introduces an offset of `hop` samples)
# y contains the time domain separated signals
# y.shape == (new_nsamples, nchannels)
y = pra.transform.synthesis(Y, fft_size, hop, win=win_s)
References
- 1
N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011.
- 2
R. Scheibler and N. Ono, Independent Vector Analysis with more Microphones than Sources, arXiv, 2019. https://arxiv.org/abs/1905.07880
- pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', init_eig=False, return_filters=False, callback=None)¶
This is an implementation of AuxIVA/OverIVA that separates the input signal into statistically independent sources. The separation is done in the time-frequency domain and the FFT length should be approximately equal to the reverberation time.
Two different statistical models (Laplace or time-varying Gauss) can be selected with the keyword argument model. The Gauss model performs better in good conditions (few sources, low noise), but Laplace (the default) is more robust in general.
- Parameters
X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
n_src (int, optional) – The number of sources or independent components. When n_src == nchannels, the algorithm is identical to AuxIVA. When n_src == 1, it performs independent vector extraction.
n_iter (int, optional) – The number of iterations (default 20)
proj_back (bool, optional) – Scaling on first mic by back projection (default True)
W0 (ndarray (nfrequencies, nsrc, nchannels), optional) – Initial value for demixing matrix
model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)
init_eig (bool, optional, default False) – If True, and if W0 is None, then the weights are initialized using the principal eigenvectors of the covariance matrix of the input data. When False, the demixing matrices are initialized with the identity matrix.
return_filters (bool) – If true, the function will return the demixing matrix too
callback (func) – A callback function called every 10 iterations; it can be used to monitor convergence
- Returns
Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
Trinicon¶
- pyroomacoustics.bss.trinicon.trinicon(signals, w0=None, filter_length=2048, block_length=None, n_blocks=8, alpha_on=None, j_max=10, delta_max=0.0001, sigma2_0=1e-07, mu=0.001, lambd_a=0.2, return_filters=False)¶
Implementation of the TRINICON Blind Source Separation algorithm as described in
R. Aichner, H. Buchner, F. Yan, and W. Kellermann A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006. [pdf]
Specifically, adaptation of the pseudo-code from Table 1.
The implementation is hard-coded for 2 output channels.
- Parameters
signals (ndarray (nchannels, nsamples)) – The microphone input signals (time domain)
w0 (ndarray (nchannels, nsources, nsamples), optional) – Optional initial value for the demixing filters
filter_length (int, optional) – The length of the demixing filters, if w0 is provided, this option is ignored
block_length (int, optional) – Block length (default 2x filter_length)
n_blocks (int, optional) – Number of blocks processed at once (default 8)
alpha_on (int, optional) – Online overlap factor (default
n_blocks
)j_max (int, optional) – Number of offline iterations (default 10)
delta_max (float, optional) – Regularization parameter, this sets the maximum value of the regularization term (default 1e-4)
sigma2_0 (float, optional) – Regularization parameter, this sets the reference (machine?) noise level in the regularization (default 1e-7)
mu (float, optional) – Offline update step size (default 0.001)
lambd_a (float, optional) – Online forgetting factor (default 0.2)
return_filters (bool) – If true, the function will return the demixing matrix too (default False)
- Returns
Returns an (nsources, nsamples) array. Also returns the demixing matrix (nchannels, nsources, nsamples) if the return_filters keyword is True.
- Return type
ndarray
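Minimal time-domain usage, as a sketch; the mics array stands in for a (2, n_samples) set of microphone recordings, matching the two hard-coded output channels:
import numpy as np
import pyroomacoustics as pra

# placeholder input; substitute real two-channel recordings here
mics = np.random.randn(2, 32000)
y = pra.bss.trinicon(mics)      # y has shape (2, n_samples)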
Independent Low-Rank Matrix Analysis (ILRMA)¶
ILRMA¶
Blind Source Separation using Independent Low-Rank Matrix Analysis (ILRMA).
- pyroomacoustics.bss.ilrma.ilrma(X, n_src=None, n_iter=20, proj_back=True, W0=None, n_components=2, return_filters=False, callback=None)¶
Implementation of ILRMA algorithm without partitioning function for BSS presented in
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016
D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari Determined Blind Source Separation with Independent Low-Rank Matrix Analysis, in Audio Source Separation, S. Makino, Ed. Springer, 2018, pp. 125-156.
- Parameters
X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal
n_src (int, optional) – The number of sources or independent components
n_iter (int, optional) – The number of iterations (default 20)
proj_back (bool, optional) – Scaling on first mic by back projection (default True)
W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
n_components (int) – Number of components in the non-negative spectrum
return_filters (bool) – If true, the function will return the demixing matrix too
callback (func) – A callback function called every 10 iterations; it can be used to monitor convergence
- Returns
Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix W (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
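ILRMA plugs into the same STFT pipeline as the AuxIVA example above; here is a hedged sketch with placeholder data standing in for a real STFT tensor:
import numpy as np
import pyroomacoustics as pra

# X: STFT tensor (nframes, nfrequencies, nchannels), normally computed
# with pra.transform.analysis as in the AuxIVA example
X = np.random.randn(100, 257, 2) + 1j * np.random.randn(100, 257, 2)
Y = pra.bss.ilrma(X, n_iter=20, n_components=2)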
Sparse Independent Vector Analysis (SparseAuxIVA)¶
- pyroomacoustics.bss.sparseauxiva.sparseauxiva(X, S=None, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', return_filters=False, callback=None)¶
Implementation of sparse AuxIVA algorithm for BSS presented in
J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016.
- Parameters
X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal
n_src (int, optional) – The number of sources or independent components
S (ndarray (k_freq)) – Indexes of active frequency bins for sparse AuxIVA
n_iter (int, optional) – The number of iterations (default 20)
proj_back (bool, optional) – Scaling on first mic by back projection (default True)
W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix
model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)
return_filters (bool) – If true, the function will return the demixing matrix too
callback (func) – A callback function called every 10 iterations; it can be used to monitor convergence
- Returns
Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
Common Tools¶
- pyroomacoustics.bss.common.projection_back(Y, ref, clip_up=None, clip_down=None)¶
This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to solve the scale ambiguity in BSS.
Here is the derivation of the projection. The optimal filter z minimizes the squared error

\[\min_z E[|z^* y - x|^2]\]

It should thus satisfy the orthogonality condition and can be derived as follows

\[\begin{aligned}
0 &= E[y^* (z^* y - x)] \\
0 &= z^* E[|y|^2] - E[y^* x] \\
z^* &= \frac{E[y^* x]}{E[|y|^2]} \\
z &= \frac{E[y x^*]}{E[|y|^2]}
\end{aligned}\]

In practice, the expectations are replaced by the sample mean.
- Parameters
Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal
ref (array_like (n_frames, n_bins)) – The reference signal
clip_up (float, optional) – Limits the maximum value of the gain (default no limit)
clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
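As a sanity check of the closed form, the sample-mean version of the last equation can be computed directly with numpy; this is a self-contained sketch, independent of the library call:
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000) + 1j * rng.normal(size=1000)   # reference signal
g = 0.5 - 0.2j                                           # unknown complex scale
y = x / g                                                # wrongly scaled estimate

# sample-mean version of z = E[y x*] / E[|y|^2]
z = np.mean(y * np.conj(x)) / np.mean(np.abs(y) ** 2)
print(np.allclose(np.conj(z) * y, x))                    # True: z* y restores x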
- pyroomacoustics.bss.common.sparir(G, S, weights=np.array([]), gini=0, maxiter=50, tol=10, alpha=10, alphamax=100000.0, alphamin=1e-07)¶
Fast proximal algorithm implementation for sparse approximation of relative impulse responses from incomplete measurements of the corresponding relative transfer function based on
Z. Koldovsky, J. Malek, and S. Gannot, “Spatial Source Subtraction based on Incomplete Measurements of Relative Transfer Function”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, TASLP 2015.
The original Matlab implementation can be found at http://itakura.ite.tul.cz/zbynek/dwnld/SpaRIR.m
and it is referenced in
Z. Koldovsky, F. Nesta, P. Tichavsky, and N. Ono, Frequency-domain blind speech separation using incomplete de-mixing transform, EUSIPCO 2016.
- Parameters
G (ndarray (nfrequencies, 1)) – Frequency representation of the (incomplete) transfer function
S (ndarray (kfrequencies)) – Indexes of active frequency bins for sparse AuxIVA
weights (ndarray (kfrequencies) or int, optional) – The higher the value of weights(i), the higher the probability that g(i) is zero; if scalar, all weights are the same; if empty, default value is used
gini (ndarray (nfrequencies)) – Initialization for the computation of g
maxiter (int) – Maximum number of iterations before achieving convergence (default 50)
tol (float) – Minimum convergence criteria based on the gradient difference between adjacent updates (default 10)
alpha (float) – Inverse of the decreasing speed of the gradient at each iteration. This parameter is updated at every iteration (default 10)
alphamax (float) – Upper bound for alpha (default 1e5)
alphamin (float) – Lower bound for alpha (default 1e-7)
- Returns
Returns the sparse approximation of the impulse response in the
time-domain (real-valued) as an (nfrequencies) array.
Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)¶
FastMNMF¶
Blind Source Separation using Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)
- pyroomacoustics.bss.fastmnmf.fastmnmf(X, n_src=None, n_iter=30, n_components=8, mic_index=0, W0=None, accelerate=True, callback=None)¶
Implementation of FastMNMF algorithm presented in
K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019. [arXiv] [IEEE]
The code of FastMNMF with GPU support and FastMNMF-DP which integrates DNN-based source model into FastMNMF is available on https://github.com/sekiguchi92/SoundSourceSeparation
- Parameters
X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal
n_src (int, optional) – The number of sound sources (default None). If None, n_src is set to the number of microphones
n_iter (int, optional) – The number of iterations (default 30)
n_components (int, optional) – Number of components in the non-negative spectrum (default 8)
mic_index (int or 'all', optional) – The index of microphone of which you want to get the source image (default 0). If ‘all’, return the source images of all microphones
W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for diagonalizer Q (default None). If None, identity matrices are used for all frequency bins.
accelerate (bool, optional) – If true, the basis and activation of NMF are updated simultaneously (default True)
callback (func, optional) – A callback function called every 10 iterations; it can be used to monitor convergence
- Returns
If mic_index is int, returns an (nframes, nfrequencies, nsources) array.
If mic_index is ‘all’, returns an (nchannels, nframes, nfrequencies, nsources) array.
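Usage mirrors the other STFT-domain separators; a hedged sketch with placeholder data in place of a real STFT tensor:
import numpy as np
import pyroomacoustics as pra

# X: STFT tensor (nframes, nfrequencies, nchannels), normally computed
# with pra.transform.analysis as in the AuxIVA example
X = np.random.randn(100, 257, 2) + 1j * np.random.randn(100, 257, 2)
Y = pra.bss.fastmnmf(X, n_src=2, n_iter=30, n_components=8)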
Direction of Arrival¶
Module contents¶
Direction of Arrival Finding¶
This sub-package provides implementations of popular direction of arrival finding algorithms.
- MUSIC
- Multiple Signal Classification 1
- NormMUSIC
- MUSIC with frequency normalization 2
- SRP-PHAT
- Steered Response Power – Phase Transform 3
- CSSM
- Coherent Signal Subspace Method 4
- WAVES
- Weighted Average of Signal Subspaces 5
- TOPS
- Test of Orthogonality of Projected Subspaces 6
- FRIDA
- Finite Rate of Innovation Direction of Arrival 7
All these classes derive from the abstract base class pyroomacoustics.doa.doa.DOA that offers generic methods for finding and visualizing the locations of acoustic sources.
The constructor can be called once to build the DOA finding object. Then, the method pyroomacoustics.doa.doa.DOA.locate_sources performs DOA finding based on the time-frequency data passed to it as an argument. Extra arguments can be supplied to indicate which frequency bands should be used for localization.
How to use the DOA module¶
Here R is a 2xQ ndarray that contains the locations of the Q microphones in its columns, fs is the sampling frequency of the input signal, and nfft the length of the FFT used.
The STFT snapshots are passed to the localization methods in the X ndarray of shape Q x (nfft // 2 + 1) x n_snapshots, where n_snapshots is the number of STFT frames to use for the localization. The option freq_bins can be provided to specify which frequency bins to use for the localization.
>>> doa = pyroomacoustics.doa.MUSIC(R, fs, nfft)
>>> doa.locate_sources(X, freq_bins=np.arange(20, 40))
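A fuller sketch, assuming multichannel recordings of shape (Q, n_samples); the circular_2D_array helper and the azimuth_recon attribute follow the package's usual examples, and the random signals here are placeholders for real recordings.
import numpy as np
import pyroomacoustics as pra

fs, nfft = 16000, 256
# 6 microphones on a 5 cm circle; R has shape 2 x Q
R = pra.circular_2D_array(center=[0.0, 0.0], M=6, phi0=0.0, radius=0.05)

# placeholder recordings of shape (Q, n_samples)
signals = np.random.randn(6, fs)

# STFT per channel, then reorder to Q x (nfft // 2 + 1) x n_snapshots
X = np.array([pra.transform.analysis(sig, nfft, nfft // 2).T for sig in signals])

doa = pra.doa.NormMUSIC(R, fs, nfft)
doa.locate_sources(X, freq_range=[500.0, 4000.0])
print(doa.azimuth_recon)    # estimated azimuths in radians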
Other Available Subpackages¶
pyroomacoustics.doa.grid
this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods
pyroomacoustics.doa.plotters
a few methods to plot functions and points on circles or spheres
pyroomacoustics.doa.detect_peaks
1D peak detection routine from Marcos Duarte
pyroomacoustics.doa.tools_frid_doa_plane
routines implementing the FRIDA algorithm
Utilities¶
pyroomacoustics.doa.algorithms
a dictionary containing all the available DOA object subclasses, indexed by the keys
['MUSIC', 'NormMUSIC', 'SRP', 'CSSM', 'WAVES', 'TOPS', 'FRIDA']
Note on MUSIC¶
Since NormMUSIC has more robust performance, we recommend using NormMUSIC over MUSIC. When MUSIC is used as a baseline for publications, we recommend using both NormMUSIC and MUSIC. For more information, have a look at our Jupyter notebook at https://github.com/LCAV/pyroomacoustics/tree/master/notebooks/norm_music_demo.ipynb
References
- 1
R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag., Vol. 34, Num. 3, pp 276–280, 1986
- 2
D. Salvati, C. Drioli, G. L. Foresti, Incoherent Frequency Fusion for Broadband Steered Response Power Algorithms in Noisy Environments, IEEE Signal Process. Lett., Vol. 21, Num. 5, pp 581-585, 2014
- 3
J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays, PHD Thesis, Brown University, 2000
- 4
H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
- 5
E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001
- 6
Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
- 7
H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017
Algorithms¶
CSSM¶
- class pyroomacoustics.doa.cssm.CSSM(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶
Bases:
MUSIC
Class to apply the Coherent Signal-Subspace method [CSSM] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the CSSM algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
num_iter (int) – Number of iterations for CSSM. Default: 5
References
- CSSM
H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985
FRIDA¶
- class pyroomacoustics.doa.frida.FRIDA(L, fs, nfft, max_four=None, c=343.0, num_src=1, G_iter=None, max_ini=5, n_rot=1, max_iter=50, noise_level=1e-10, low_rank_cleaning=False, stopping='max_iter', stft_noise_floor=0.0, stft_noise_margin=1.5, signal_type='visibility', use_lu=True, verbose=False, symb=True, use_cache=False, **kwargs)¶
Bases:
DOA
Implements the FRI-based direction of arrival finding algorithm [FRIDA].
Note
Run locate_sources() to apply the FRIDA algorithm.
- Parameters
L (ndarray) – Contains the locations of the microphones in the columns
fs (int or float) – Sampling frequency
nfft (int) – FFT size
max_four (int) – Maximum order of the Fourier or spherical harmonics expansion
c (float, optional) – Speed of sound
num_src (int, optional) – The number of sources to recover (default 1)
G_iter (int) – Number of mapping matrix refinement iterations in recovery algorithm (default 1)
max_ini (int, optional) – Number of random initializations to use in recovery algorithm (default 5)
n_rot (int, optional) – Number of random rotations to apply before recovery algorithm (default 1)
noise_level (float, optional) – Noise level in the visibility measurements, if available (default 1e-10)
stopping (str, optional) – Stopping criteria for the recovery algorithm. Can be max iterations or noise level (default max_iter)
stft_noise_floor (float) – The noise floor in the STFT measurements, if available (default 0)
stft_noise_margin (float) – When this, along with stft_noise_floor is set, we only pick frames with at least stft_noise_floor * stft_noise_margin power
signal_type (str) – Which type of measurements to use:
‘visibility’: Cross correlation measurements
‘raw’: Microphone signals
use_lu (bool, optional) – Whether to use LU decomposition for efficiency
verbose (bool, optional) – Whether to output intermediate result for debugging purposes
symb (bool, optional) – Whether to enforce the symmetry on the reconstructed uniform samples of sinusoids b
References
- FRIDA
H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017
MUSIC¶
- class pyroomacoustics.doa.music.MUSIC(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, frequency_normalization=False, **kwargs)¶
Bases:
DOA
Class to apply MUltiple SIgnal Classification (MUSIC) direction-of-arrival (DoA) for a particular microphone array.
Note
Run locate_source() to apply the MUSIC algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
frequency_normalization (bool) – If True, the MUSIC pseudo-spectra are normalized before averaging across the frequency axis. Default: False
- plot_individual_spectrum()¶
Plot the steered response for each frequency.
SRP-PHAT¶
- class pyroomacoustics.doa.srp.SRP(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶
Bases:
DOA
Class to apply Steered Response Power (SRP) direction-of-arrival (DoA) for a particular microphone array.
Note
Run locate_source() to apply the SRP-PHAT algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
TOPS¶
- class pyroomacoustics.doa.tops.TOPS(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)¶
Bases:
MUSIC
Class to apply Test of Orthogonality of Projected Subspaces [TOPS] for Direction of Arrival (DoA) estimation.
Note
Run locate_source() to apply the TOPS algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
References
- TOPS
Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006
WAVES¶
- class pyroomacoustics.doa.waves.WAVES(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)¶
Bases:
MUSIC
Class to apply Weighted Average of Signal Subspaces [WAVES] for Direction of Arrival (DoA) estimation.
Note
Run locate_sources() to apply the WAVES algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
num_iter (int) – Number of iterations for WAVES. Default: 5
References
- WAVES
E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001
Tools and Helpers¶
DOA (Base)¶
- class pyroomacoustics.doa.doa.DOA(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, n_grid=None, dim=2, *args, **kwargs)¶
Bases:
object
Abstract parent class for Direction of Arrival (DoA) algorithms. After creating an object (SRP, MUSIC, CSSM, WAVES, or TOPS), run locate_source to apply the corresponding algorithm.
- Parameters
L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.
fs (float) – Sampling frequency.
nfft (int) – FFT length.
c (float) – Speed of sound. Default: 343 m/s
num_src (int) – Number of sources to detect. Default: 1
mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’
r (numpy array) – Candidate distances from the origin. Default: np.ones(1)
azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180
colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)
n_grid (int) – If azimuth and colatitude are not specified, a grid with this many points is created. Default: 360.
dim (int) – The dimension of the problem. Set dim=2 to find sources on the circle (x-y plane). Set dim=3 to search on the whole sphere.
- locate_sources(X, num_src=None, freq_range=[500.0, 4000.0], freq_bins=None, freq_hz=None)¶
Locate source(s) using corresponding algorithm.
- Parameters
X (numpy array) – Set of signals in the frequency (RFFT) domain for current frame. Size should be M x F x S, where M should correspond to the number of microphones, F to nfft/2+1, and S to the number of snapshots (user-defined). It is recommended to have S >> M.
num_src (int) – Number of sources to detect. Default is value given to object constructor.
freq_range (list of floats, length 2) – Frequency range on which to run DoA: [fmin, fmax].
freq_bins (list of int) – freq_bins: List of individual frequency bins on which to run DoA. If defined by user, it will not take into consideration freq_range or freq_hz.
freq_hz (list of floats) – List of individual frequencies on which to run DoA. If defined by user, it will not take into consideration freq_range.
- polar_plt_dirac(azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶
Generate polar plot of DoA results.
- Parameters
azimuth_ref (numpy array) – True direction of sources (in radians).
alpha_ref (numpy array) – Estimated amplitude of sources.
save_fig (bool) – Whether or not to save figure as pdf.
file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
- class pyroomacoustics.doa.doa.ModeVector(L, fs, nfft, c, grid, mode='far', precompute=False)¶
Bases:
object
This is a class for look-up tables of mode vectors. The look-up table is the outer product of three vectors running along candidate locations, time, and frequency. When the grid becomes large, the look-up table might be too large to store in memory. In that case, this class computes the outer-product elements only when needed, keeping just the three vectors in memory. When the table is small, the precompute option can be set to True to compute the whole table in advance.
Tools for FRIDA (azimuth only)¶
- pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri(coef_ri, K, D, L)¶
Split T matrix in real/imaginary representation
- pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half(coef_half, K, D, L, D_coef)¶
Split T matrix in real/imaginary conjugate symmetric representation
- pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half_out_half(coef_half, K, D, L, D_coef, mtx_shrink)¶
If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so effectively the output is half the size
- pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri(b_ri, K, D, L)¶
build convolution matrix associated with b_ri
- Parameters
b_ri – a real-valued vector
K – number of Diracs
D1 – expansion matrix for the real-part
D2 – expansion matrix for the imaginary-part
- pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half(b_ri, K, D, L, D_coef)¶
Split T matrix in conjugate symmetric representation
- pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half_out_half(b_ri, K, D, L, D_coef, mtx_shrink)¶
If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so effectively the output is half the size
- pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp(phi_k, p_mic_x, p_mic_y)¶
the matrix that maps Diracs’ amplitudes to the visibility
- Parameters
phi_k – Diracs’ location (azimuth)
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
- pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp_ri(p_mic_x, p_mic_y, phi_k)¶
builds real/imaginary amplitude matrix
- pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_raw_amp(p_mic_x, p_mic_y, phi_k)¶
the matrix that maps Diracs’ amplitudes to the visibility
- Parameters
phi_k – Diracs’ location (azimuth)
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
- pyroomacoustics.doa.tools_fri_doa_plane.coef_expan_mtx(K)¶
expansion matrix for an annihilating filter of size K + 1
- Parameters
K – number of Dirac. The filter size is K + 1
- pyroomacoustics.doa.tools_fri_doa_plane.compute_b(G_lst, GtG_lst, beta_lst, Rc0, num_bands, a_ri, use_lu=False, GtG_inv_lst=None)¶
compute the uniform sinusoidal samples b from the updated annihilating filter coefficients.
- Parameters
GtG_lst – list of G^H G for different subbands
beta_lst – list of beta-s for different subbands
Rc0 – right-dual matrix, here it is the convolution matrix associated with c
num_bands – number of bands
a_ri – a 2D numpy array. each column corresponds to the measurements within a subband
- pyroomacoustics.doa.tools_fri_doa_plane.compute_mtx_obj(GtG_lst, Tbeta_lst, Rc0, num_bands, K)¶
- compute the matrix (M) in the objective function:
min c^H M c s.t. c0^H c = 1
- Parameters
GtG_lst – list of G^H * G
Tbeta_lst – list of Toeplitz matrices for beta-s
Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
- pyroomacoustics.doa.tools_fri_doa_plane.compute_obj_val(GtG_inv_lst, Tbeta_lst, Rc0, c_ri_half, num_bands, K)¶
compute the fitting error. CAUTION: Here we assume use_lu = True
- pyroomacoustics.doa.tools_fri_doa_plane.cov_mtx_est(y_mic)¶
estimate covariance matrix
- Parameters
y_mic – received signal (complex baseband representation) at microphones
- pyroomacoustics.doa.tools_fri_doa_plane.cpx_mtx2real(mtx)¶
extend complex valued matrix to an extended matrix of real values only
- Parameters
mtx – input complex valued matrix
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶
Reconstruct point sources’ locations (azimuth) from the visibility measurements
- Parameters
G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')¶
Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
- Parameters
G (param) – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband(G_lst, a_ri, K, M, max_ini=100)¶
Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
- Parameters
G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_lu(G_lst, GtG_lst, GtG_inv_lst, a_ri, K, M, max_ini=100, max_iter=50)¶
Here we use LU decomposition to precompute a few entries. Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.
- Parameters
G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_parallel(G, a_ri, K, M, max_ini=100)¶
Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’
- Parameters
G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_parallel(G, a_ri, K, M, max_ini=100)¶
Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’
- Parameters
G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids
a_ri – the visibility measurements
K – number of Diracs
M – the Fourier series expansion is between -M and M
noise_level – level of noise (ell_2 norm) in the measurements
max_ini – maximum number of initialisations
stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_inner(c_ri_half, a_ri, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, max_iter)¶
inner loop of the dirac_recon_ri_half_parallel function
- pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_multiband_inner(c_ri_half, a_ri, num_bands, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, GtG, max_iter)¶
Inner loop of the dirac_recon_ri_multiband function
- pyroomacoustics.doa.tools_fri_doa_plane.extract_off_diag(mtx)¶
extract the off-diagonal entries in mtx. The output vector is ordered in a column-major manner.
- Parameters
mtx – input matrix to extract the off diagonal entries
- pyroomacoustics.doa.tools_fri_doa_plane.hermitian_expan(half_vec_len)¶
expand a real-valued vector to a Hermitian symmetric vector. The input vector is a concatenation of the real parts with NON-POSITIVE indices and the imaginary parts with STRICTLY-NEGATIVE indices.
- Parameters
half_vec_len – length of the first half vector
- pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj(Tbeta_lst, num_bands, K, lu_R_GtGinv_Rt_lst)¶
- compute the matrix (M) in the objective function:
min c^H M c s.t. c0^H c = 1
- Parameters
GtG_lst – list of G^H * G
Tbeta_lst – list of Toeplitz matrices for beta-s
Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
- pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj_initial(GtG_inv_lst, Tbeta_lst, Rc0, num_bands, K)¶
- compute the matrix (M) in the objective function:
min c^H M c s.t. c0^H c = 1
- Parameters
GtG_lst – list of G^H * G
Tbeta_lst – list of Toeplitz matrices for beta-s
Rc0 – right dual matrix for the annihilating filter (same for each block -> not a list)
- pyroomacoustics.doa.tools_fri_doa_plane.make_G(p_mic_x, p_mic_y, omega_bands, sound_speed, M, signal_type='visibility')¶
build the list of linear mapping matrices from the multi-band measurements to the uniformly sampled sinusoids.
- Parameters
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
sound_speed – speed of sound
signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
- Return type
The list of mapping matrices from measurements to sinusoids
- pyroomacoustics.doa.tools_fri_doa_plane.make_GtG_and_inv(G_lst)¶
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2raw(M, p_mic_x, p_mic_y)¶
build the matrix that maps the Fourier series to the raw microphone signals
- Parameters
M – the Fourier series expansion is limited from -M to M
p_mic_x – a vector that contains microphones x coordinates
p_mic_y – a vector that contains microphones y coordinates
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2visi(M, p_mic_x, p_mic_y)¶
build the matrix that maps the Fourier series to the visibility
- Parameters
M – the Fourier series expansion is limited from -M to M
p_mic_x – a vector that contains microphones x coordinates
p_mic_y – a vector that contains microphones y coordinates
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri(M, p_mic_x, p_mic_y, D1, D2, signal='visibility')¶
build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only. (matrix size double)
- Parameters
M – the Fourier series expansion is limited from -M to M
p_mic_x – a vector that contains microphones x coordinates
p_mic_y – a vector that contains microphones y coordinates
D1 – expansion matrix for the real-part
D2 – expansion matrix for the imaginary-part
signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri_multiband(M, p_mic_x_all, p_mic_y_all, D1, D2, aslist=False, signal='visibility')¶
build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only. (matrix size double)
- Parameters
M – the Fourier series expansion is limited from -M to M
p_mic_x_all – a matrix that contains microphones x coordinates
p_mic_y_all – a matrix that contains microphones y coordinates
D1 – expansion matrix for the real-part
D2 – expansion matrix for the imaginary-part
aslist – whether the linear mapping for each subband is returned as a list or as a block diagonal matrix
signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri)¶
Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
- Parameters
phi_recon – the reconstructed Dirac locations (azimuths)
M – the Fourier series expansion is between -M and M
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
mtx_freq2visi – the linear mapping from Fourier series to visibilities
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri, num_bands)¶
Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
- Parameters
phi_recon – the reconstructed Dirac locations (azimuths)
M – the Fourier series expansion is between -M and M
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
mtx_fri2visi – the linear mapping from Fourier series to visibilities
- pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband_new(phi_opt, M, p_x, p_y, G0_lst, num_bands)¶
Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.
- Parameters
phi_opt – the reconstructed Dirac locations (azimuths)
M – the Fourier series expansion is between -M and M
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
G0_lst – the original linear mapping from Fourier series to visibilities
num_bands – number of subbands
- pyroomacoustics.doa.tools_fri_doa_plane.multiband_cov_mtx_est(y_mic)¶
estimate covariance matrix based on the received signals at microphones
- Parameters
y_mic – received signal (complex base-band representation) at microphones
- pyroomacoustics.doa.tools_fri_doa_plane.multiband_extract_off_diag(mtx)¶
extract the off-diagonal entries in mtx. The output vector is ordered in a column-major manner.
- Parameters
mtx (input matrix to extract the off-diagonal entries) –
- pyroomacoustics.doa.tools_fri_doa_plane.output_shrink(K, L)¶
shrink the convolution output to half the size. Used when both the annihilating filter and the uniform samples of sinusoids are Hermitian symmetric.
- Parameters
K – the annihilating filter size: K + 1
L – length of the (complex-valued) b vector
- pyroomacoustics.doa.tools_fri_doa_plane.polar2cart(rho, phi)¶
convert from polar to cartesian coordinates
- Parameters
rho – radius
phi – azimuth
- pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon(a, p_mic_x, p_mic_y, omega_band, sound_speed, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, verbose=False, signal_type='visibility', **kwargs)¶
reconstruct point sources on the circle from the visibility measurements
- Parameters
a – the measured visibilities
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
omega_band – mid-band (ANGULAR) frequency [radian/sec]
sound_speed – speed of sound
K – number of point sources
M – the Fourier series expansion is between -M and M
noise_level – noise level in the measured visibilities
max_ini – maximum number of random initialisation used
stop_cri – either ‘mse’ or ‘max_iter’
update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
verbose – whether output intermediate results for debugging or not
signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
kwargs (possible optional input: G_iter: number of iterations for the G updates) –
- pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_multiband(a, p_mic_x, p_mic_y, omega_bands, sound_speed, K, M, noise_level, max_ini=50, update_G=False, verbose=False, signal_type='visibility', max_iter=50, G_lst=None, GtG_lst=None, GtG_inv_lst=None, **kwargs)¶
reconstruct point sources on the circle from the visibility measurements from multi-bands.
- Parameters
a – the measured visibilities in a matrix form, where the second dimension corresponds to different subbands
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
omega_bands – mid-band (ANGULAR) frequencies [radian/sec]
sound_speed – speed of sound
K – number of point sources
M – the Fourier series expansion is between -M and M
noise_level – noise level in the measured visibilities
max_ini – maximum number of random initialisation used
update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
verbose – whether output intermediate results for debugging or not
signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
kwargs – possible optional input: G_iter: number of iterations for the G updates
- pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_rotate(a, p_mic_x, p_mic_y, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, num_rotation=1, verbose=False, signal_type='visibility', **kwargs)¶
reconstruct point sources on the circle from the visibility measurements. Here we apply random rotations to the coordinates.
- Parameters
a – the measured visibilities
p_mic_x – a vector that contains microphones’ x-coordinates
p_mic_y – a vector that contains microphones’ y-coordinates
K – number of point sources
M – the Fourier series expansion is between -M and M
noise_level – noise level in the measured visibilities
max_ini – maximum number of random initialisation used
stop_cri – either ‘mse’ or ‘max_iter’
update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.
num_rotation – number of random rotations
verbose – whether output intermediate results for debugging or not
signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs
kwargs – possible optional input: G_iter: number of iterations for the G updates
Grid Objects¶
Routines to perform grid search on the sphere
- class pyroomacoustics.doa.grid.Grid(n_points)¶
Bases:
object
This is an abstract class with attributes and methods for grids
- Parameters
n_points (int) – the number of points on the grid
- abstract apply(func, spherical=False)¶
- abstract find_peaks(k=1)¶
- set_values(vals)¶
- class pyroomacoustics.doa.grid.GridCircle(n_points=360, azimuth=None)¶
Bases:
Grid
Creates a grid on the circle.
- Parameters
n_points (int, optional) – The number of uniformly spaced points in the grid.
azimuth (ndarray, optional) – An array of azimuth (in radians) to use for grid locations. Overrides n_points.
- apply(func, spherical=False)¶
- find_peaks(k=1)¶
- plot(mark_peaks=0)¶
- class pyroomacoustics.doa.grid.GridSphere(n_points=1000, spherical_points=None)¶
Bases:
Grid
This class computes nearly equidistant points on the sphere using the Fibonacci method
- Parameters
n_points (int) – The number of points to sample
spherical_points (ndarray, optional) – A 2 x n_points array of spherical coordinates with azimuth in the top row and colatitude in the second row. Overrides n_points.
References
http://lgdv.cs.fau.de/uploads/publications/spherical_fibonacci_mapping.pdf
http://stackoverflow.com/questions/9600801/evenly-distributing-n-points-on-a-sphere
- apply(func, spherical=False)¶
Apply a function to every grid point
- find_peaks(k=1)¶
Find the largest peaks on the grid
- min_max_distance()¶
Compute some statistics on the distribution of the points
- plot(colatitude_ref=None, azimuth_ref=None, colatitude_recon=None, azimuth_recon=None, plotly=True, projection=True, points_only=False)¶
- plot_old(plot_points=False, mark_peaks=0)¶
Plot the points on the sphere with their values
- regrid()¶
Regrid the non-uniform data on a regular mesh
Plot Helpers¶
A collection of functions to plot maps and points on circles and spheres.
- pyroomacoustics.doa.plotters.polar_plt_dirac(self, azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)¶
Generate polar plot of DoA results.
- Parameters
azimuth_ref (numpy array) – True direction of sources (in radians).
alpha_ref (numpy array) – Estimated amplitude of sources.
save_fig (bool) – Whether or not to save figure as pdf.
file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’
plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.
- pyroomacoustics.doa.plotters.sph_plot_diracs(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, colatitude_grid=None, azimuth_grid=None, file_name='sph_recon_2d_dirac.pdf', **kwargs)¶
This function plots the dirty image with the source locations on a flat projection of the sphere
- Parameters
colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
- pyroomacoustics.doa.plotters.sph_plot_diracs_plotly(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, azimuth_grid=None, colatitude_grid=None, surface_base=1, surface_height=0.0)¶
Plots a 2D map on a sphere as well as a collection of diracs using the plotly library
- Parameters
colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points
azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs
colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize
azimuth (ndarray, optional) – The azimuths of the collection of points to visualize
dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points
azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map
colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map
surface_base – radius corresponding to lowest height on the map
surface_height – radius difference between the lowest and highest point on the map
Peak Detection¶
Detect peaks in data based on their amplitude and other features.
Author: Marcos Duarte, https://github.com/demotu/BMC Version: 1.0.4 License: MIT
- pyroomacoustics.doa.detect_peaks.detect_peaks(x, mph=None, mpd=1, threshold=0, edge='rising', kpsh=False, valley=False, show=False, ax=None)¶
Detect peaks in data based on their amplitude and other features.
- Parameters
x (1D array_like) – data.
mph ({None, number}, optional (default = None)) – detect peaks that are greater than minimum peak height.
mpd (positive integer, optional (default = 1)) – detect peaks that are at least separated by minimum peak distance (in number of data).
threshold (positive number, optional (default = 0)) – detect peaks (valleys) that are greater (smaller) than threshold in relation to their immediate neighbors.
edge ({None, 'rising', 'falling', 'both'}, optional (default = 'rising')) – for a flat peak, keep only the rising edge (‘rising’), only the falling edge (‘falling’), both edges (‘both’), or don’t detect a flat peak (None).
kpsh (bool, optional (default = False)) – keep peaks with same height even if they are closer than mpd.
valley (bool, optional (default = False)) – if True (1), detect valleys (local minima) instead of peaks.
show (bool, optional (default = False)) – if True (1), plot data in matplotlib figure.
ax (a matplotlib.axes.Axes instance, optional (default = None).) –
- Returns
ind – indices of the peaks in x.
- Return type
1D array_like
Notes
The detection of valleys instead of peaks is performed internally by simply negating the data: ind_valleys = detect_peaks(-x)
The function can handle NaN’s
Examples
>>> from detect_peaks import detect_peaks
>>> x = np.random.randn(100)
>>> x[60:81] = np.nan
>>> # detect all peaks and plot data
>>> ind = detect_peaks(x, show=True)
>>> print(ind)

>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # set minimum peak height = 0 and minimum peak distance = 20
>>> detect_peaks(x, mph=0, mpd=20, show=True)

>>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
>>> # set minimum peak distance = 2
>>> detect_peaks(x, mpd=2, show=True)

>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # detection of valleys instead of peaks
>>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)

>>> x = [0, 1, 1, 0, 1, 1, 0]
>>> # detect both edges
>>> detect_peaks(x, edge='both', show=True)

>>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
>>> # set threshold = 2
>>> detect_peaks(x, threshold=2, show=True)
DOA Utilities¶
This module contains useful functions to compute distances and errors on circles and spheres.
- pyroomacoustics.doa.utils.cart2spher(vectors)¶
- Parameters
vectors (array_like, shape (3, n_vectors)) – The vectors to transform
- Returns
azimuth (numpy.ndarray, shape (n_vectors,)) – The azimuth of the vectors
colatitude (numpy.ndarray, shape (n_vectors,)) – The colatitude of the vectors
r (numpy.ndarray, shape (n_vectors,)) – The length of the vectors
- pyroomacoustics.doa.utils.circ_dist(azimuth1, azimuth2, r=1.0)¶
Returns the shortest distance between two points on a circle
- Parameters
azimuth1 – azimuth of point 1
azimuth2 – azimuth of point 2
r (optional) – radius of the circle (Default 1)
- pyroomacoustics.doa.utils.great_circ_dist(r, colatitude1, azimuth1, colatitude2, azimuth2)¶
Calculate the great-circle distance for points located on a sphere
- Parameters
r (radius of the sphere) –
colatitude1 (colatitude of point 1) –
azimuth1 (azimuth of point 1) –
colatitude2 (colatitude of point 2) –
azimuth2 (azimuth of point 2) –
- Returns
great-circle distance
- Return type
float or ndarray
- pyroomacoustics.doa.utils.polar_distance(x1, x2)¶
Given two arrays of numbers x1 and x2, pairs the entries that are the closest and provides the pairing index matrix: x1(index(1,:)) should be as close as possible to x2(index(2,:)). The function also outputs the average of the absolute differences abs(x1(index(1,:)) - x2(index(2,:))).
- Parameters
x1 – vector 1
x2 – vector 2
- Returns
d – the minimum average distance between the paired entries
index – the permutation matrix
- pyroomacoustics.doa.utils.spher2cart(azimuth, colatitude=None, r=1, degrees=False)¶
Convert a spherical point to cartesian coordinates.
- Parameters
azimuth – azimuth
colatitude – colatitude
r – radius
degrees – If set to True, the input angles are interpreted as degrees instead of radians
- Returns
An ndarray containing the Cartesian coordinates of the points as its columns.
- Return type
ndarray
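As a quick sanity check of these utilities, the sketch below converts a point to Cartesian coordinates and back, and shows that circ_dist wraps around the circle; it assumes scalar angles are accepted by spher2cart.
import numpy as np
from pyroomacoustics.doa.utils import cart2spher, circ_dist, spher2cart

# a unit vector at azimuth 45 degrees, colatitude 60 degrees
xyz = spher2cart(np.pi / 4, np.pi / 3, r=1.0)

# round trip back to spherical coordinates; cart2spher expects shape (3, n_vectors)
azimuth, colatitude, r = cart2spher(np.reshape(xyz, (3, -1)))

# the shortest path between azimuths 0.1 and 2*pi - 0.1 wraps around: about 0.2 rad
d = circ_dist(0.1, 2 * np.pi - 0.1)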
Single Channel Denoising¶
Module contents¶
Single Channel Noise Reduction¶
Collection of single channel noise reduction (SCNR) algorithms for speech:
Spectral Subtraction 1
Subspace Approach 2
Iterative Wiener Filtering 3
A deep learning approach in Python can be found at this repository.
References
- 1
M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1979, pp. 208-211.
- 2
Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.
- 3
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.
Algorithms¶
pyroomacoustics.denoise.spectral_subtraction module¶
- class pyroomacoustics.denoise.spectral_subtraction.SpectralSub(nfft, db_reduc, lookback, beta, alpha=1)¶
Bases:
object
Here we have a class for performing single channel noise reduction via spectral subtraction. The instantaneous signal energy and noise floor are estimated at each time instant (for each frequency bin), and this is used to compute a gain filter with which to perform spectral subtraction.
For a given frame n, the gain for frequency bin k is given by:
\[G[k, n] = \max \left\{ \left( \dfrac{P[k, n] - \beta P_N[k, n]}{P[k, n]} \right)^\alpha, G_{min} \right\},\]
where \(G_{min} = 10^{-(db\_reduc/20)}\) and \(db\_reduc\) is the maximum reduction (in dB) that we are willing to perform for each bin (a high value can actually be detrimental, see below). The instantaneous energy \(P[k,n]\) is computed by simply squaring the frequency amplitude at the bin k. The time-frequency decomposition of the input signal is typically done with the STFT and overlapping frames. The noise estimate \(P_N[k, n]\) for frequency bin k is given by looking back a certain number of frames \(L\) and selecting the bin with the lowest energy:
\[P_N[k, n] = \min_{[n-L, n]} P[k, n].\]
This approach works best when the SNR is positive and the noise is rather stationary. An alternative approach for the noise estimate (also in the case of stationary noise) would be to apply a lowpass filter for each frequency bin.
With a large suppression, i.e. large values for \(db\_reduc\), we can observe a typical artefact of such spectral subtraction approaches, namely “musical noise”. Here is a nice article about noise reduction and musical noise.
Adjusting the constants \(\beta\) and \(\alpha\) also presents a trade-off between suppression and undesirable artefacts, i.e. more noticeable musical noise.
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.
# initialize STFT and SpectralSub objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2, analysis_window=pra.hann(nfft))
scnr = pra.denoise.SpectralSub(nfft, db_reduc=10, lookback=5, beta=20, alpha=3)
# apply block-by-block
for n in range(num_blocks):
    # go to frequency domain for noise reduction
    stft.analysis(mono_noisy)
    gain_filt = scnr.compute_gain_filter(stft.X)
    mono_denoised = stft.synthesis(gain_filt * stft.X)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_spectral_sub(noisy_signal, nfft=512, db_reduc=10, lookback=5, beta=20, alpha=3)
- Parameters
nfft (int) – FFT size. The length of the gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
db_reduc (float) – Maximum reduction in dB for each bin.
lookback (int) – How many frames to look back for the noise estimate.
beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
- compute_gain_filter(X)¶
- Parameters
X (numpy array) – Complex spectrum of length nfft//2+1.
- Returns
Gain filter to multiply given spectrum with.
- Return type
numpy array
- pyroomacoustics.denoise.spectral_subtraction.apply_spectral_sub(noisy_signal, nfft=512, db_reduc=25, lookback=12, beta=30, alpha=1)¶
One-shot function to apply spectral subtraction approach.
- Parameters
noisy_signal (numpy array) – Real signal in time domain.
nfft (int) – FFT size. The length of the gain filter, i.e. the number of frequency bins, is given by nfft//2+1.
db_reduc (float) – Maximum reduction in dB for each bin.
lookback (int) – How many frames to look back for the noise estimate.
beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.
alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.
- Returns
Enhanced/denoised signal.
- Return type
numpy array
pyroomacoustics.denoise.subspace module¶
- class pyroomacoustics.denoise.subspace.Subspace(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')¶
Bases:
object
A class for performing single channel noise reduction in the time domain via the subspace approach. This implementation is based on the approach presented in:
Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.
Moreover, an adaptation of the subspace approach is implemented here, as presented in:
Y. Hu and P. C. Loizou, A subspace approach for enhancing speech corrupted by colored noise, IEEE Signal Processing Letters, vol. 9, no. 7, pp. 204-206, Jul 2002.
Namely, an eigendecomposition is performed on the matrix:
\[\Sigma = R_n^{-1} R_y - I,\]
where \(R_n\) is the noise covariance matrix, \(R_y\) is the covariance matrix of the input noisy signal, and \(I\) is the identity matrix. The covariance matrices are estimated from past samples; the number of past samples/frames used for estimation can be set with the parameters lookback and skip. A simple energy threshold (thresh parameter) is used to identify noisy frames.
The eigenvectors corresponding to the positive eigenvalues of \(\Sigma\) are used to create a linear operator \(H_{opt}\) in order to enhance the noisy input signal \(\mathbf{y}\):
\[\mathbf{\hat{x}} = H_{opt} \cdot \mathbf{y}.\]
The length of \(\mathbf{y}\) is specified by the parameter frame_len; \(50\%\) overlap and add with a Hanning window is used to reconstruct the output. A great summary of the approach can be found in the paper by Y. Hu and P. C. Loizou under Section III, B.
Adjusting the factor \(\mu\) presents a trade-off between noise suppression and distortion.
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here. Depending on your choice for frame_len, lookback, and skip, the approach may not be suitable for real-time processing.
# initialize Subspace object
scnr = Subspace(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01)
# apply block-by-block
for n in range(num_blocks):
    denoised_signal = scnr.apply(noisy_input)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_subspace(noisy_signal, frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01)
- Parameters
frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive.
mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.
lookback (int) – How many frames to look back for covariance matrix estimation.
skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.
thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.
- apply(new_samples)¶
- Parameters
new_samples (numpy array) – New array of samples of length self.hop in the time domain.
- Returns
Denoised samples.
- Return type
numpy array
- compute_signal_projection()¶
- update_cov_matrices(new_samples)¶
- pyroomacoustics.denoise.subspace.apply_subspace(noisy_signal, frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')¶
One-shot function to apply subspace denoising approach.
- Parameters
noisy_signal (numpy array) – Real signal in time domain.
frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive. 50% overlap is used with hanning window.
mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.
lookback (int) – How many frames to look back for covariance matrix estimation.
skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.
thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise and might even remove desired signal!
data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.
- Returns
Enhanced/denoised signal.
- Return type
numpy array
pyroomacoustics.denoise.iterative_wiener module¶
- class pyroomacoustics.denoise.iterative_wiener.IterativeWiener(frame_len, lpc_order, iterations, alpha=0.8, thresh=0.01)¶
Bases:
object
A class for performing single channel noise reduction in the frequency domain with a Wiener filter that is iteratively computed. This implementation is based on the approach presented in:
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.
For each frame, a Wiener filter of the following form is computed and applied to the noisy samples in the frequency domain:
\[H(\omega) = \dfrac{P_S(\omega)}{P_S(\omega) + \sigma_d^2},\]
where \(P_S(\omega)\) is the speech power spectral density and \(\sigma_d^2\) is the noise variance.
The following assumptions are made in order to arrive at the above filter as the optimal solution:
The noisy samples \(y[n]\) can be written as:
\[y[n] = s[n] + d[n],\]
where \(s[n]\) is the desired signal and \(d[n]\) is the background noise.
The signal and noise are uncorrelated.
The noise is white Gaussian, i.e. it has a flat power spectrum with amplitude \(\sigma_d^2\).
Under these assumptions, the above Wiener filter minimizes the mean-square error between the true samples \(s[0:N-1]\) and the estimate \(\hat{s}[0:N-1]\) obtained by filtering \(y[0:N-1]\) with the above filter (with \(N\) being the frame length).
The fundamental part of this approach is correctly (or as well as possible) estimating the speech power spectral density \(P_S(\omega)\) and the noise variance \(\sigma_d^2\). For this, we need a voice activity detector in order to determine when we have incoming speech. In this implementation, we use a simple energy threshold on the input frame, which is set with the thresh input parameter.
When no speech is identified, the input frame is used to update the noise variance \(\sigma_d^2\). We could simply set \(\sigma_d^2\) to the energy of the input frame. However, we employ a simple IIR filter in order to avoid abrupt changes in the noise level (thus adding an assumption of stationarity):
\[\sigma_d^2[k] = \alpha \cdot \sigma_d^2[k-1] + (1-\alpha) \cdot \sigma_y^2,\]
where \(\alpha\) is the smoothing parameter and \(\sigma_y^2\) is the energy of the input frame. A high value of \(\alpha\) will update the noise level very slowly, while a low value will make it very sensitive to changes at the input. The value for \(\alpha\) can be set with the alpha parameter.
When speech is identified in the input frame, an iterative procedure is employed in order to estimate \(P_S(\omega)\) (and therefore the Wiener filter \(H\) as well). This procedure consists of computing \(p\) linear predictive coding (LPC) coefficients of the input frame. The number of LPC coefficients is set with the parameter lpc_order. These LPC coefficients form an all-pole filter that models the vocal tract as described in the above paper (Eq. 1). With these coefficients, we can then obtain an estimate of the speech power spectral density (Eq. 41b) and thus the corresponding Wiener filter (Eq. 41a). This Wiener filter is used to denoise the input frame. Moreover, with this denoised frame, we can compute new LPC coefficients and therefore a new Wiener filter. The idea behind this approach is that by iteratively computing the LPC coefficients as such, we can obtain a better estimate of the speech power spectral density. The number of iterations can be set with the iterations parameter.
Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.
# initialize STFT and IterativeWiener objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2, analysis_window=pra.hann(nfft))
scnr = IterativeWiener(frame_len=nfft, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)
# apply block-by-block
for n in range(num_blocks):
    # go to frequency domain, 50% overlap
    stft.analysis(mono_noisy)
    # compute wiener output
    X = scnr.compute_filtered_output(current_frame=stft.fft_in_buffer, frame_dft=stft.X)
    # back to time domain
    mono_denoised = stft.synthesis(X)
There also exists a “one-shot” function.
# import or create `noisy_signal`
denoised_signal = apply_iterative_wiener(noisy_signal, frame_len=512, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)
- Parameters
frame_len (int) – Frame length in samples.
lpc_order (int) – Number of LPC coefficients to compute
iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.
alpha (float) – Smoothing factor within [0, 1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.
thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
- compute_filtered_output(current_frame, frame_dft=None)¶
Compute Wiener filter in the frequency domain.
- Parameters
current_frame (numpy array) – Noisy samples.
frame_dft (numpy array) – DFT of input samples. If not provided, it will be computed.
- Returns
Output of denoising in the frequency domain.
- Return type
numpy array
- pyroomacoustics.denoise.iterative_wiener.apply_iterative_wiener(noisy_signal, frame_len=512, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)¶
One-shot function to apply iterative Wiener filtering for denoising.
- Parameters
noisy_signal (numpy array) – Real signal in time domain.
frame_len (int) – Frame length in samples. 50% overlap is used with hanning window.
lpc_order (int) – Number of LPC coefficients to compute
iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.
alpha (float) – Smoothing factor within [0, 1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.
thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!
- Returns
Enhanced/denoised signal.
- Return type
numpy array
- pyroomacoustics.denoise.iterative_wiener.compute_speech_psd(a, g2, nfft)¶
Compute power spectral density of speech as specified in equation (41b) of
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.
Namely:
\[P_S(\omega) = \dfrac{g^2}{\left| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right|^2},\]
where \(p\) is the LPC order, \(a_k\) are the LPC coefficients, and \(g\) is an estimated gain factor.
The power spectral density is computed at the frequencies corresponding to a DFT of length nfft.
- Parameters
a (numpy array) – LPC coefficients.
g2 (float) – Squared gain.
nfft (int) – FFT length.
- Returns
Power spectral density from LPC coefficients.
- Return type
numpy array
- pyroomacoustics.denoise.iterative_wiener.compute_squared_gain(a, noise_psd, y)¶
Estimate the squared gain of the speech power spectral density as done on p. 204 of
J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.
Namely solving for \(g^2\) such that the following expression is satisfied:
\[\dfrac{N}{2\pi} \int_{-\pi}^{\pi} \dfrac{g^2}{\left| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right|^2} d\omega = \sum_{n=0}^{N-1} y^2(n) - N \cdot \sigma_d^2,\]
where \(N\) is the number of noisy samples \(y\), \(a_k\) are the \(p\) LPC coefficients, and \(\sigma_d^2\) is the noise variance.
- Parameters
a (numpy array) – LPC coefficients.
noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.
y (numpy array) – Noisy time domain samples.
- Returns
Squared gain.
- Return type
float
- pyroomacoustics.denoise.iterative_wiener.compute_wiener_filter(speech_psd, noise_psd)¶
Compute Wiener filter in the frequency domain.
- Parameters
speech_psd (numpy array) – Speech power spectral density.
noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.
- Returns
Frequency domain filter, computed at the same frequency values as speech_psd.
- Return type
numpy array
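To make the role of these helpers concrete, the sketch below chains them together for a single frame. The LPC coefficients, gain, and noise variance are toy values for illustration, not the output of a real LPC analysis.
import numpy as np
from pyroomacoustics.denoise.iterative_wiener import (
    compute_speech_psd,
    compute_wiener_filter,
)

a = np.array([0.5, -0.2])  # toy LPC coefficients (p = 2)
g2 = 1e-3                  # toy squared gain estimate
nfft = 512

# speech PSD from the all-pole model (Eq. 41b), then the Wiener filter (Eq. 41a)
speech_psd = compute_speech_psd(a, g2, nfft)
noise_psd = 1e-4           # toy white-noise variance
H = compute_wiener_filter(speech_psd, noise_psd)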
Phase Processing¶
Submodules¶
Griffin-Lim Phase Reconstruction¶
Implementation of the classic phase reconstruction from Griffin and Lim 1. The input to the algorithm is the magnitude from STFT measurements.
The algorithm starts by assigning a (possibly random) initial phase to the measurements, and then iteratively:
Reconstruct the time-domain signal
Re-apply STFT
Enforce the known magnitude of the measurements
The implementation supports different types of initialization via the keyword argument ini.
- If omitted, the initial phase is uniformly zero
- If ini="random", a random phase is used
- If ini=A for a numpy.ndarray A of the same shape as the input magnitude, A / numpy.abs(A) is used for initialization
Example
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import pyroomacoustics as pra
# We open a speech sample
filename = "examples/input_samples/cmu_arctic_us_axb_a0004.wav"
fs, audio = wavfile.read(filename)
# These are the parameters of the STFT
fft_size = 512
hop = fft_size // 4
win_a = np.hamming(fft_size)
win_s = pra.transform.stft.compute_synthesis_window(win_a, hop)
n_iter = 200
engine = pra.transform.STFT(
fft_size, hop=hop, analysis_window=win_a, synthesis_window=win_s
)
X = engine.analysis(audio)
X_mag = np.abs(X)
X_mag_norm = np.linalg.norm(X_mag) ** 2
# monitor convergence
errors = []
# the callback to track the spectral distance convergence
def cb(epoch, Y, y):
    # we measure convergence via spectral distance
    Y_2 = engine.analysis(y)
    sd = np.linalg.norm(X_mag - np.abs(Y_2)) ** 2 / X_mag_norm
    # save in the list every 10 iterations
    if epoch % 10 == 0:
        errors.append(sd)
pra.phase.griffin_lim(X_mag, hop, win_a, n_iter=n_iter, callback=cb)
plt.semilogy(np.arange(len(errors)) * 10, errors)
plt.show()
References
- 1
D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984.
- pyroomacoustics.phase.gl.griffin_lim(X, hop, analysis_window, fft_size=None, stft_kwargs={}, n_iter=100, ini=None, callback=None)¶
Implementation of the Griffin-Lim phase reconstruction algorithm from STFT magnitude measurements.
- Parameters
X (array_like, shape (n_frames, n_freq)) – The STFT magnitude measurements
hop (int) – The frame shift of the STFT
analysis_window (array_like, shape (fft_size,)) – The window used for the STFT analysis
fft_size (int, optional) – The FFT size for the STFT; if omitted, it is computed from the dimension of X
stft_kwargs (dict, optional) – Dictionary of extra parameters for the STFT
n_iter (int, optional) – The number of iterations
ini (str or array_like, np.complex, shape (n_frames, n_freq), optional) – The initial value of the phase estimate. If “random”, uses a random guess. If None, uses 0 phase.
callback (func, optional) – A callable taking as argument an int and the reconstructed STFT and time-domain signals
Module contents¶
Phase Processing¶
This sub-package contains algorithms related to phase-specific processing, such as phase reconstruction.
- Griffin-Lim
- Phase reconstruction by fixed-point iterations 1
References
- 1
D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984.
Pyroomacoustics API¶
Subpackages¶
pyroomacoustics.experimental package¶
Submodules¶
pyroomacoustics.experimental.deconvolution module¶
- pyroomacoustics.experimental.deconvolution.deconvolve(y, s, length=None, thresh=0.0)¶
Deconvolve an excitation signal from an impulse response
- Parameters
y (ndarray) – The recording
s (ndarray) – The excitation signal
length (int, optional) – the length of the impulse response to deconvolve
thresh (float, optional) – ignore frequency bins with power lower than this
- pyroomacoustics.experimental.deconvolution.wiener_deconvolve(y, x, length=None, noise_variance=1.0, let_n_points=15, let_div_base=2)¶
Deconvolve an excitation signal from an impulse response
We use a Wiener filter.
- Parameters
y (ndarray) – The recording
x (ndarray) – The excitation signal
length (int, optional) – the length of the impulse response to deconvolve
noise_variance (float, optional) – estimate of the noise variance
let_n_points (int) – number of points to use in the LET approximation
let_div_base (float) – the divider used for the LET grid
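A minimal synthetic check of deconvolve; the white-noise excitation and 3-tap impulse response are fabricated for the example.
import numpy as np
from pyroomacoustics.experimental.deconvolution import deconvolve

rng = np.random.default_rng(0)
s = rng.standard_normal(1024)        # excitation signal
h_true = np.array([1.0, 0.5, 0.25])  # impulse response to recover
y = np.convolve(s, h_true)           # simulated recording

h_est = deconvolve(y, s, length=3)   # should be close to h_true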
pyroomacoustics.experimental.delay_calibration module¶
- class pyroomacoustics.experimental.delay_calibration.DelayCalibration(fs, pad_time=0.0, mls_bits=16, repeat=1, temperature=25.0, humidity=50.0, pressure=1000.0)¶
Bases:
object
- run(distance=0.0, ch_in=None, ch_out=None, oversampling=1)¶
Run the calibration. Plays a maximum length sequence and cross-correlates the signals to find the time delay.
- Parameters
distance (float, optional) – Distance between the speaker and microphone
ch_in (int, optional) – The input channel to use. If not specified, all channels are calibrated
ch_out (int, optional) – The output channel to use. If not specified, all channels are calibrated
pyroomacoustics.experimental.localization module¶
- pyroomacoustics.experimental.localization.edm_line_search(R, tdoa, bounds, steps)¶
We have a number of points at known locations and TDOA measurements from an unknown location to the known points. We perform an EDM line search to find the unknown offset that turns the TDOAs into TOAs.
- Parameters
R (ndarray) – An ndarray of 3xN where each column is the location of a point
tdoa (ndarray) – A length N vector containing the TDOA measurements from the unknown location to the known ones
bounds (ndarray) – Bounds for the line search
steps (float) – Step size for the line search
- pyroomacoustics.experimental.localization.tdoa(x1, x2, interp=1, fs=1, phat=True)¶
This function computes the time difference of arrival (TDOA) of the signal at the two microphones. This in turn is used to infer the direction of arrival (DOA) of the signal.
Specifically, if s(k) is the signal at the reference microphone and s_2(k) the signal at the second microphone, then for a signal arriving with DOA theta we have
s_2(k) = s(k - tau)
with
tau = fs*d*sin(theta)/c
where d is the distance between the two microphones and c the speed of sound.
We recover tau using the Generalized Cross Correlation - Phase Transform (GCC-PHAT) method. The reference is
Knapp, C., & Carter, G. C. (1976). The generalized correlation method for estimation of time delay.
- Parameters
x1 (nd-array) – The signal of the reference microphone
x2 (nd-array) – The signal of the second microphone
interp (int, optional (default 1)) – The interpolation value for the cross-correlation, it can improve the time resolution (and hence DOA resolution)
fs (int, optional (default 1)) – The sampling frequency of the input signal
- Returns
theta (float) – the angle of arrival (in radians)
pwr (float) – the magnitude of the maximum cross correlation coefficient
delay (float) – the delay between the two microphones (in seconds)
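A small synthetic example of delay estimation with this function; the delayed white-noise signals are fabricated, and we assume the three return values listed above.
import numpy as np
from pyroomacoustics.experimental.localization import tdoa

fs = 16000
rng = np.random.default_rng(0)
x1 = rng.standard_normal(fs)  # reference microphone signal
x2 = np.roll(x1, 5)           # same signal delayed by 5 samples

# GCC-PHAT with 4x interpolation for finer time resolution
theta, pwr, delay = tdoa(x1, x2, interp=4, fs=fs, phat=True)
# delay should be close to 5 / fs seconds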
- pyroomacoustics.experimental.localization.tdoa_loc(R, tdoa, c, x0=None)¶
TDOA based localization
- Parameters
R (ndarray) – A 3xN array of 3D points
tdoa (ndarray) – A length N array of tdoa
c (float) – The speed of sound
References
Steven Li, TDOA localization
pyroomacoustics.experimental.measure_ir module¶
- pyroomacoustics.experimental.measure_ir.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)
Measures an impulse response by playing a sweep and recording it using the sounddevice package.
- Parameters
sweep_length (float, optional) – length of the sweep in seconds
sweep_type (SweepType, optional) – type of sweep to use: linear or exponential (default)
fs (int, optional) – sampling frequency (default 48 kHz)
f_lo (float, optional) – lowest frequency in the sweep
f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
pre_delay (float, optional) – delay in seconds before playing the sweep
post_delay (float, optional) – delay in seconds before stopping recording after playing the sweep
fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
dev_in (int, optional) – input device number
dev_out (int, optional) – output device number
channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
ascending (bool, optional) – whether the sweep goes from high to low (default) or low to high frequencies
deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)
plot (bool, optional) – plot the resulting signal
- Return type
Returns the impulse response if deconvolution == True and the recorded signal if not
pyroomacoustics.experimental.physics module¶
- pyroomacoustics.experimental.physics.calculate_speed_of_sound(t, h, p)¶
Compute the speed of sound as a function of temperature, humidity and pressure
- Parameters
t (temperature [Celsius]) –
h (relative humidity [%]) –
p (atmospheric pressure [kPa]) –
- Return type
Speed of sound in [m/s]
pyroomacoustics.experimental.point_cloud module¶
Contains PointCloud class.
Given a number of points and their relative distances, this class aims at reconstructing their relative coordinates.
- class pyroomacoustics.experimental.point_cloud.PointCloud(m=1, dim=3, diameter=0.0, X=None, labels=None, EDM=None)¶
Bases:
object
- EDM()¶
Computes the EDM corresponding to the marker set
- align(marker, axis)¶
Rotate the marker set around the given axis until it is aligned onto the given marker
- Parameters
marker (int or str) – the index or label of the marker onto which to align the set
axis (int) – the axis around which the rotation happens
- center(marker)¶
Translate the marker set so that the argument is the origin.
- classical_mds(D)¶
Classical multidimensional scaling
- Parameters
D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
- copy()¶
Return a deep copy of this marker set object
- correct(corr_dic)¶
correct a marker location by a given vector
- doa(receiver, source)¶
Computes the direction of arrival wrt a source and receiver
- flatten(ind)¶
Transform the set of points so that the subset of markers given as argument is as flat (wrt the z-axis) as possible.
- Parameters
ind (list of bools) – List of marker indices that should all be in the same subspace
- fromEDM(D, labels=None, method='mds')¶
Compute the position of markers from their Euclidean Distance Matrix
- Parameters
D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
labels (list, optional) – A list of human friendly labels for the markers (e.g. ‘east’, ‘west’, etc)
method (str, optional) – The method to use * ‘mds’ for multidimensional scaling (default) * ‘tri’ for trilateration
- key2ind(ref)¶
Get the index location from a label
- normalize(refs=None)¶
Reposition points such that x0 is at the origin, x1 lies on the x-axis, and x2 lies above the x-axis, keeping the relative positions to each other. The z-axis is defined according to the right-hand rule by default.
- Parameters
refs (list of 3 ints or str) – The index or label of three markers used to define (origin, x-axis, y-axis)
left_hand (bool, optional (default False)) – Normally the z-axis is defined using right-hand rule, this flag allows to override this behavior
- plot(axes=None, show_labels=True, **kwargs)¶
- trilateration(D)¶
Find the location of points based on their distance matrix using trilateration
- Parameters
D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)
- trilateration_single_point(c, Dx, Dy)¶
Given x at origin (0,0) and y at (0,c) the distances from a point at unknown location Dx, Dy to x, y, respectively, finds the position of the point.
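A short sketch of the EDM round trip with this class; the four markers are arbitrary, and positions are recovered only up to a rigid transformation.
import numpy as np
from pyroomacoustics.experimental.point_cloud import PointCloud

# four markers as the columns of a 3 x 4 array
X = np.array([
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 0.5],
])
pc = PointCloud(X=X)

# Euclidean distance matrix of the marker set
D = pc.EDM()

# reconstruct the relative positions from the distances alone
pc2 = PointCloud(m=4, dim=3)
pc2.fromEDM(D)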
RT60 Measurement Routine¶
Automatically determines the reverberation time of an impulse response using the Schroeder method 1.
References
- 1
M. R. Schroeder, “New Method of Measuring Reverberation Time,” J. Acoust. Soc. Am., vol. 37, no. 3, pp. 409-412, Mar. 1965.
- pyroomacoustics.experimental.rt60.measure_rt60(h, fs=1, decay_db=60, plot=False, rt60_tgt=None)¶
Analyze the RT60 of an impulse response. Optionally plots some useful information.
- Parameters
h (array_like) – The impulse response.
fs (float or int, optional) – The sampling frequency of h (default to 1, i.e., samples).
decay_db (float or int, optional) – The decay in decibels for which we actually estimate the time. Although we would like to estimate the RT60, it might not be practical. Instead, we measure the RT20 or RT30 and extrapolate to RT60.
plot (bool, optional) – If set to True, the power decay and different estimated values will be plotted (default False).
rt60_tgt (float) – This parameter can be used to indicate a target RT60 to which we want to compare the estimated value.
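For example, the RT60 of a simulated impulse response can be measured as follows. This is a sketch assuming a recent pyroomacoustics version with the Material API; the room dimensions and absorption are arbitrary.
import pyroomacoustics as pra
from pyroomacoustics.experimental.rt60 import measure_rt60

# simulate a shoebox room and compute an impulse response
room = pra.ShoeBox([6, 5, 3], fs=16000, materials=pra.Material(0.25), max_order=17)
room.add_source([2.0, 3.0, 1.5])
room.add_microphone([4.0, 2.0, 1.2])
room.compute_rir()

# estimate from a 30 dB decay and extrapolate to RT60
rt60 = measure_rt60(room.rir[0][0], fs=room.fs, decay_db=30)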
pyroomacoustics.experimental.signals module¶
A few test signals like sweeps and stuff.
- pyroomacoustics.experimental.signals.exponential_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶
Exponential sine sweep
- Parameters
T (float) – length in seconds
fs – sampling frequency
f_lo (float) – lowest frequency in fraction of fs (default 0)
f_hi (float) – highest frequency in fraction of fs (default 1)
fade (float, optional) – length of fade in and out in seconds (default 0)
ascending (bool, optional) –
- pyroomacoustics.experimental.signals.linear_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)¶
Linear sine sweep
- Parameters
T (float) – length in seconds
fs – sampling frequency
f_lo (float) – lowest frequency in fraction of fs (default 0)
f_hi (float) – highest frequency in fraction of fs (default 1)
fade (float, optional) – length of fade in and out in seconds (default 0)
ascending (bool, optional) –
- pyroomacoustics.experimental.signals.window(signal, n_win)¶
Window the signal at the beginning and end with a window of size n_win/2.
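For instance, a two-second exponential sweep with short fades could be generated as follows; this is a sketch, with the frequency arguments given as fractions of fs as documented above.
from pyroomacoustics.experimental.signals import exponential_sweep, linear_sweep

fs = 48000
# 2 s exponential sweep with 50 ms fade in and out
exp_sweep = exponential_sweep(2.0, fs, f_lo=0.001, f_hi=0.45, fade=0.05)
# linear counterpart with the same parameters
lin_sweep = linear_sweep(2.0, fs, f_lo=0.001, f_hi=0.45, fade=0.05)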
Module contents¶
Experimental¶
A bunch of routines useful when doing measurements and experiments.
- pyroomacoustics.experimental.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)¶
Measures an impulse response by playing a sweep and recording it using the sounddevice package.
- Parameters
sweep_length (float, optional) – length of the sweep in seconds
sweep_type (SweepType, optional) – type of sweep to use linear or exponential (default)
fs (int, optional) – sampling frequency (default 48 kHz)
f_lo (float, optional) – lowest frequency in the sweep
f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2
volume (float, optional) – multiply the sweep by this number before playing (default 0.9)
pre_delay (float, optional) – delay in second before playing sweep
post_delay (float, optional) – delay in second before stopping recording after playing the sweep
fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)
dev_in (int, optional) – input device number
dev_out (int, optional) – output device number
channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.
channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.
ascending (bool, optional) – wether the sweep is from high to low (default) or low to high frequencies
deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default 0.)
plot (bool, optional) – plot the resulting signal
- Return type
Returns the impulse response if deconvolution == True and the recorded signal if not
Submodules¶
pyroomacoustics.acoustics module¶
- class pyroomacoustics.acoustics.OctaveBandsFactory(base_frequency=125.0, fs=16000, n_fft=512)¶
Bases:
object
A class to process uniformly all properties that are defined on octave bands.
Each property is stored for an octave band.
- base_freq¶
The center frequency of the first octave band
- Type
float
- fs¶
The target sampling frequency
- Type
float
- n_bands¶
The number of octave bands needed to cover from base_freq to fs / 2 (i.e. floor(log2(fs / base_freq)))
- Type
int
- bands¶
The list of bin boundaries for the octave bands
- Type
list of tuple
- centers¶
The list of band centers
- Parameters
base_frequency (float, optional) – The center frequency of the first octave band (default: 125 Hz)
fs (float, optional) – The sampling frequency used (default: 16000 Hz)
third_octave (bool, optional) – Use third octave bands if True (default: False)
- analysis(x, band=None)¶
Process a signal x through the filter bank
- Parameters
x (ndarray (n_samples)) – The input signal
- Returns
The input signal filtered through all the bands
- Return type
ndarray (n_samples, n_bands)
- get_bw()¶
Returns the bandwidth of the bands
- pyroomacoustics.acoustics.bandpass_filterbank(bands, fs=1.0, order=8, output='sos')¶
Create a bank of Butterworth bandpass filters
- Parameters
bands (array_like, shape == (n, 2)) – The list of bands
[[flo1, fup1], [flo2, fup2], ...]
fs (float, optional) – Sampling frequency (default 1.)
order (int, optional) – The order of the IIR filters (default: 8)
output ({'ba', 'zpk', 'sos'}) – Type of output: numerator/denominator (‘ba’), pole-zero (‘zpk’), or second-order sections (‘sos’). Default is ‘sos’.
- Returns
b, a (ndarray, ndarray) – Numerator (b) and denominator (a) polynomials of the IIR filter. Only returned if output=’ba’.
z, p, k (ndarray, ndarray, float) – Zeros, poles, and system gain of the IIR filter transfer function. Only returned if output=’zpk’.
sos (ndarray) – Second-order sections representation of the IIR filter. Only returned if output==’sos’.
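As an illustration, the bank can be applied with scipy; this sketch assumes that the ‘sos’ output yields one second-order-sections array per band.
import numpy as np
from scipy.signal import sosfiltfilt
import pyroomacoustics as pra

# three octave-like bands in Hz
bands = [[88.0, 176.0], [176.0, 353.0], [353.0, 707.0]]
filters = pra.acoustics.bandpass_filterbank(bands, fs=16000, order=8, output='sos')

# split white noise into the three bands
noise = np.random.randn(16000)
sub_bands = [sosfiltfilt(sos, noise) for sos in filters]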
- pyroomacoustics.acoustics.bands_hz2s(bands_hz, Fs, N, transform='dft')¶
Converts bands given in Hertz to samples with respect to a given sampling frequency Fs and a transform size N. An optional transform type is used to handle the DCT case.
- pyroomacoustics.acoustics.binning(S, bands)¶
This function computes the sum of all columns of S in the subbands enumerated in bands
- pyroomacoustics.acoustics.critical_bands()¶
Compute the Critical bands as defined in the book: Psychoacoustics by Zwicker and Fastl. Table 6.1 p. 159
- pyroomacoustics.acoustics.inverse_sabine(rt60, room_dim, c=None)¶
Given the desired reverberation time (RT60, i.e. the time for the energy to drop by 60 dB), the dimensions of a rectangular room (shoebox), and the speed of sound, computes the energy absorption coefficient and maximum image source order needed. The speed of sound used is the package-wide default (in constants).
- Parameters
rt60 (float) – desired RT60 (time it takes for the energy to decay by 60 dB) in seconds
room_dim (list of floats) – list of length 2 or 3 of the room side lengths
c (float) – speed of sound
- Returns
absorption (float) – the energy absorption coefficient to be passed to room constructor
max_order (int) – the maximum image source order necessary to achieve the desired RT60
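A typical use is to derive room parameters for a target RT60 before building the room, as in the sketch below (assuming the Material API of recent versions; the target and dimensions are arbitrary).
import pyroomacoustics as pra

room_dim = [6.0, 5.0, 3.0]
# absorption and image source order that should yield a 0.4 s RT60
absorption, max_order = pra.inverse_sabine(0.4, room_dim)

room = pra.ShoeBox(
    room_dim, fs=16000, materials=pra.Material(absorption), max_order=max_order
)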
- pyroomacoustics.acoustics.invmelscale(b)¶
Converts from melscale to frequency in Hertz according to Huang-Acero-Hon (6.143)
- pyroomacoustics.acoustics.melfilterbank(M, N, fs=1, fl=0.0, fh=0.5)¶
Returns a filter bank of triangular filters spaced according to mel scale
We follow Huang-Acero-Hon 6.5.2
- Parameters
M ((int)) – The number of filters in the bank
N ((int)) – The length of the DFT
fs ((float) optional) – The sampling frequency (default 1)
fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
- Return type
An M times int(N/2)+1 ndarray that contains one filter per row
- pyroomacoustics.acoustics.melscale(f)¶
Converts f (in Hertz) to the melscale defined according to Huang-Acero-Hon (2.6)
- pyroomacoustics.acoustics.mfcc(x, L=128, hop=64, M=14, fs=8000, fl=0.0, fh=0.5)¶
Computes the Mel-Frequency Cepstrum Coefficients (MFCC) according to the description in Huang-Acero-Hon 6.5.2 (2001). The MFCC are features mimicking human perception, commonly used for learning tasks.
This function will first split the signal into frames, overlapping or not, and then compute the MFCC for each frame.
- Parameters
x ((nd-array)) – Input signal
L ((int)) – Frame size (default 128)
hop ((int)) – Number of samples to skip between two frames (default 64)
M ((int)) – Number of mel-frequency filters (default 14)
fs ((int)) – Sampling frequency (default 8000)
fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)
fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)
- Return type
The MFCC of the input signal
- pyroomacoustics.acoustics.octave_bands(fc=1000, third=False, start=0.0, n=8)¶
Create a bank of octave bands
- Parameters
fc (float, optional) – The center frequency
third (bool, optional) – Use third octave bands (default False)
start (float, optional) – Starting frequency for octave bands in Hz (default 0.)
n (int, optional) – Number of frequency bands (default 8)
- pyroomacoustics.acoustics.rt60_eyring(S, V, a, m, c)¶
This is the Eyring formula for estimation of the reverberation time.
- Parameters
S – the total surface of the room walls in m^2
V – the volume of the room in m^3
a (float) – the equivalent absorption coefficient sum(a_w * S_w) / S, where a_w and S_w are the absorption and surface of wall w, respectively.
m (float) – attenuation constant of air
c (float) – speed of sound in m/s
- Returns
The estimated reverberation time (RT60)
- Return type
float
- pyroomacoustics.acoustics.rt60_sabine(S, V, a, m, c)¶
This is the Sabine formula for estimation of the reverberation time.
- Parameters
S – the total surface of the room walls in m^2
V – the volume of the room in m^3
a (float) – the equivalent absorption coefficient sum(a_w * S_w) / S, where a_w and S_w are the absorption and surface of wall w, respectively.
m (float) – attenuation constant of air
c (float) – speed of sound in m/s
- Returns
The estimated reverberation time (RT60)
- Return type
float
pyroomacoustics.beamforming module¶
- class pyroomacoustics.beamforming.Beamformer(R, fs, N=1024, Lg=None, hop=None, zpf=0, zpb=0)¶
Bases:
MicrophoneArray
At some point, in some nice way, the design methods should also go here. Probably with generic arguments.
- Parameters
R (numpy.ndarray) – Mics positions
fs (int) – Sampling frequency
N (int, optional) – Length of FFT, i.e. number of FD beamforming weights, equally spaced. Defaults to 1024.
Lg (int, optional) – Length of time-domain filters. Defaults to N.
hop (int, optional) – Hop length for frequency domain processing. Defaults to N/2.
zpf (int, optional) – Front zero padding length for frequency domain processing. Default is 0.
zpb (int, optional) – Zero padding length for frequency domain processing. Default is 0.
- far_field_weights(phi)¶
This method computes the weights for a far-field source at infinity. phi: direction of the beam.
- filters_from_weights(non_causal=0.0)¶
Compute time-domain filters from frequency domain weights.
- Parameters
non_causal (float, optional) – ratio of filter coefficients used for the non-causal part
- plot(sum_ir=False, FD=True)¶
- plot_beam_response()¶
- plot_response_from_point(x, legend=None)¶
- process(FD=False)¶
- rake_delay_and_sum_weights(source, interferer=None, R_n=None, attn=True, ff=False)¶
- rake_distortionless_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)¶
Compute time-domain filters of a beamformer minimizing noise and interference while forcing a distortionless response towards the source.
- rake_max_sinr_filters(source, interferer, R_n, epsilon=0.005, delay=0.0)¶
Compute the time-domain filters of SINR maximizing beamformer.
- rake_max_sinr_weights(source, interferer=None, R_n=None, rcond=0.0, ff=False, attn=True)¶
This method computes a beamformer focusing on a number of specific sources and ignoring a number of interferers.
- Parameters
source (array_like) – source locations
interferer (array_like) – interferer locations
- rake_max_udr_filters(source, interferer=None, R_n=None, delay=0.03, epsilon=0.005)¶
Compute directly the time-domain filters maximizing the Useful-to-Detrimental Ratio (UDR). This beamformer is not practical. It maximizes the UDR ratio in the time domain directly without imposing flat response towards the source of interest. This results in severe distortion of the desired signal.
- Parameters
source (pyroomacoustics.SoundSource) – the desired source
interferer (pyroomacoustics.SoundSource, optional) – the interfering source
R_n (ndarray, optional) – the noise covariance matrix, it should be (M * Lg)x(M * Lg) where M is the number of sensors and Lg the filter length
delay (float, optional) – the signal delay introduced by the beamformer (default 0.03 s)
epsilon (float) –
- rake_max_udr_weights(source, interferer=None, R_n=None, ff=False, attn=True)¶
- rake_mvdr_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)¶
Compute the time-domain filters of the minimum variance distortionless response beamformer.
- rake_one_forcing_filters(sources, interferers, R_n, epsilon=0.005)¶
Compute the time-domain filters of a beamformer with unit response towards multiple sources.
- rake_one_forcing_weights(source, interferer=None, R_n=None, ff=False, attn=True)¶
- rake_perceptual_filters(source, interferer=None, R_n=None, delay=0.03, d_relax=0.035, epsilon=0.005)¶
Compute directly the time-domain filters for a perceptually motivated beamformer. The beamformer minimizes noise and interference, but relaxes the response of the filter within the 30 ms following the delay.
- response(phi_list, frequency)¶
- response_from_point(x, frequency)¶
- snr(source, interferer, f, R_n=None, dB=False)¶
- steering_vector_2D(frequency, phi, dist, attn=False)¶
- steering_vector_2D_from_point(frequency, source, attn=True, ff=False)¶
Creates a steering vector for a particular frequency and source.
- Parameters
frequency –
source – location in cartesian coordinates
attn – include attenuation factor if True
ff – uses far-field distance if true
- Returns
A 2x1 ndarray containing the steering vector.
- udr(source, interferer, f, R_n=None, dB=False)¶
- weights_from_filters()¶
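To tie these methods together, here is a minimal sketch of a delay-and-sum design in a 2D shoebox; the geometry and spacing are arbitrary, and slicing the source with [:1] keeps only the direct sound.
import pyroomacoustics as pra

fs = 16000
room = pra.ShoeBox([6.0, 5.0], fs=fs, max_order=12)

# four-microphone linear array with 8 cm spacing, wrapped in a Beamformer
R = pra.beamforming.linear_2D_array(center=[2.0, 2.5], M=4, phi=0.0, d=0.08)
bf = pra.Beamformer(R, fs, N=1024, Lg=512)
room.add_microphone_array(bf)

room.add_source([4.5, 3.5])

# delay-and-sum weights pointing at the direct sound of the source
bf.rake_delay_and_sum_weights(room.sources[0][:1])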
- pyroomacoustics.beamforming.H(A, **kwargs)¶
Returns the conjugate (Hermitian) transpose of a matrix.
- class pyroomacoustics.beamforming.MicrophoneArray(R, fs, directivity=None)¶
Bases:
object
Microphone array class.
- property M¶
- append(locs)¶
Add some microphones to the array.
- Parameters
locs (numpy.ndarray (2 or 3, n_mics)) – Adds n_mics microphones to the array. The coordinates are passed as a numpy.ndarray with each column containing the coordinates of a microphone.
- record(signals, fs)¶
This simulates the recording of the signals by the microphones. In particular, if the microphones and the room simulation do not use the same sampling frequency, down/up-sampling is done here.
- Parameters
signals – An ndarray with as many lines as there are microphones.
fs – the sampling frequency of the signals.
- set_directivity(directivities)¶
This function sets self.directivity as a list of directivities with n_mics entries, where n_mics is the number of microphones.
- Parameters
directivities – a single directivity for all microphones or a list of directivities for each microphone
- to_wav(filename, mono=False, norm=False, bitdepth=<class 'float'>)¶
Save all the signals to wav files.
- Parameters
filename (str) – the name of the file
mono (bool, optional) – if true, records only the center channel floor(M / 2) (default False)
norm (bool, optional) – if true, normalize the signal to fit in the dynamic range (default False)
bitdepth (int, optional) – the format of output samples [np.int8/16/32/64 or np.float (default)]
- pyroomacoustics.beamforming.circular_2D_array(center, M, phi0, radius)¶
Create an array of uniformly spaced circular points in 2D.
- Parameters
center (array_like) – The center of the array
M (int) – The number of points
phi0 (float) – The counterclockwise rotation of the first element in the array (from the x-axis)
radius (float) – The radius of the array
- Returns
The array of points
- Return type
ndarray (2, M)
- pyroomacoustics.beamforming.circular_microphone_array_xyplane(center, M, phi0, radius, fs, directivity=None, ax=None)¶
Create a microphone array with directivities pointing outwards (if provided).
- Parameters
center (array_like) – The center of the microphone array. 2D or 3D.
M (int) – The number of microphones.
phi0 (float) – The counterclockwise rotation (in degrees) of the first element in the microphone array (from the x-axis).
radius (float) – The radius of the microphone array.
fs (int) – The sampling frequency.
directivity (Directivity object, optional.) – Directivity pattern for each microphone which will be re-oriented to face outward. If not provided, microphones are omnidirectional.
ax (axes object, optional) – Axes on which to plot microphone array with its directivities.
- Return type
MicrophoneArray object
- pyroomacoustics.beamforming.distance(x, y)¶
Computes the distance matrix E. E[i,j] = sqrt(sum((x[:,i]-y[:,j])**2)). x and y are DxN ndarrays containing N D-dimensional vectors.
- pyroomacoustics.beamforming.fir_approximation_ls(weights, T, n1, n2)¶
- pyroomacoustics.beamforming.linear_2D_array(center, M, phi, d)¶
Creates an array of uniformly spaced linear points in 2D
- Parameters
center (array_like) – The center of the array
M (int) – The number of points
phi (float) – The counterclockwise rotation of the array (from the x-axis)
d (float) – The distance between neighboring points
- Returns
The array of points
- Return type
ndarray (2, M)
- pyroomacoustics.beamforming.mdot(*args)¶
Left-to-right associative matrix multiplication of multiple 2D ndarrays.
- pyroomacoustics.beamforming.poisson_2D_array(center, M, d)¶
Create an array of 2D positions drawn from a Poisson process.
- Parameters
center (array_like) – The center of the array
M (int) – The number of points
d (float) – The average distance between neighboring points
- Returns
The array of points
- Return type
ndarray (2, M)
- pyroomacoustics.beamforming.spiral_2D_array(center, M, radius=1.0, divi=3, angle=None)¶
Generate an array of points placed on a spiral
- Parameters
center (array_like) – location of the center of the array
M (int) – number of microphones
radius (float) – microphones are contained within a circle of this radius (default 1)
divi (int) – number of rotations of the spiral (default 3)
angle (float) – the angle offset of the spiral (default random)
- Returns
The array of points
- Return type
ndarray (2, M)
- pyroomacoustics.beamforming.square_2D_array(center, M, N, phi, d)¶
Creates an array of uniformly spaced grid points in 2D
- Parameters
center (array_like) – The center of the array
M (int) – The number of points in the first dimension
N (int) – The number of points in the second dimension
phi (float) – The counterclockwise rotation of the array (from the x-axis)
d (float) – The distance between neighboring points
- Returns
The array of points
- Return type
ndarray (2, M * N)
- pyroomacoustics.beamforming.sumcols(A)¶
Sums the columns of a matrix (np.array). The output is a 2D np.array of dimensions M x 1.
- pyroomacoustics.beamforming.unit_vec2D(phi)¶
pyroomacoustics.build_rir module¶
pyroomacoustics.directivities module¶
- class pyroomacoustics.directivities.CardioidFamily(orientation, pattern_enum, gain=1.0)¶
Bases:
Directivity
Object for directivities coming from the cardioid family.
- Parameters
orientation (DirectionVector) – Indicates direction of the pattern.
pattern_enum (DirectivityPattern) – Desired pattern for the cardioid.
gain (float, optional) – The linear gain of the directivity pattern (default 1.0).
- property directivity_pattern¶
Name of cardioid directivity pattern.
- get_response(azimuth, colatitude=None, magnitude=False, frequency=None, degrees=True)¶
Get response for provided angles.
- Parameters
azimuth (array_like) – Azimuth in degrees
colatitude (array_like, optional) – Colatitude in degrees. Default is to be on XY plane.
magnitude (bool, optional) – Whether to return magnitude of response.
frequency (float, optional) – For which frequency to compute the response. Cardioid are frequency-independent so this value has no effect.
degrees (bool, optional) – Whether provided angles are in degrees.
- Returns
resp – Response at provided angles.
- Return type
ndarray
- plot_response(**kwargs)¶
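For example, a sketch sampling a cardioid pattern in the XY plane (the angle values are illustrative):

import numpy as np
from pyroomacoustics.directivities import (
    CardioidFamily, DirectionVector, DirectivityPattern,
)

# cardioid pattern oriented along the x-axis (colatitude 90 deg, i.e. the XY plane)
dir_obj = CardioidFamily(
    orientation=DirectionVector(azimuth=0, colatitude=90, degrees=True),
    pattern_enum=DirectivityPattern.CARDIOID,
)
# evaluate the magnitude response every 30 degrees around the pattern
resp = dir_obj.get_response(azimuth=np.arange(0, 360, 30), magnitude=True, degrees=True)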
- class pyroomacoustics.directivities.DirectionVector(azimuth, colatitude=None, degrees=True)¶
Bases:
object
Object for representing direction vectors in 3D, parameterized by an azimuth and colatitude angle.
- Parameters
azimuth (float) –
colatitude (float, optional) – Defaults to PI / 2, i.e. the XY plane.
degrees (bool) – Whether provided values are in degrees (True) or radians (False).
- get_azimuth(degrees=False)¶
- get_colatitude(degrees=False)¶
- property unit_vector¶
Direction vector in cartesian coordinates.
- class pyroomacoustics.directivities.Directivity(orientation)¶
Bases:
ABC
Abstract class for directivity patterns.
- get_azimuth(degrees=True)¶
- get_colatitude(degrees=True)¶
- abstract get_response(azimuth, colatitude=None, magnitude=False, frequency=None, degrees=True)¶
Get response for provided angles and frequency.
- set_orientation(orientation)¶
Set orientation of directivity pattern.
- Parameters
orientation (DirectionVector) – New direction for the directivity pattern.
- class pyroomacoustics.directivities.DirectivityPattern(value)¶
Bases:
Enum
Common cardioid patterns and their corresponding coefficient \(a\) in the expression
\[r = a + (1 - a) \cos \theta,\]
where \(a\) is the coefficient that determines the cardioid pattern and \(r\) yields the gain at angle \(\theta\). With these values, OMNI (\(a=1\)) gives a constant gain and FIGURE_EIGHT (\(a=0\)) gives \(\cos \theta\).
- CARDIOID = 0.5¶
- FIGURE_EIGHT = 0¶
- HYPERCARDIOID = 0.25¶
- OMNI = 1.0¶
- SUBCARDIOID = 0.75¶
- pyroomacoustics.directivities.cardioid_func(x, direction, coef, gain=1.0, normalize=True, magnitude=False)¶
One-shot function for computing cardioid response.
- Parameters
x (array_like, shape (..., n_dim)) – Cartesian coordinates
direction (array_like, shape (n_dim)) – Direction vector, should be normalized.
coef (float) – Parameter for the cardioid function.
gain (float) – The gain.
normalize (bool) – Whether to normalize coordinates and direction vector.
magnitude (bool) – Whether to return magnitude, default is False.
- Returns
resp – Response at provided angles for the specified cardioid function.
- Return type
ndarray
- pyroomacoustics.directivities.source_angle_shoebox(image_source_loc, wall_flips, mic_loc)¶
Determine outgoing angle for each image source for a ShoeBox configuration.
Implementation of the method described in the paper: https://www2.ak.tu-berlin.de/~akgroup/ak_pub/2018/000458.pdf
- Parameters
image_source_loc (array_like) – Locations of image sources.
wall_flips (array_like) – Number of x, y, z flips for each image source.
mic_loc (array_like) – Microphone location.
- Returns
azimuth (ndarray) – Azimuth for each image source, in radians.
colatitude (ndarray) – Colatitude for each image source, in radians.
pyroomacoustics.metrics module¶
- pyroomacoustics.metrics.itakura_saito(x1, x2, sigma2_n, stft_L=128, stft_hop=128)¶
- pyroomacoustics.metrics.median(x, alpha=None, axis=-1, keepdims=False)¶
Computes the median of the data along the given axis and, optionally, a confidence interval around it.
- Parameters
x (array_like) – the data array
alpha (float, optional) – the confidence level of the interval; confidence intervals are only computed when this argument is provided
axis (int, optional) – the axis of the data on which to operate, by default the last axis
keepdims (bool, optional) – when True, the reduced axis is kept in the output with size one
- Returns
The tuple (m, [le, ue]), where m is the median and the confidence interval is [m - le, m + ue].
- Return type
tuple (float, [float, float])
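A minimal sketch with synthetic data, assuming the tuple return described above:

import numpy as np
import pyroomacoustics as pra

x = np.random.randn(1000)
m, ci = pra.metrics.median(x, alpha=0.05)  # median and its confidence interval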
- pyroomacoustics.metrics.mse(x1, x2)¶
A short hand to compute the mean-squared error of two signals.
\[MSE = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - y_i)^2\]
- Parameters
x1 (ndarray) – first signal
x2 (ndarray) – second signal
- Returns
(float) The mean of the squared differences of x1 and x2.
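For example, a quick numeric check:

import numpy as np
import pyroomacoustics as pra

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([1.0, 2.5, 2.5])
err = pra.metrics.mse(x1, x2)  # (0.0 + 0.25 + 0.25) / 3 = 0.1666...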
- pyroomacoustics.metrics.pesq(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')¶
Computes the perceptual evaluation of speech quality (PESQ) metric of one or more degraded files with respect to a reference file. Uses the utility obtained from ITU P.862 http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en
- Parameters
ref_file – The filename of the reference file.
deg_files – A list of degraded sound file names.
Fs – Sample rate of the sound files [8000 or 16000 Hz, default 8000].
swap – Swap the byte order of the audio samples [default: False].
wb – Use the wideband algorithm [default: False].
bin – Location of the pesq executable [default: ./bin/pesq].
- Returns
(ndarray of size 2xN) The raw MOS and MOS LQO in rows 0 and 1, respectively, with one column per degraded file in deg_files.
- pyroomacoustics.metrics.snr(ref, deg)¶
- pyroomacoustics.metrics.sweeping_echo_measure(rir, fs, t_min=0, t_max=0.5, fb=400)¶
Measure of sweeping echoes in RIRs obtained from the image-source method. A higher value indicates fewer sweeping echoes.
For details, see: De Sena et al., “On the modeling of rectangular geometries in room acoustic simulations”, IEEE TASLP, 2015.
- Parameters
rir (ndarray) – RIR signal from the image-source method (mono).
fs (int) – sampling frequency.
t_min (float, optional) – Start of the time window. The default is 0.
t_max (float, optional) – End of the time window. The default is 0.5.
fb (float, optional) – Mask bandwidth. The default is 400.
- Return type
sweeping echo spectral flatness (float)
pyroomacoustics.multirate module¶
- pyroomacoustics.multirate.frac_delay(delta, N, w_max=0.9, C=4)¶
Compute an optimal fractional delay filter according to
“Design of Fractional Delay Filters Using Convex Optimization”, William Putnam and Julius Smith.
- Parameters
delta – delay of the filter in (fractional) samples
N – number of taps
w_max – bandwidth of the filter (as a fraction of pi) (default 0.9)
C – sets the number of constraints to C*N (default 4)
- pyroomacoustics.multirate.low_pass(numtaps, B, epsilon=0.1)¶
- pyroomacoustics.multirate.resample(x, p, q)¶
pyroomacoustics.parameters module¶
- This file defines the main physical constants of the system:
Speed of sound
Absorption of materials
Scattering coefficients
Air absorption
- class pyroomacoustics.parameters.Constants¶
Bases:
object
A class to provide easy access package wide to user settable constants.
- get(name)¶
- set(name, val)¶
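A short sketch of the get/set pattern, assuming the module-level constants instance that the package exposes as pra.constants (the constant name shown is the fractional delay filter length used in pyroomacoustics.utilities):

import pyroomacoustics as pra

fdl = pra.constants.get("frac_delay_length")  # read a package-wide constant
pra.constants.set("frac_delay_length", 41)    # override it for later simulations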
- class pyroomacoustics.parameters.Material(energy_absorption, scattering=None)¶
Bases:
object
A class that describes the energy absorption and scattering properties of walls.
- energy_absorption¶
A dictionary containing the keys description, coeffs, and center_freqs.
- Type
dict
- scattering¶
A dictionary containing the keys description, coeffs, and center_freqs.
- Type
dict
- Parameters
energy_absorption (float, str, or dict) –
float: The material created will be equally absorbing at all frequencies (i.e. flat).
str: The absorption values will be obtained from the database.
dict: A dictionary containing the keys description, coeffs, and center_freqs.
scattering (float, str, or dict) –
float: The material created will be equally scattering at all frequencies (i.e. flat).
str: The scattering values will be obtained from the database.
dict: A dictionary containing the keys description, coeffs, and center_freqs.
- property absorption_coeffs¶
shorthand to the energy absorption coefficients
- classmethod all_flat(materials)¶
Checks if all materials in a list are frequency flat
- Parameters
materials (list or dict of Material objects) – The list of materials to check
- Return type
True if all materials have a single parameter, else False
- is_freq_flat()¶
Returns True if the material has flat characteristics over frequency, False otherwise.
- resample(octave_bands)¶
resample at given octave bands
- property scattering_coeffs¶
shorthand to the scattering coefficients
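Two common ways to construct a Material, sketched below (the database key shown is assumed to exist in the materials database):

import pyroomacoustics as pra

# flat: 25% of the energy is absorbed at all frequencies
m_flat = pra.Material(energy_absorption=0.25)

# from the materials database, by name
m_db = pra.Material("hard_surface")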
- class pyroomacoustics.parameters.Physics(temperature=None, humidity=None)¶
Bases:
object
A Physics object allows one to compute the room's physical properties as a function of temperature and humidity.
- Parameters
temperature (float, optional) – The room temperature
humidity (float in range (0, 100), optional) – The room relative humidity in %. Default is 0.
- classmethod from_speed(c)¶
Choose a temperature and humidity matching a desired speed of sound
- get_air_absorption()¶
- Returns
(air_absorption, center_freqs), where air_absorption is a list of air absorption coefficients corresponding to the center frequencies in center_freqs
- get_sound_speed()¶
- Return type
the speed of sound
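A minimal sketch (temperature and humidity values are illustrative):

import pyroomacoustics as pra

# room at 24 degrees Celsius and 50% relative humidity
phys = pra.parameters.Physics(temperature=24.0, humidity=50.0)
c = phys.get_sound_speed()
air_abs, center_freqs = phys.get_air_absorption()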
- pyroomacoustics.parameters.calculate_speed_of_sound(t, h, p)¶
Compute the speed of sound as a function of temperature, humidity and pressure
- Parameters
t (float) – temperature [Celsius]
h (float) – relative humidity [%]
p (float) – atmospheric pressure [kPa]
- Return type
Speed of sound in [m/s]
- pyroomacoustics.parameters.make_materials(*args, **kwargs)¶
Helper method to conveniently create multiple materials.
Each positional and keyword argument should be a valid input for the Material class; for each of them, a Material is created by calling the constructor.
Positional arguments produce a list of Material objects; keyword arguments produce a dict with the keywords as keys and the constructed Material objects as values.
If only positional arguments are provided, only the list is returned; if only keyword arguments are provided, only the dict is returned; if both are provided, both are returned. With no arguments, an empty list is returned.
# energy absorption parameters
floor_eabs = {
    "description": "Example floor material",
    "coeffs": [0.1, 0.2, 0.1, 0.1, 0.1, 0.05],
    "center_freqs": [125, 250, 500, 1000, 2000, 4000],
}

# scattering parameters
audience_scat = {
    "description": "Theatre Audience",
    "coeffs": [0.3, 0.5, 0.6, 0.6, 0.7, 0.7],
    "center_freqs": [125, 250, 500, 1000, 2000, 4000],
}

# create a list of materials
my_mat_list = pra.make_materials((floor_eabs, audience_scat))

# create a dict of materials
my_mat_dict = pra.make_materials(floor=(floor_eabs, audience_scat))
pyroomacoustics.recognition module¶
- class pyroomacoustics.recognition.CircularGaussianEmission(nstates, odim=1, examples=None)¶
Bases:
object
- get_pdfs()¶
Return the pdf of all the emission probabilities
- prob_x_given_state(examples)¶
Recompute the probability of the observation given the state of the latent variables
- update_parameters(examples, gamma)¶
- class pyroomacoustics.recognition.GaussianEmission(nstates, odim=1, examples=None)¶
Bases:
object
- get_pdfs()¶
Return the pdf of all the emission probabilities
- prob_x_given_state(examples)¶
Recompute the probability of the observation given the state of the latent variables
- update_parameters(examples, gamma)¶
- class pyroomacoustics.recognition.HMM(nstates, emission, model='full', leftright_jump_max=3)¶
Bases:
object
Hidden Markov Model with Gaussian emissions
- K¶
Number of states in the model
- Type
int
- O¶
Number of dimensions of the Gaussian emission distribution
- Type
int
- A¶
KxK transition matrix of the Markov chain
- Type
ndarray
- pi¶
K dim vector of the initial probabilities of the Markov chain
- Type
ndarray
- emission¶
An instance of an emission class, e.g. GaussianEmission or CircularGaussianEmission
- model¶
The model used for the chain, can be ‘full’ or ‘left-right’
- Type
string, optional
- leftright_jump_max¶
The number of non-zero upper diagonals in a ‘left-right’ model
- Type
int, optional
- backward(X, p_x_given_z, c)¶
The backward recursion for HMM as described in Bishop Ch. 13
- fit(examples, tol=0.1, max_iter=10, verbose=False)¶
Training of the HMM using the EM algorithm
- Parameters
examples ((list)) – A list of examples used to train the model. Each example is an array of feature vectors, each row is a feature vector, the sequence runs on axis 0
tol ((float)) – The training stops when the progress between two steps is less than this number (default 0.1)
max_iter ((int)) – Alternatively the algorithm stops when a maximum number of iterations is reached (default 10)
verbose (bool, optional) – When True, prints extra information about convergence
- forward(X, p_x_given_z)¶
The forward recursion for HMM as described in Bishop Ch. 13
- generate(N)¶
Generate a random sample of length N using the model
- loglikelihood(X)¶
Compute the log-likelihood of a sample vector using the sum-product algorithm
- update_parameters(examples, gamma, xhi)¶
Update the parameters of the Markov Chain
- viterbi()¶
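An end-to-end sketch with synthetic training data (the state count and feature dimension are illustrative):

import numpy as np
from pyroomacoustics.recognition import HMM, CircularGaussianEmission

# five synthetic examples; each row of an example is one 2-D feature vector
examples = [np.random.randn(100, 2) for _ in range(5)]
emission = CircularGaussianEmission(nstates=3, odim=2, examples=examples)
hmm = HMM(nstates=3, emission=emission, model="left-right")
hmm.fit(examples, max_iter=10)
ll = hmm.loglikelihood(examples[0])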
pyroomacoustics.soundsource module¶
- class pyroomacoustics.soundsource.SoundSource(position, images=None, damping=None, generators=None, walls=None, orders=None, signal=None, delay=0, directivity=None)¶
Bases:
object
A class to represent sound sources.
This object represents a sound source in a room by a list containing the original source position as well as all the image sources, up to some maximum order.
It also keeps track of the sequence of generated images and the index of the walls (in the original room) that generated the reflection.
- add_signal(signal)¶
Sets signal attribute
- Parameters
signal (ndarray) – an N-length ndarray, representing a sequence of samples generated by the source.
- distance(ref_point)¶
- get_damping(max_order=None)¶
- get_images(max_order=None, max_distance=None, n_nearest=None, ref_point=None)¶
Kept for compatibility. Now replaced by the bracket operator and the set_ordering method.
- get_rir(mic, visibility, Fs, t0=0.0, t_max=None)¶
Compute the room impulse response between the source and the microphone whose position is given as an argument.
- Parameters
mic (ndarray) – microphone position
visibility (int32) – 1 if the mic is visible from the source, 0 otherwise. The exact type is important for the C extension
Fs (int) – sampling frequency
t0 (float) – time offset, defaults to 0
t_max (float, optional) – max time, defaults to 1.05 times the propagation time from source to mic
- set_directivity(directivity)¶
Sets self.directivity as a list of directivities with 1 entry
- set_ordering(ordering, ref_point=None)¶
Set the order in which we retrieve image sources. Can be: ‘nearest’, ‘strongest’, or ‘order’. Optional argument: ref_point
- wall_sequence(i)¶
Print the wall sequence for the image source indexed by i
- pyroomacoustics.soundsource.build_rir_matrix(mics, sources, Lg, Fs, epsilon=0.005, unit_damping=False)¶
A function to build the channel matrix for many sources and microphones
- Parameters
mics (ndarray) – a dim-by-M ndarray where each column is the position of a microphone
sources (list of pyroomacoustics.SoundSource) – list of sound sources for which we want to build the matrix
Lg (int) – the length of the beamforming filters
Fs (int) – the sampling frequency
epsilon (float, optional) – minimum decay of the sinc before truncation. Defaults to epsilon=5e-3
unit_damping (bool, optional) – determines if the wall damping parameters are used or not. Defaults to False.
- Returns
The RIR matrix H, where the block H_{ij} is the channel matrix between microphone i and source j. H is of size (M*Lg) x ((Lg+Lh-1)*S), where Lh is the channel length (determined by epsilon), and M and S are the numbers of microphones and sources, respectively.
pyroomacoustics.stft module¶
Class for real-time STFT analysis and processing.
pyroomacoustics.sync module¶
- pyroomacoustics.sync.correlate(x1, x2, interp=1, phat=False)¶
Compute the cross-correlation between x1 and x2
- Parameters
x1 (array_like) – The first data array
x2 (array_like) – The second data array
interp (int, optional) – The interpolation factor for the output array, default 1.
phat (bool, optional) – Apply the PHAT weighting (default False)
- Return type
The cross-correlation between the two arrays
- pyroomacoustics.sync.delay_estimation(x1, x2, L)¶
Estimate the delay between x1 and x2. L is the block length used for the PHAT weighting
- pyroomacoustics.sync.tdoa(signal, reference, interp=1, phat=False, fs=1, t_max=None)¶
Estimates the shift of array signal with respect to reference using generalized cross-correlation
- Parameters
signal (array_like) – The array whose tdoa is measured
reference (array_like) – The reference array
interp (int, optional) – The interpolation factor for the output array, default 1.
phat (bool, optional) – Apply the PHAT weighting (default False)
fs (int or float, optional) – The sampling frequency of the input arrays, default=1
- Return type
The estimated delay between the two arrays
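A minimal sketch estimating a known integer delay (all values are illustrative):

import numpy as np
import pyroomacoustics as pra

fs = 16000
x = np.random.randn(fs)                      # reference signal
y = np.concatenate([np.zeros(25), x])[:fs]   # delayed copy, 25 samples late
tau = pra.sync.tdoa(y, x, phat=True, fs=fs)  # expected: about 25 / fs seconds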
- pyroomacoustics.sync.time_align(ref, deg, L=4096)¶
Returns a copy of deg that is time-aligned with, and of the same length as, ref. L is the block length used for the correlations.
pyroomacoustics.utilities module¶
- pyroomacoustics.utilities.all_combinations(lst1, lst2)¶
Return all combinations between two arrays.
- pyroomacoustics.utilities.angle_function(s1, v2)¶
Compute azimuth and colatitude angles in radians for a given set of points s1 and a single point v2.
- Parameters
s1 (numpy array) – 3×N for a set of N 3-D points, 2×N for a set of N 2-D points.
v2 (numpy array) – 3×1 for a 3-D point, 2×1 for a 2-D point.
- Returns
2×N numpy array with azimuth and colatitude angles in radians.
- Return type
numpy array
- pyroomacoustics.utilities.autocorr(x, p, biased=True, method='numpy')¶
Compute the autocorrelation for real signal x up to lag p.
- Parameters
x (numpy array) – Real signal in time domain.
p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.
biased (bool) – Whether to return biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.
method ('numpy', 'fft', 'time', or 'pra') – Method for computing the autocorrelation: in the frequency domain with 'fft' or 'pra' (np.fft.rfft is used, so only real signals are supported), in the time domain with 'time', or with numpy's built-in function np.correlate (default 'numpy'). For p < log2(len(x)), the time domain approach may be more efficient.
- Returns
Autocorrelation for x up to lag p.
- Return type
numpy array
- pyroomacoustics.utilities.clip(signal, high, low)¶
Clip a signal from above at high and from below at low.
- pyroomacoustics.utilities.compare_plot(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None)¶
- pyroomacoustics.utilities.convmtx(x, n)¶
Create a convolution matrix H from the vector x such that, for a vector v of length n, np.dot(H, v) equals np.convolve(x, v).
- pyroomacoustics.utilities.create_noisy_signal(signal_fp, snr, noise_fp=None, offset=None)¶
Create a noisy signal of a specified SNR.
- Parameters
signal_fp (string) – File path to clean input.
snr (float) – SNR in dB.
noise_fp (string, optional) – File path to noise. Default is to use randomly generated white noise.
offset (float, optional) – Offset in seconds before starting the signal.
- Returns
numpy array – Noisy signal with specified SNR, between [-1,1] and zero mean.
numpy array – Clean signal, untouched from WAV file.
numpy array – Added noise such that specified SNR is met.
int – Sampling rate in Hz.
- pyroomacoustics.utilities.dB(signal, power=False)¶
- pyroomacoustics.utilities.fractional_delay(t0)¶
Creates a fractional delay filter using a windowed sinc function. The length of the filter is fixed by the module wide constant frac_delay_length (default 81).
- Parameters
t0 (float) – The delay in fraction of sample. Typically between -1 and 1.
- Returns
A fractional delay filter with specified delay.
- Return type
numpy array
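For example (the constant name is the one mentioned in the description above):

import numpy as np
import pyroomacoustics as pra

h = pra.utilities.fractional_delay(0.3)  # filter delaying by 0.3 samples
# len(h) == pra.constants.get("frac_delay_length"), 81 by default;
# apply it with np.convolve(signal, h)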
- pyroomacoustics.utilities.fractional_delay_filter_bank(delays)¶
Creates a fractional delay filter bank of windowed sinc filters
- Parameters
delays (1d ndarray) – The delays corresponding to each filter in fractional samples
- Returns
An ndarray where the ith row contains the fractional delay filter corresponding to the ith delay. The number of columns of the matrix is proportional to the maximum delay.
- Return type
numpy array
- pyroomacoustics.utilities.goertzel(x, k)¶
Goertzel algorithm to compute DFT coefficients
- pyroomacoustics.utilities.highpass(signal, Fs, fc=None, plot=False)¶
Filter out very low frequencies; the default cutoff is 50 Hz
- pyroomacoustics.utilities.levinson(r, b)¶
Solve a system of the form Rx = b, where R is a Hermitian Toeplitz matrix and b is any vector, using the generalized Levinson recursion as described in M.H. Hayes, Statistical Signal Processing and Modelling, p. 268.
- Parameters
r – First column of R, toeplitz hermitian matrix.
b – The right-hand argument. If b is a matrix, the system is solved for every column vector in b.
- Returns
The solution of the linear system Rx = b.
- Return type
numpy array
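A small sketch solving a Hermitian Toeplitz system (the matrix values are illustrative):

import numpy as np
import pyroomacoustics as pra

r = np.array([2.0, 0.5, 0.1])     # first column of the Toeplitz matrix R
b = np.array([1.0, 0.0, 0.0])
x = pra.utilities.levinson(r, b)  # solves R x = b without forming R explicitly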
- pyroomacoustics.utilities.low_pass_dirac(t0, alpha, Fs, N)¶
Creates a vector of length N containing a lowpass Dirac sampled at Fs with delay t0 and attenuation alpha.
If t0 and alpha are 2D column vectors of the same size, then the function returns a matrix with each row corresponding to a pair of t0/alpha values.
- pyroomacoustics.utilities.lpc(x, p, biased=True)¶
Compute p LPC coefficients for a speech segment x.
- Parameters
x (numpy array) – Real signal in time domain.
p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.
biased (bool) – Whether to use biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.
- Returns
p LPC coefficients.
- Return type
numpy array
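A typical use, sketched with a synthetic segment:

import numpy as np
import pyroomacoustics as pra

x = np.random.randn(512)        # stand-in for a short speech segment
a = pra.utilities.lpc(x, p=10)  # 10 LPC coefficients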
- pyroomacoustics.utilities.normalize(signal, bits=None)¶
Normalizes a signal to be within a given range. The default is to normalize the maximum amplitude to one. An optional argument allows normalizing the signal to fit within the range of a signed integer representation with the given number of bits.
- pyroomacoustics.utilities.normalize_pwr(sig1, sig2)¶
Normalize sig1 to have the same power as sig2.
- pyroomacoustics.utilities.prony(x, p, q)¶
Prony’s Method from Monson H. Hayes’ Statistical Signal Processing, p. 154
- Parameters
x – signal to model
p – order of denominator
q – order of numerator
- Returns
a – numerator coefficients
b – denominator coefficients
err – the squared error of the approximation
- pyroomacoustics.utilities.real_spectrum(signal, axis=-1, **kwargs)¶
- pyroomacoustics.utilities.requires_matplotlib(func)¶
- pyroomacoustics.utilities.rms(data)¶
Compute root mean square of input.
- Parameters
data (numpy array) – Real signal in time domain.
- Returns
Root mean square.
- Return type
float
- pyroomacoustics.utilities.shanks(x, p, q)¶
Shanks’ method from Monson H. Hayes’ Statistical Signal Processing, p. 154
- Parameters
x – signal to model
p – order of denominator
q – order of numerator
- Returns
a – numerator coefficients
b – denominator coefficients
err – the squared error of approximation
- pyroomacoustics.utilities.spectrum(signal, Fs, N)¶
- pyroomacoustics.utilities.time_dB(signal, Fs, bits=16)¶
Compute the signed dB amplitude of the oscillating signal, normalized with respect to the number of bits used for the signal.
- pyroomacoustics.utilities.to_16b(signal)¶
Converts a 32-bit float signal (in the range -1 to 1) to a signed 16-bit representation. No clipping is performed; you are responsible for ensuring the signal is within the correct interval.
- pyroomacoustics.utilities.to_float32(data)¶
Cast data (typically from WAV) to float32.
- Parameters
data (numpy array) – Real signal in time domain, typically obtained from WAV file.
- Returns
data as float32.
- Return type
numpy array
pyroomacoustics.windows module¶
Window Functions¶
This is a collection of many popular window functions used in signal processing.
A few options are provided to correctly construct the required window function.
The flag keyword argument can take the following values.
asymmetric
This way, many of the functions will sum to one when their left part is added to their right part. This is useful for overlapped transforms such as the STFT.
symmetric
With this flag, the window is perfectly symmetric. This might be more suitable for analysis tasks.
mdct
Available for only some of the windows. The window is modified to satisfy the perfect reconstruction condition of the MDCT transform.
Often we would like the full window function, but on some occasions it is useful to get only the left (or right) part. This can be indicated via the keyword argument length, which can take the values full (default), left, or right.
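As a quick check of the asymmetric property described above, the following sketch verifies that the left and right halves of an asymmetric Hann window sum to one:

import numpy as np
import pyroomacoustics as pra

N = 512
w = pra.windows.hann(N, flag="asymmetric")
print(np.allclose(w[: N // 2] + w[N // 2 :], 1.0))  # True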
- pyroomacoustics.windows.bart(N, flag='asymmetric', length='full')¶
The Bartlett window function
\[w[n] = \frac{2}{M-1} \left(\frac{M-1}{2} - \left|n - \frac{M-1}{2}\right|\right), \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.bart_hann(N, flag='asymmetric', length='full')¶
The modified Bartlett–Hann window function
\[w[n] = 0.62 - 0.48|(n/M-0.5)| + 0.38 \cos(2\pi(n/M-0.5)), \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.blackman(N, flag='asymmetric', length='full')¶
The Blackman window function
\[w[n] = 0.42 - 0.5\cos(2\pi n/(M-1)) + 0.08\cos(4\pi n/(M-1)), \quad n = 0, \ldots, M-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.blackman_harris(N, flag='asymmetric', length='full')¶
The Blackman-Harris window function
\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M), \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.bohman(N, flag='asymmetric', length='full')¶
The Bohman window function
\[w[n] = (1-|x|) \cos(\pi |x|) + \frac{1}{\pi} \sin(\pi |x|), \quad -1\leq x\leq 1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.cosine(N, flag='asymmetric', length='full')¶
The cosine window function
\[w[n] = \cos(\pi (n/M - 0.5))^2\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.flattop(N, flag='asymmetric', length='full')¶
The flat top weighted window function
\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M) + a_4 \cos(8\pi n/M), \quad n=0,\ldots,N-1\]
where
\[a_0 = 0.21557895,\quad a_1 = 0.41663158,\quad a_2 = 0.277263158,\quad a_3 = 0.083578947,\quad a_4 = 0.006947368.\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.gaussian(N, std, flag='asymmetric', length='full')¶
The Gaussian window function
\[w[n] = e^{ -\frac{1}{2}\left(\frac{n}{\sigma}\right)^2 }\]
- Parameters
N (int) – the window length
std (float) – the standard deviation
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.hamming(N, flag='asymmetric', length='full')¶
The Hamming window function
\[w[n] = 0.54 - 0.46 \cos(2 \pi n / M), \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.hann(N, flag='asymmetric', length='full')¶
The Hann window function
\[w[n] = 0.5 (1 - \cos(2 \pi n / M)), \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.kaiser(N, beta, flag='asymmetric', length='full')¶
The Kaiser window function
\[w[n] = I_0\left( \beta \sqrt{1-\frac{4n^2}{(M-1)^2}} \right)/I_0(\beta)\]
with
\[\quad -\frac{M-1}{2} \leq n \leq \frac{M-1}{2},\]
where \(I_0\) is the modified zeroth-order Bessel function.
- Parameters
N (int) – the window length
beta (float) – Shape parameter, determines trade-off between main-lobe width and side lobe level. As beta gets large, the window narrows.
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
- pyroomacoustics.windows.rect(N)¶
The rectangular window
\[w[n] = 1, \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
- pyroomacoustics.windows.triang(N, flag='asymmetric', length='full')¶
The triangular window function
\[w[n] = 1 - | 2 n / M - 1 |, \quad n=0,\ldots,N-1\]
- Parameters
N (int) – the window length
flag (string, optional) –
Possible values
asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))
symmetric: the window is symmetric (\(M=N-1\))
mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))
length (string, optional) –
Possible values
full: the full length window is computed
right: the right half of the window is computed
left: the left half of the window is computed
Module contents¶
pyroomacoustics¶
- Provides
Room impulse response simulation via the image source model
Simulation of sound propagation using STFT engine
Reference implementations of popular algorithms for
beamforming
direction of arrival
adaptive filtering
source separation
single channel denoising
etc
How to use the documentation¶
Documentation is available in two forms: docstrings provided with the code, and a loose standing reference guide, available from the pyroomacoustics readthedocs page.
We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities. See below for further instructions.
The docstring examples assume that pyroomacoustics has been imported as pra:
>>> import pyroomacoustics as pra
Code snippets are indicated by three greater-than signs:
>>> x = 42
>>> x = x + 1
Use the built-in help
function to view a function’s docstring:
>>> help(pra.stft.STFT)
...
Available submodules¶
pyroomacoustics.acoustics
Acoustics and psychoacoustics routines, mel-scale, critical bands, etc.
pyroomacoustics.beamforming
Microphone arrays and beamforming routines.
pyroomacoustics.directivities
Directivity pattern objects and routines.
pyroomacoustics.geometry
Core geometry routine for the image source model.
pyroomacoustics.metrics
Performance metrics like mean-squared error, median, Itakura-Saito, etc.
pyroomacoustics.multirate
Rate conversion routines.
pyroomacoustics.parameters
Global parameters, e.g. physical constants.
pyroomacoustics.recognition
Hidden Markov Model and TIMIT database structure.
pyroomacoustics.room
Abstraction of room and image source model.
pyroomacoustics.soundsource
Abstraction for a sound source.
pyroomacoustics.stft
Deprecated. Replaced by the methods in
pyroomacoustics.transform
pyroomacoustics.sync
A few routines to help synchronize signals.
pyroomacoustics.utilities
Miscellaneous utility routines (filtering, normalization, plotting, etc.).
pyroomacoustics.wall
Abstraction for walls of a room.
pyroomacoustics.windows
Tapering windows for spectral analysis.
Available subpackages¶
pyroomacoustics.adaptive
Adaptive filter algorithms
pyroomacoustics.bss
Blind source separation.
pyroomacoustics.datasets
Wrappers around a few popular speech datasets
pyroomacoustics.denoise
Single channel noise reduction methods
pyroomacoustics.doa
Direction of arrival finding algorithms
pyroomacoustics.phase
Phase-related processing
pyroomacoustics.transform
Block frequency domain processing tools
Utilities¶
- __version__
pyroomacoustics version string