Pyroomacoustics

Summary

Pyroomacoustics is a software package aimed at the rapid development and testing of audio array processing algorithms. The content of the package can be divided into three main components:

  1. Intuitive Python object-oriented interface to quickly construct different simulation scenarios involving multiple sound sources and microphones in 2D and 3D rooms;

  2. Fast C++ implementation of the image source model and ray tracing for general polyhedral rooms to efficiently generate room impulse responses and simulate the propagation between sources and receivers;

  3. Reference implementations of popular algorithms for STFT, beamforming, direction finding, adaptive filtering, source separation, and single channel denoising.

Together, these components form a package with the potential to speed up the time to market of new algorithms by significantly reducing the implementation overhead in the performance evaluation step. Please refer to this notebook for a demonstration of the different components of this package.

Room Acoustics Simulation

Consider the following scenario.

Suppose, for example, you wanted to produce a radio crime drama, and it so happens that, according to the scriptwriter, the story line absolutely must culminate in a satanic mass that quickly degenerates into a violent shootout, all taking place right around the altar of the highly reverberant acoustic environment of Oxford’s Christ Church cathedral. To ensure that it sounds authentic, you asked the Dean of Christ Church for permission to record the final scene inside the cathedral, but somehow he fails to be convinced of the artistic merit of your production, and declines to give you permission. But recorded in a conventional studio, the scene sounds flat. So what do you do?

—Schnupp, Nelken, and King, Auditory Neuroscience, 2010

Faced with this difficult situation, pyroomacoustics can save the day by simulating the environment of the Christ Church cathedral!

At the core of the package is a room impulse response (RIR) generator based on the image source model that can handle

  • Convex and non-convex rooms

  • 2D/3D rooms

The core image source model and ray tracing modules are written in C++ for better performance.

The philosophy of the package is to abstract all necessary elements of an experiment using an object-oriented programming approach. Each of these elements is represented using a class and an experiment can be designed by combining these elements just as one would do in a real experiment.

Let’s imagine we want to simulate a delay-and-sum beamformer that uses a linear array with four microphones in a shoe box shaped room that contains only one source of sound. First, we create a room object, to which we add a microphone array object, and a sound source object. Then, the room object has methods to compute the RIR between source and receiver. The beamformer object then extends the microphone array class and has different methods to compute the weights, for example delay-and-sum weights. See the example below to get an idea of what the code looks like.

The Room class also allows one to process sound samples emitted by sources, effectively simulating the propagation of sound between sources and microphones. At the input of the microphones composing the beamformer, an STFT (short-time Fourier transform) engine allows one to quickly process the signals through the beamformer and evaluate the output.
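For instance, here is a minimal sketch of the one-shot STFT interface; the signal, window length, and hop size below are illustrative values, not prescribed by the package.

import numpy as np
import pyroomacoustics as pra

# one second of noise at 16 kHz stands in for a real signal
x = np.random.randn(16000)
L, hop = 512, 256            # frame length and frame shift
awin = pra.hann(L)           # analysis window

# forward STFT, shape (n_frames, n_frequencies), then resynthesis
X = pra.transform.stft.analysis(x, L, hop, win=awin)
swin = pra.transform.stft.compute_synthesis_window(awin, hop)
x_rec = pra.transform.stft.synthesis(X, L, hop, win=swin)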

Reference Implementations

In addition to its core image source model simulation, pyroomacoustics also contains a number of reference implementations of popular audio processing algorithms for STFT processing, beamforming, direction finding, adaptive filtering, source separation, and single-channel denoising.

We use an object-oriented approach to abstract the details of specific algorithms, making them easy to compare. Each algorithm can be tuned through optional parameters. We have tried to pre-set values for the tuning parameters so that a run with the default values will in general produce reasonable results.
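As an illustration, here is a hedged sketch of running one of the DOA algorithms with its default tuning parameters; the array geometry and the random STFT data are placeholders.

import numpy as np
import pyroomacoustics as pra

# a 6-microphone circular array of radius 5 cm
fs, nfft, n_mics = 16000, 256, 6
mic_locs = pra.circular_2D_array(center=[0.0, 0.0], M=n_mics, phi0=0.0, radius=0.05)

# placeholder STFT frames of the microphone signals, shape (n_mics, n_frequencies, n_frames)
X = np.random.randn(n_mics, nfft // 2 + 1, 25) + 1j * np.random.randn(n_mics, nfft // 2 + 1, 25)

# construct with default parameters and run the localization
doa = pra.doa.MUSIC(mic_locs, fs, nfft)
doa.locate_sources(X)
print(doa.azimuth_recon)  # estimated azimuth(s), in radians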

Datasets

In an effort to simplify the use of datasets, we provide a few wrappers that allow one to quickly load and sort through some popular speech corpora. At the moment, wrappers are available for the CMU ARCTIC and TIMIT corpora, as well as Google's Speech Commands dataset.

For more details, see the doc.
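For example, here is a hedged sketch of loading and filtering the CMU ARCTIC corpus; the exact metadata fields follow the dataset wrapper and may differ.

import pyroomacoustics as pra

# download one speaker of CMU ARCTIC, then filter the samples by metadata
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])
subset = corpus.filter(sex='male')
sample = subset[0]
print(sample.meta)  # metadata of the first matching sample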

Quick Install

Install the package with pip:

pip install pyroomacoustics

A cookiecutter is available that generates a working simulation script for a few 2D/3D scenarios:

# if necessary install cookiecutter
pip install cookiecutter

# create the simulation script
cookiecutter gh:fakufaku/cookiecutter-pyroomacoustics-sim

# run the newly created script
python <chosen_script_name>.py

We have also provided a minimal Dockerfile example in order to install and run the package within a Docker container. Note that you may need to increase the memory of your containers to 4 GB, as building the C++ extension is memory-intensive; less may also be sufficient. You can build the container with:

docker build -t pyroom_container .

And enter the container with:

docker run -it pyroom_container:latest /bin/bash

Dependencies

The minimal dependencies are:

numpy
scipy>=0.18.0
Cython
pybind11

where Cython is only needed to build the compiled, accelerated simulator. The simulator also has a pure Python fallback, so this requirement can be skipped, but the fallback is much slower.

On top of that, some functionalities of the package depend on extra packages:

samplerate   # for resampling signals
matplotlib   # to create graphs and plots
sounddevice  # to play sound samples
mir_eval     # to evaluate performance of source separation in examples

The requirements.txt file lists all packages necessary to run all of the scripts in the examples folder.

This package is mainly developed under Python 3.6. The last supported version for Python 2.7 is 0.4.3.

Under Linux and Mac OS, the compiled accelerators require a valid compiler to be installed, typically GCC. When no compiler is present, the package will still install, but it will default to the pure Python implementation, which is much slower. On Windows, we provide pre-compiled wheels for Python 3.5 and 3.6.

Example

Here is a quick example of how to create and visualize the response of a beamformer in a room.

import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Create a 4 by 6 metres shoe box room
room = pra.ShoeBox([4,6])

# Add a source somewhere in the room
room.add_source([2.5, 4.5])

# Create a linear array beamformer with 4 microphones
# with angle 0 degrees and inter mic distance 10 cm
R = pra.linear_2D_array([2, 1.5], 4, 0, 0.1)
room.add_microphone_array(pra.Beamformer(R, room.fs))

# Now compute the delay and sum weights for the beamformer
room.mic_array.rake_delay_and_sum_weights(room.sources[0][:1])

# plot the room and resulting beamformer
room.plot(freq=[1000, 2000, 4000, 8000], img_order=0)
plt.show()

More examples

A couple of detailed demos with illustrations are available.

A comprehensive set of examples covering most of the functionalities of the package can be found in the examples folder of the GitHub repository.

Authors

  • Robin Scheibler

  • Ivan Dokmanić

  • Sidney Barthe

  • Eric Bezzam

  • Hanjie Pan

How to contribute

If you would like to contribute, please clone the repository and send a pull request.

For more details, see our CONTRIBUTING page.

Academic publications

This package was developed to support academic publications. The package contains implementations for DOA algorithms and acoustic beamformers introduced in the following papers.

  • H. Pan, R. Scheibler, I. Dokmanic, E. Bezzam and M. Vetterli. FRIDA: FRI-based DOA estimation for arbitrary array layout, ICASSP 2017, New Orleans, USA, 2017.

  • I. Dokmanić, R. Scheibler and M. Vetterli. Raking the Cocktail Party, in IEEE Journal of Selected Topics in Signal Processing, vol. 9, num. 5, p. 825 - 836, 2015.

  • R. Scheibler, I. Dokmanić and M. Vetterli. Raking Echoes in the Time Domain, ICASSP 2015, Brisbane, Australia, 2015.

If you use this package in your own research, please cite our paper describing it.

R. Scheibler, E. Bezzam, I. Dokmanić, Pyroomacoustics: A Python package for audio room simulations and array processing algorithms, Proc. IEEE ICASSP, Calgary, CA, 2018.

License

Copyright (c) 2014-2021 EPFL-LCAV

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


Contributing

If you want to contribute to pyroomacoustics and make it better, your help is very welcome. Contributing is also a great way to learn more about the package itself.

Ways to contribute

  • File bug reports

  • Improvements to the documentation are always more than welcome. Keeping a good clean documentation is a challenging task and any help is appreciated.

  • Feature requests

  • If you implemented an extra DOA/adaptive filter/beamforming algorithm: that’s awesome! We’d love to add it to the package.

  • Suggestions of improvements to the code base are also welcome.

Coding style

We try to stick to PEP8 as much as possible. Variables, functions, modules and packages should be in lowercase with underscores. Class names in CamelCase.

We use Black to format the code and isort to sort the imports. The format will be automatically checked when doing a pull request, so it is recommended to run Black on the code regularly. Please format your code as follows prior to committing.

pip install black isort
black .
isort --profile black .

Documentation

Docstrings should follow the numpydoc style.
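A minimal sketch of what such a docstring looks like; the function and parameter names are illustrative.

def measure_peak(signal, fs):
    """
    Find the time of the absolute peak of a signal.

    Parameters
    ----------
    signal : numpy.ndarray
        The input signal.
    fs : int
        The sampling frequency in Hz.

    Returns
    -------
    float
        The time of the absolute peak, in seconds.
    """
    import numpy as np

    return int(np.argmax(np.abs(signal))) / fs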

We recommend the following steps for generating the documentation:

  • Create a separate environment, e.g. with Anaconda, as such: conda create -n mkdocs37 python=3.7 sphinx numpydoc mock sphinx_rtd_theme sphinxcontrib-napoleon tabulate

  • Switch to the environment: source activate mkdocs37

  • Navigate to the docs folder and run: ./make_apidoc.sh

  • The materials database page is generated with the script ./make_materials_table.py

  • Build and view the documentation locally with: make html

  • Open in your browser: docs/_build/html/index.html

Develop Locally

It can be convenient to develop and run tests locally. In contrast to only using the package, you will then also need to compile the C++ extension. On Mac and Linux, GCC is required, while Visual C++ 14.0 is necessary for Windows.

  1. Get the source code. Use a recursive clone so that Eigen (a sub-module of this repository) is also downloaded.

    git clone --recursive git@github.com:LCAV/pyroomacoustics.git
    

    Alternatively, you can clone without the --recursive flag and install the Eigen library directly. For macOS, you can find installation instructions here: https://stackoverflow.com/a/35658421. After installation you can create a symbolic link as such:

    ln -s PATH_TO_EIGEN pyroomacoustics/libroom_src/ext/eigen/Eigen
    
  2. Install a few pre-requisites

    pip install numpy Cython pybind11
    
  3. Compile locally

    python setup.py build_ext --inplace
    

    On recent Mac OS (Mojave), it is necessary in some cases to add a higher deployment target

    MACOSX_DEPLOYMENT_TARGET=10.9 python setup.py build_ext --inplace
    
  4. Update $PYTHONPATH so that python knows where to find the local package

    # Linux/Mac
    export PYTHONPATH=<path_to_pyroomacoustics>:$PYTHONPATH
    

    For Windows, see this question on Stack Overflow.

  5. Install the dependencies listed in requirements.txt

    pip install -r requirements.txt
    
  6. Now fire up python or ipython and check that the package can be imported

    import pyroomacoustics as pra
    

Unit Tests

As much as possible, for every new function added to the code base, add a short test script in pyroomacoustics/tests. The names of the script and the functions running the test should be prefixed by test_. The tests are started by running nosetests at the root of the package.
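A minimal sketch of such a test script, say pyroomacoustics/tests/test_my_feature.py; the checked behavior below is illustrative.

import numpy as np
import pyroomacoustics as pra

def test_shoebox_rir_is_created():
    # build a small 2D room with one source and one microphone
    room = pra.ShoeBox([4, 6], fs=8000, max_order=2)
    room.add_source([1.0, 1.0])
    room.add_microphone([2.0, 3.0])
    room.compute_rir()

    # one microphone, one source, and a non-trivial RIR
    assert len(room.rir) == 1
    assert len(room.rir[0]) == 1
    assert np.max(np.abs(room.rir[0][0])) > 0

if __name__ == "__main__":
    test_shoebox_rir_is_created()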

How to make a clean pull request

Look for a project’s contribution instructions. If there are any, follow them.

  • Create a personal fork of the project on Github.

  • Clone the fork on your local machine. Your remote repo on Github is called origin.

  • Add the original repository as a remote called upstream.

  • If you created your fork a while ago be sure to pull upstream changes into your local repository.

  • Create a new branch to work on! Branch from develop if it exists, else from master.

  • Implement/fix your feature, comment your code.

  • Follow the code style of the project, including indentation.

  • If the project has tests run them!

  • Write or adapt tests as needed.

  • Add or change the documentation as needed.

  • Squash your commits into a single commit with git’s interactive rebase. Create a new branch if necessary.

  • Push your branch to your fork on Github, the remote origin.

  • From your fork open a pull request in the correct branch. Target the project’s develop branch if there is one, else go for master!

  • If the maintainer requests further changes just push them to your branch. The PR will be updated automatically.

  • Once the pull request is approved and merged you can pull the changes from upstream to your local repo and delete your extra branch(es).

And last but not least: Always write your commit messages in the present tense. Your commit message should describe what the commit, when applied, does to the code – not what you did to the code.

How to deploy a new version to pypi

  1. git checkout pypi-release

  2. git merge master

  3. Change version number in pyroomacoustics/version.py to new version number vX.Y.Z

  4. Edit CHANGELOG.rst as follows

    • Add new title X.Y.Z_ - YEAR-MONTH-DAY under Unreleased, add “Nothing yet” in the unreleased section.

    • Edit appropriately the lists of links at the bottom of the file.

  5. git commit

  6. git tag vX.Y.Z

  7. git push origin vX.Y.Z

  8. git push

  9. git checkout master

  10. git merge pypi-release

  11. git push origin master

Reference

This guide is based on the nice template by @MarcDiethelm available under MIT License.

Changelog

All notable changes to pyroomacoustics will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

Unreleased

Added

  • New implementation of the fast RIR builder function in the libroom C++ extension to replace the current Cython code. Advantages are: 1) only one compiled extension, 2) multithreading support

  • New global parameter sinc_lut_granularity that controls the number of points used in the look-up table for the sinc interpolation. Accessible via parameters.constants.get.

  • New global parameter num_threads that controls the number of threads used in multi-threaded code (RIR builder only at the moment). The number of threads can also be controlled via the environment variable PRA_NUM_THREADS

Changed

  • Removed the broken get_rir method of the class SoundSource

Bugfix

  • Fixes a bug when using randomized image source model with a 2D room (#315) by @hrosseel

0.7.3 - 2022-12-05

Bugfix

  • Fixes issue #293 due to the C++ method Room::next_wall_hit not handling 2D shoebox rooms, which caused a segfault

0.7.2 - 2022-11-15

Added

  • Appveyor builds for compiled wheels for win32/win64 x86

Bugfix

  • Fixes missing import statement in room.plot for 3D rooms (PR #286)

  • On win64, bss.fastmnmf would fail due to some singular matrix. 1) protect solve with try/except and switch to pseudo-inverse if necessary, 2) change eps 1e-7 -> 1e-6

0.7.1 - 2022-11-11

Bugfix

  • Fixed pypi upload for windows wheels

0.7.0 - 2022-11-10

Added

  • Added the AnechoicRoom class.

  • Added FastMNMF2 (Fast Multichannel Nonnegative Matrix Factorization 2) to bss subpackage.

  • Randomized image source method for removing sweeping echoes in shoebox rooms.

  • Adds the cart2spher method in pyroomacoustics.doa.utils to convert from cartesian to spherical coordinates.

  • Example room_complex_wall_materials.py

  • CI for python 3.10

Changed

  • Cleans up the plot_rir function in Room so that the labels are neater. It also adds an extra option kind that can take values “ir”, “tf”, or “spec” to plot the impulse responses, transfer functions, or spectrograms of the RIR.

  • Refactored the implementation of FastMNMF.

  • Modified the document of __init__.py in doa subpackage.

  • End of Python 3.6 support.

  • Removed the deprecated realtime sub-module.

  • Removed the deprecated functions pyroomacoustics.transform.analysis, pyroomacoustics.transform.synthesis, pyroomacoustics.transform.compute_synthesis_window. They are replaced by the equivalent functions in pyroomacoustics.transform.stft sub-module.

  • The minimum required version of numpy was changed to 1.13.0 (use of np.linalg.multi_dot in doa sub-package see #271)

Bugfix

  • Fixed most warnings in the tests

  • Fixed bug in examples/adaptive_filter_stft_domain.py

0.6.0 - 2021-11-29

Added

  • New DOA method: MUSIC with pseudo-spectra normalization. Thanks @4bian! Normalizes MUSIC pseudo spectra before averaging across frequency axis.

Bugfix

  • Issue 235: fails when set_ray_tracing is called, but no mic_array is set

  • Issue 236: general ISM produces the wrong transmission coefficients

  • Removes an unnecessary warning for some rooms when ray tracing is not needed

Misc

  • Unify code format by using Black

  • Add code linting in continuous integration

  • Drop CI support for python 3.5

0.5.0 - 2021-09-06

Added

  • Adds tracking of reflection order with respect to x/y/z axis in the shoebox image source model engine. The orders are available in source.orders_xyz after running the image source model

  • Support for microphone and source directivities in the image source model. Source directivities are only available for shoebox rooms. Available directivities are frequency-independent (cardioid patterns), although the infrastructure is there for frequency-dependent directivities: frequency-dependent usage in Room.compute_rir and abstract Directivity class.

  • Examples scripts and notebook for directivities.

Bugfix

  • Fix wrong bracketing for negative values in is_inside (ShoeBox)

0.4.3 - 2021-02-18

Added

  • Support for Python 3.8 and 3.9

Bugfix

  • Fixes typo in a docstring

  • Update docs to better reflect actual function parameters

  • Fixes the computation of the cost function of the SRP-PHAT DOA algorithm (bug reported in PR #197)

Changed

  • Improve the computation of the auxiliary variables in AuxIVA and ILRMA. Unnecessary division operations are reduced.

0.4.2 - 2020-09-24

Bugfix

  • Fixes the Dockerfile so that we don’t have to install the build dependencies manually

  • Change the eps for geometry computations from 1e-4 to 1e-5 in libroom

Added

  • A specialized is_inside routine for ShoeBox rooms

0.4.1 - 2020-07-02

Bugfix

  • Issue #162 (crash with max_order>31 on windows), seems fixed by the new C++ simulator

  • Test for issue #162 added

  • Fix Binder link

  • Adds the pyproject.toml file in MANIFEST.in so that it gets picked up for packaging

Added

  • Minimal Dockerfile example.

0.4.0 - 2020-06-03

Improved Simulator with Ray Tracing

  • Ray Tracing in the libroom module. The function compute_rir() of the Room object in Python can now be executed using a pure ray tracing approach or a hybrid (ISM + RT) approach. This function therefore now has several additional default arguments to configure ray tracing (number of rays, scattering coefficient, energy and time thresholds, microphone's radius).

  • Bandpass filterbank construction in pyroomacoustics.acoustics.bandpass_filterbank

  • Acoustic properties of different materials in pyroomacoustics.materials

  • Scattering from the wall is handled via ray tracing method, scattering coefficients are provided in pyroomacoustics.materials.Material objects

  • Function inverse_sabine allows one to compute the absorption and max_order to use with the image source model to achieve a given reverberation time

  • The method rt60_theory in pyroomacoustics.room.Room allows one to compute the theoretical RT60 of the room according to the Eyring or Sabine formula

  • The method measure_rt60 in pyroomacoustics.room.Room allows one to measure the RT60 of the simulated RIRs

Changes in the Room Class

  • Deep refactor of Room class. The constructor arguments have changed

  • No more sigma2_awgn, noise is now handled in pyroomacoustics.Room.simulate method

  • The way absorption is handled has changed. The scalar variables absorption are deprecated in favor of pyroomacoustics.materials.Material

  • Complete refactor of libroom, the compiled extension module responsible for the room simulation, into C++. The bindings to python are now done using pybind11.

  • Removes the pure Python room simulator as it was really slow

  • pyroomacoustics.transform.analysis, pyroomacoustics.transform.synthesis, pyroomacoustics.transform.compute_synthesis_window, have been deprecated in favor of pyroomacoustics.transform.stft.analysis, pyroomacoustics.transform.stft.synthesis, pyroomacoustics.transform.stft.compute_synthesis_window.

  • pyroomacoustics.Room has a new method add that can be used to add either a SoundSource, or a MicrophoneArray object. Subsequent calls to the method will always add source/microphones. There exists also methods add_source and add_microphone that can be used to add source/microphone via coordinates. The method add_microphone_array can be used to add a MicrophoneArray object, or a 2D array containing the locations of several microphones in its columns. While the add_microphone_array method used to replace the existing array by the argument, the new behavior is to add in addition to other microphones already present.

Bugfix

  • From Issue #150, increase max iterations to check if point is inside room

  • Issues #117 #163, adds project file pyproject.toml so that pip can know which dependencies are necessary for setup

  • Fixed some bugs in the documentation

  • Fixed normalization part in FastMNMF

Added

  • Added room_isinside_max_iter in parameters.py

  • Default set to 20 rather than 5 as it was in pyroomacoustics.room.Room.isinside

  • Added Binder link in the README for online interactive demo

Changed

  • Changed while loop to iterate up to room_isinside_max_iter in pyroomacoustics.room.Room.isinside

  • Changed initialization of FastMNMF to accelerate convergence

  • Fixed bug in doa/tops (float -> integer division)

  • Added vectorised functions in MUSIC

  • Use the vectorised functions in _process of MUSIC

0.3.1 - 2019-11-06

Bugfix

  • Fixed a non-unicode character in pyroomacoustics.experimental.rt60 breaking the tests

0.3.0 - 2019-11-06

Added

  • The routine pyroomacoustics.experimental.measure_rt60 to automatically measure the reverberation time of impulse responses. This is useful for measured and simulated responses.

Bugfix

  • Fixed docstring and an argument of pyroomacoustics.bss.ilrma

0.2.0 - 2019-09-04

Added

  • Added FastMNMF (Fast Multichannel Nonnegative Matrix Factorization) to bss subpackage.

  • Griffin-Lim algorithm for phase reconstruction from STFT magnitude measurements.

Changed

  • Removed the superfluous warnings in pyroomacoustics.transform.stft.

  • Add option in pyroomacoustics.room.Room.plot_rir to set pair of channels to plot; useful when there’s too many impulse responses.

  • Add some window functions in windows.py and rearranged it in alphabetical order

  • Fixed various warnings in tests.

  • Faster implementation of AuxIVA that also includes OverIVA (more mics than sources). It also comes with a slightly changed API, Laplace and time-varying Gauss statistical models, and two possible initialization schemes.

  • Faster implementation of ILRMA.

  • SparseAuxIVA has slightly changed API, f_contrast has been replaced by model keyword argument.

Bugfix

  • Set rcond=None in all calls to numpy.linalg.lstsq to remove a FutureWarning

  • Add a lower bound to activations in pyroomacoustics.bss.auxiva to avoid underflow and divide by zero.

  • Fixed a memory leak in the C engine for polyhedral room (issue #116).

  • Fixed problem caused by dependency of setup.py on Cython (Issue #117)

0.1.23 - 2019-04-17

Bugfix

  • Expose mu parameter for adaptive.subband_lms.SubbandLMS.

  • Add SSL context to download_uncompress and unit test; error for Python 2.7.

0.1.22 - 2019-04-11

Added

  • Added “precision” parameter to “stft” class to choose between ‘single’ (float32/complex64) or ‘double’ (float64/complex128) for processing precision.

  • Unified unit test file for frequency-domain source separation methods.

  • New algorithm for blind source separation (BSS): Sparse Independent Vector Analysis (SparseAuxIVA).

Changed

  • Few README improvements

Bugfix

  • Remove np.squeeze in STFT as it caused errors when an axis that shouldn’t be squeezed was equal to 1.

  • Beamformer.process was using old (non-existent) STFT function. Changed to using one-shot function from transform module.

  • Fixed a bug in utilities.fractional_delay_filter_bank that would cause the function to crash on some inputs (issue #87).

0.1.21 - 2018-12-20

Added

  • Adds several options to pyroomacoustics.room.Room.simulate to finely control the SNR of the microphone signals and also return the microphone signals with individual sources, prior to mix (useful for BSS evaluation)

  • Add subspace denoising approach in pyroomacoustics.denoise.subspace.

  • Add iterative Wiener filtering approach for single channel denoising in pyroomacoustics.denoise.iterative_wiener.

Changed

  • Add build instructions for python 3.7 and wheels for Mac OS X in the continuous integration (Travis and Appveyor)

  • Limits imports of matplotlib to within plotting functions so that the matplotlib backend can still be changed, even after importing pyroomacoustics

  • Better Vectorization of the computations in pyroomacoustics.bss.auxiva

Bugfix

  • Corrects a bug that causes different behavior whether sources are provided to the constructor of Room or to the add_source method

  • Corrects a typo in pyroomacoustics.SoundSource.add_signal

  • Corrects a bug in the update of the demixing matrix in pyroomacoustics.bss.auxiva

  • Corrects invalid memory access in the pyroomacoustics.build_rir cython accelerator and adds a unit test that checks the cython code output is correct

  • Fix bad handling of 1D b vectors in pyroomacoustics.levinson.

0.1.20 - 2018-10-04

Added

  • STFT tutorial and demo notebook.

  • New algorithm for blind source separation (BSS): Independent Low-Rank Matrix Analysis (ILRMA)

Changed

  • Matplotlib is not a hard requirement anymore. When matplotlib is not installed, only a warning is issued on plotting commands. This is useful to run pyroomacoustics on headless servers that might not have matplotlib installed

  • Removed dependencies on joblib and requests packages

  • Apply matplotlib.pyplot.tight_layout in pyroomacoustics.Room.plot_rir

Bugfix

  • Monaural signals are now properly handled in one-shot stft/istft

  • Corrected check of size of absorption coefficients list in Room.from_corners

0.1.19 - 2018-09-24

Added

  • Added noise reduction sub-package denoise with spectral subtraction class and example.

  • Renamed realtime to transform and added deprecation warning.

  • Added a cython function to efficiently compute the fractional delays in the room impulse response from time delays and attenuations

  • notebooks folder.

  • Demo IPython notebook (with WAV files) of several features of the package.

  • Wrapper for Google’s Speech Command Dataset and an example usage script in examples.

  • Lots of new features in the pyroomacoustics.realtime subpackage

    • The STFT class can now be used both for frame-by-frame processing or for bulk processing

    • The functionality will replace the methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc,

    • The new function pyroomacoustics.realtime.compute_synthesis_window computes the optimal synthesis window given an analysis window and the frame shift

    • Extensive tests for the pyroomacoustics.realtime module

    • Convenience functions pyroomacoustics.realtime.analysis and pyroomacoustics.realtime.synthesis with an interface similar to pyroomacoustics.stft and pyroomacoustics.istft (which are now deprecated and will disappear soon)

    • The ordering of axis in the output from bulk STFT is now (n_frames, n_frequencies, n_channels)

    • Support for Intel’s mkl_fft package

    • axis (along which to perform DFT) and bits parameters for DFT class.

Changed

  • Improved documentation and docstrings

  • Using now the built-in RIR generator in examples/doa_algorithms.py

  • Improved the download/uncompress function for large datasets

  • Dusted the code for plotting on the sphere in pyroomacoustics.doa.grid.GridSphere

Deprecation Notice

  • The methods pyroomacoustics.stft, pyroomacoustics.istft, pyroomacoustics.overlap_add, etc, are now deprecated and will be removed in the near future

0.1.18 - 2018-04-24

Added

  • Added AuxIVA (independent vector analysis) to bss subpackage.

  • Added BSS IVA example

Changed

  • Moved Trinicon blind source separation algorithm to bss subpackage.

Bugfix

  • Correct a bug that causes 1st order sources to be generated for max_order==0 in pure python code

0.1.17 - 2018-03-23

Bugfix

  • Fixed issue #22 on github. Added INCREF before returning Py_None in C extension.

0.1.16 - 2018-03-06

Added

  • Base classes for Dataset and Sample in pyroomacoustics.datasets

  • Methods to filter datasets according to the metadata of samples

  • Deprecation warning for the TimitCorpus interface

Changed

  • Add list of speakers and sentences from CMU ARCTIC

  • CMUArcticDatabase basedir is now the top directory where CMU_ARCTIC database should be saved. Not the directory above as it previously was.

  • Libroom C extension is now a proper package. It can be imported.

  • Libroom C extension now compiles on windows with python>=3.5.

0.1.15 - 2018-02-23

Bugfix

  • Added pyroomacoustics.datasets to list of sub-packages in setup.py

0.1.14 - 2018-02-20

Added

  • Changelog

  • CMU ARCTIC corpus wrapper in pyroomacoustics.datasets

Changed

  • Moved TIMIT corpus wrapper from pyroomacoustics.recognition module to sub-package pyroomacoustics.datasets.timit

Room Simulation

Room

The three main classes are pyroomacoustics.room.Room, pyroomacoustics.soundsource.SoundSource, and pyroomacoustics.beamforming.MicrophoneArray. On a high level, a simulation scenario is created by first defining a room to which a few sound sources and a microphone array are attached. The actual audio is attached to the source as raw audio samples.

Then, a simulation method is used to create artificial room impulse responses (RIRs) between the sources and microphones. The current default method is the image source model, which considers the walls as perfect reflectors. An experimental hybrid simulator based on the image source method (ISM) 1 and ray tracing (RT) 2, 3, is also available. Ray tracing better captures the late reflections and can also model effects such as scattering.

The microphone signals are then created by convolving audio samples associated to sources with the appropriate RIR. Since the simulation is done on discrete-time signals, a sampling frequency is specified for the room and the sources it contains. Microphones can optionally operate at a different sampling frequency; a rate conversion is done in this case.
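For example, a sketch of attaching a microphone that operates at a different rate than the room; the values are illustrative.

import pyroomacoustics as pra

# room simulated at 16 kHz, microphone output resampled to 8 kHz
room = pra.ShoeBox([4, 6], fs=16000)
room.add_microphone([2.0, 3.0], fs=8000)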

Simulating a Shoebox Room with the Image Source Model

We will first walk through the steps to simulate a shoebox-shaped room in 3D. We use the ISM to find all image sources up to a maximum specified order, and the room impulse responses (RIRs) are generated from their positions.

The code for the full example can be found in examples/room_from_rt60.py.

Create the room

So-called shoebox rooms are parallelepipedic rooms with 4 or 6 walls (in 2D and 3D, respectively), all at right angles. They are defined by a single vector that contains the lengths of the walls. They have the advantage of being simple to define and very efficient to simulate. In the following example, we define a 9 m x 7.5 m x 3.5 m room. In addition, we use Sabine's formula to find the wall energy absorption and maximum order of the ISM required to achieve a desired reverberation time (RT60, i.e. the time it takes for the RIR to decay by 60 dB).

import pyroomacoustics as pra

# The desired reverberation time and dimensions of the room
rt60 = 0.5  # seconds
room_dim = [9, 7.5, 3.5]  # meters

# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)

# Create the room
room = pra.ShoeBox(
    room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order
)

The second argument is the sampling frequency at which the RIR will be generated. Note that the default value of fs is 8 kHz. The third argument is the material of the wall, which itself takes the absorption as a parameter. The fourth and last argument is the maximum number of reflections allowed in the ISM.

Note

Note that Sabine’s formula is only an approximation and that the actually simulated RT60 may vary by quite a bit.

Warning

Until recently, rooms would take an absorption parameter that was actually not the energy absorption we use now. The absorption parameter is now deprecated and will be removed in the future.

Randomized Image Method

In highly symmetric shoebox rooms, the regularity of image sources’ positions leads to a monotonic convergence in the time arrival of far-field image pairs. This causes sweeping echoes. The randomized image method adds a small random displacement to the image source positions, so that they are no longer time-aligned, thus reducing sweeping echoes considerably.

To use the randomized method, set the flag use_rand_ism to True while creating a room. Additionally, the maximum displacement of the image sources can be chosen by setting the parameter max_rand_disp. The default value is 8 cm. For a full example, see examples/randomized_image_method.py.

import pyroomacoustics as pra

# The desired reverberation time and dimensions of the room
rt60 = 0.5  # seconds
room_dim = [5, 5, 5]  # meters

# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60, room_dim)

# Create the room
room = pra.ShoeBox(
    room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order,
    use_rand_ism = True, max_rand_disp = 0.05
)
Add sources and microphones

Sources are fairly straightforward to create. They take their location as single mandatory argument, and a signal and start time as optional arguments. Here we create a source located at [2.5, 3.73, 1.76] within the room, that will utter the content of the wav file speech.wav starting at 1.3 s into the simulation. The signal keyword argument to the add_source() method should be a one-dimensional numpy.ndarray containing the desired sound signal.

# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
from scipy.io import wavfile
_, audio = wavfile.read('speech.wav')

# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=1.3)

The locations of the microphones in the array should be provided in a numpy nd-array of size (ndim, nmics), that is, each column contains the coordinates of one microphone. Note that the sampling frequency of the array can be different from that of the room, in which case resampling will occur. Here, we create an array with two microphones placed at [6.3, 4.87, 1.2] and [6.3, 4.93, 1.2].

# define the locations of the microphones
import numpy as np
mic_locs = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
]

# finally place the array in the room
room.add_microphone_array(mic_locs)

A number of routines exist to create regular array geometries in 2D.
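For example, a sketch with two of these helpers; see pyroomacoustics.beamforming for the full list.

import pyroomacoustics as pra

# a linear array of 4 mics with 10 cm spacing, and a circular array of 6 mics
R_lin = pra.linear_2D_array(center=[2.0, 1.5], M=4, phi=0.0, d=0.1)
R_circ = pra.circular_2D_array(center=[2.0, 1.5], M=6, phi0=0.0, radius=0.1)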

Adding source or microphone directivity

The directivity pattern of a source or microphone can be conveniently set through the directivity keyword argument.

First, a pyroomacoustics.directivities.Directivity object needs to be created. As of Sep 6, 2021, only frequency-independent directivities from the cardioid family are supported, namely figure-eight, hypercardioid, cardioid, and subcardioid.

Below is how a pyroomacoustics.directivities.Directivity object can be created, for example a hypercardioid pattern pointing at an azimuth angle of 90 degrees and a colatitude angle of 15 degrees.

# create directivity object
from pyroomacoustics.directivities import (
    DirectivityPattern,
    DirectionVector,
    CardioidFamily,
)
dir_obj = CardioidFamily(
    orientation=DirectionVector(azimuth=90, colatitude=15, degrees=True),
    pattern_enum=DirectivityPattern.HYPERCARDIOID,
)

After creating a pyroomacoustics.directivities.Directivity object, it is straightforward to set the directivity of a source, microphone, or microphone array, namely by using the directivity keyword argument.

For example, to set a source’s directivity:

# place the source in the room
room.add_source(position=[2.5, 3.73, 1.76], directivity=dir_obj)

To set a single microphone’s directivity:

# place the microphone in the room
room.add_microphone(loc=[2.5, 5, 1.76], directivity=dir_obj)

The same directivity pattern can be used for all microphones in an array:

# place microphone array in the room
import numpy as np
mic_locs = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
]
room.add_microphone_array(mic_locs, directivity=dir_obj)

Or a different directivity can be used for each microphone by passing a list of pyroomacoustics.directivities.Directivity objects:

# place the microphone array in the room
room.add_microphone_array(mic_locs, directivity=[dir_1, dir_2])

Warning

As of Sep 6, 2021, setting directivity patterns for sources and microphones is only supported for the image source method (ISM). Moreover, source directivities are only supported for shoebox-shaped rooms.

Create the Room Impulse Response

At this point, the RIRs are simply created by invoking the ISM via image_source_model(). This function will generate all the image sources up to the required order and use them to generate the RIRs, which will be stored in the rir attribute of room. The attribute rir is a list of lists so that the outer list is over microphones and the inner list over sources.

room.compute_rir()

# plot the RIR between mic 1 and source 0
import matplotlib.pyplot as plt
plt.plot(room.rir[1][0])
plt.show()

Warning

The simulator uses a fractional delay filter that introduces a global delay in the RIR. The delay can be obtained as follows.

global_delay = pra.constants.get("frac_delay_length") // 2
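
For example (a sketch), the offset can be removed by trimming the first global_delay samples from an RIR:

rir_no_offset = room.rir[0][0][global_delay:]
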
Simulate sound propagation

By calling simulate(), a convolution of the signal of each source (if not None) will be performed with the corresponding room impulse response. The output from the convolutions will be summed up at the microphones. The result is stored in the signals attribute of room.mic_array with each row corresponding to one microphone.

room.simulate()

# plot signal at microphone 1
plt.plot(room.mic_array.signals[1,:])
Full Example

This example is partly extracted from ./examples/room_from_rt60.py.

import numpy as np
import matplotlib.pyplot as plt
import pyroomacoustics as pra
from scipy.io import wavfile

# The desired reverberation time and dimensions of the room
rt60_tgt = 0.3  # seconds
room_dim = [10, 7.5, 3.5]  # meters

# import a mono wavfile as the source signal
# the sampling frequency should match that of the room
fs, audio = wavfile.read("examples/samples/guitar_16k.wav")

# We invert Sabine's formula to obtain the parameters for the ISM simulator
e_absorption, max_order = pra.inverse_sabine(rt60_tgt, room_dim)

# Create the room
room = pra.ShoeBox(
    room_dim, fs=fs, materials=pra.Material(e_absorption), max_order=max_order
)

# place the source in the room
room.add_source([2.5, 3.73, 1.76], signal=audio, delay=0.5)

# define the locations of the microphones
mic_locs = np.c_[
    [6.3, 4.87, 1.2],  # mic 1
    [6.3, 4.93, 1.2],  # mic 2
]

# finally place the array in the room
room.add_microphone_array(mic_locs)

# Run the simulation (this will also build the RIR automatically)
room.simulate()

# save the reverberant signals to a wav file
room.mic_array.to_wav(
    "examples/samples/guitar_16k_reverb.wav",
    norm=True,
    bitdepth=np.int16,
)

# measure the reverberation time
rt60 = room.measure_rt60()
print("The desired RT60 was {}".format(rt60_tgt))
print("The measured RT60 is {}".format(rt60[1, 0]))

# Create a plot
plt.figure()

# plot one of the RIR. both can also be plotted using room.plot_rir()
rir_1_0 = room.rir[1][0]
plt.subplot(2, 1, 1)
plt.plot(np.arange(len(rir_1_0)) / room.fs, rir_1_0)
plt.title("The RIR from source 0 to mic 1")
plt.xlabel("Time [s]")

# plot signal at microphone 1
plt.subplot(2, 1, 2)
plt.plot(room.mic_array.signals[1, :])
plt.title("Microphone 1 signal")
plt.xlabel("Time [s]")

plt.tight_layout()
plt.show()

Hybrid ISM/Ray Tracing Simulator

Warning

The hybrid simulator has not been thoroughly tested yet and should be used with care. The exact implementation and default settings may also change in the future. Currently, the default behavior of Room and ShoeBox has been kept as in previous versions of the package. Bugs and user experience can be reported on github.

The hybrid ISM/RT simulator uses ISM to simulate the early reflections in the RIR and RT for the diffuse tail. Our implementation is based on 2 and 3.

The simulator has the following features.

  • Scattering: Wall scattering can be defined by assigning a scattering coefficient to the walls together with the energy absorption.

  • Multi-band: The simulation can be carried out with different parameters for different octave bands. The octave bands go from 125 Hz to half the sampling frequency.

  • Air absorption: The frequency-dependent absorption of the air can be turned on by providing the keyword argument air_absorption=True to the room constructor.

Here is a simple example using the hybrid simulator. We suggest to use max_order=3 with the hybrid simulator.

# Create the room
room = pra.ShoeBox(
    room_dim,
    fs=16000,
    materials=pra.Material(e_absorption),
    max_order=3,
    ray_tracing=True,
    air_absorption=True,
)

# Activate the ray tracing
room.set_ray_tracing()

A few example programs are provided in ./examples.

  • ./examples/ray_tracing.py demonstrates use of ray tracing for rooms of different sizes and with different amounts of reverberation

  • ./examples/room_L_shape_3d_rt.py shows how to simulate a polyhedral room

  • ./examples/room_from_stl.py demonstrates how to import a model from an STL file

Wall Materials

The wall materials are handled by the Material objects. A material is defined by at least one absorption coefficient that represents the ratio of sound energy absorbed by a wall upon reflection. A material may have multiple absorption coefficients corresponding to different absorptions at different octave bands. When only one coefficient is provided, the absorption is assumed to be uniform at all frequencies. In addition, materials may have one or more scattering coefficients corresponding to the ratio of energy scattered upon reflection.

The materials can be defined by providing the coefficients directly, or they can be defined by choosing a material from the materials database 2.

import pyroomacoustics as pra
m = pra.Material(energy_absorption="hard_surface")
room = pra.ShoeBox([9, 7.5, 3.5], fs=16000, materials=m, max_order=17)

We can use different materials for different walls. In this case, the materials should be provided in a dictionary. For a shoebox room, this can be done as follows. We use the make_materials() helper function to create a dict of Material objects.

import pyroomacoustics as pra
m = pra.make_materials(
    ceiling="hard_surface",
    floor="6mm_carpet",
    east="brickwork",
    west="brickwork",
    north="brickwork",
    south="brickwork",
)
room = pra.ShoeBox(
    [9, 7.5, 3.5], fs=16000, materials=m, max_order=17, air_absorption=True, ray_tracing=True
)

For a more complete example see examples/room_complex_wall_materials.py.

Note

For shoebox rooms, the walls are labelled as follows:

  • west/east for the walls in the y-z plane with a small/large x coordinate, respectively

  • south/north for the walls in the x-z plane with a small/large y coordinate, respectively

  • floor/ceiling for the walls in the x-y plane with a small/large z coordinate, respectively

Controlling the signal-to-noise ratio

It is in general necessary to scale the signals from different sources to obtain a specific signal-to-noise or signal-to-interference ratio (SNR and SIR, respectively). This can be done by passing some options to the simulate() function. Because the relative amplitude of signals will change at different microphones due to propagation, it is necessary to choose a reference microphone. By default, this will be the first microphone in the array (index 0). The simplest choice is to choose the variance of the noise \(\sigma_n^2\) to achieve a desired SNR with respect to the cumulative signal from all sources. Assuming that the signals from all sources are scaled to have the same amplitude (e.g., unit amplitude) at the reference microphone, the SNR is defined as

\[\mathsf{SNR} = 10 \log_{10} \frac{K}{\sigma_n^2}\]

where \(K\) is the number of sources. For example, an SNR of 10 decibels (dB) can be obtained using the following code

room.simulate(reference_mic=0, snr=10)

Sometimes, more challenging normalizations are necessary. In that case, a custom callback function can be provided to simulate. For example, we can imagine a scenario where we have n_src out of which n_tgt are the targets, the rest being interferers. We will assume all targets have unit variance, and all interferers have equal variance \(\sigma_i^2\) (at the reference microphone). In addition, there is uncorrelated noise \(\sigma_n^2\) at every microphones. We will define SNR and SIR with respect to a single target source:

\[\mathsf{SNR} = 10 \log_{10} \frac{1}{\sigma_n^2}, \qquad \mathsf{SIR} = 10 \log_{10} \frac{1}{(\mathsf{n_{src}} - \mathsf{n_{tgt}})\, \sigma_i^2}\]

The callback function callback_mix takes as argument an nd-array premix_signals of shape (n_src, n_mics, n_samples) that contains the microphone signals prior to mixing. The signal propagated from the k-th source to the m-th microphone is contained in premix_signals[k,m,:]. It is possible to provide optional arguments to the callback via callback_mix_kwargs optional argument. Here is the code implementing the example described.

# the extra arguments are given in a dictionary
callback_mix_kwargs = {
        'snr' : 30,  # SNR target is 30 decibels
        'sir' : 10,  # SIR target is 10 decibels
        'n_src' : 6,
        'n_tgt' : 2,
        'ref_mic' : 0,
        }

def callback_mix(premix, snr=0, sir=0, ref_mic=0, n_src=None, n_tgt=None):

    # first normalize all separate recordings to have unit power at the reference microphone
    p_mic_ref = np.std(premix[:,ref_mic,:], axis=1)
    premix /= p_mic_ref[:,None,None]

    # now compute the power of interference signal needed to achieve desired SIR
    sigma_i = np.sqrt(10 ** (- sir / 10) / (n_src - n_tgt))
    premix[n_tgt:n_src,:,:] *= sigma_i

    # compute noise variance
    sigma_n = np.sqrt(10 ** (- snr / 10))

    # Mix down the recorded signals
    mix = np.sum(premix[:n_src,:], axis=0) + sigma_n * np.random.randn(*premix.shape[1:])

    return mix

# Run the simulation
room.simulate(
        callback_mix=callback_mix,
        callback_mix_kwargs=callback_mix_kwargs,
        )
mics_signals = room.mic_array.signals

In addition, it is desirable in some cases to obtain the microphone signals with individual sources, prior to mixing. For example, this is useful to evaluate the output from blind source separation algorithms. In this case, the return_premix argument should be set to True

premix = room.simulate(return_premix=True)

Reverberation Time

The reverberation time (RT60) is defined as the time needed for the energy of the RIR to decrease by 60 dB. It is a useful measure of the amount of reverberation. We provide the function measure_rt60() to measure the RT60 of recorded or simulated RIRs.

This functionality is also directly integrated in the Room object as the method measure_rt60().

# assuming the simulation has already been carried out
rt60 = room.measure_rt60()

for m in range(room.n_mics):
    for s in range(room.n_sources):
        print(
            "RT60 between the {}th mic and {}th source: {:.3f} s".format(m, s, rt60[m, s])
        )

Free-field simulation

You can also use this package to simulate free-field sound propagation between a set of sound sources and a set of microphones, without considering room effects. To this end, you can use the pyroomacoustics.room.AnechoicRoom class, which simply corresponds to setting the maximum image source order of the room simulation to zero. This allows for early development and testing of various audio-based algorithms, without worrying about room acoustics at first. Thanks to the modular framework of pyroomacoustics, room acoustics can easily be added, after this first testing stage, for more realistic simulations.

Use this if you can neglect room effects (e.g. you operate in an anechoic room or outdoors), or if you simply want to test your algorithm in the best-case scenario. The code below shows how to create and simulate an anechoic room. For a more involved example (testing the DOA algorithm MUSIC in an anechoic room), see ./examples/doa_anechoic_room.py.

# Create anechoic room.
room = pra.AnechoicRoom(fs=16000)

# Place the microphone array around the origin.
mic_locs = np.c_[
    [0.1, 0.1, 0],
    [-0.1, 0.1, 0],
    [-0.1, -0.1, 0],
    [0.1, -0.1, 0],
]
room.add_microphone_array(mic_locs)

# Add a source. We use a white noise signal for the source, and
# the source can be arbitrarily far because there are no walls.
x = np.random.randn(2**10)
room.add_source([10.0, 20.0, -20], signal=x)

# run the simulation
room.simulate()

References

1
    J. B. Allen and D. A. Berkley, Image method for efficiently simulating small-room acoustics, J. Acoust. Soc. Am., vol. 65, no. 4, p. 943, 1979.

2
    M. Vorlaender, Auralization, 1st ed. Berlin: Springer-Verlag, 2008, pp. 1-340.

3
    D. Schroeder, Physically based real-time auralization of interactive virtual environments. PhD Thesis, RWTH Aachen University, 2011.

class pyroomacoustics.room.AnechoicRoom(dim=3, fs=8000, t0=0.0, sigma2_awgn=None, sources=None, mics=None, temperature=None, humidity=None, air_absorption=False)

Bases: ShoeBox

This class provides an API for creating an Anechoic “room” in 2D or 3D.

Parameters
  • dim (int) – Dimension of the room (2 or 3).

  • fs (int, optional) – The sampling frequency in Hz. Default is 8000.

  • t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.

  • sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.

  • sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.

  • mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.

  • temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.

  • humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.

  • air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.

get_bbox()

Returns a bounding box for the mics and sources, for plotting.

is_inside(p)

Overloaded function to eliminate testing if objects are inside “room”.

plot(**kwargs)

Overloaded function to issue warning when img_order is given.

plot_walls(ax)

Overloaded function to eliminate wall plotting.

class pyroomacoustics.room.Room(walls, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False, use_rand_ism=False, max_rand_disp=0.08)

Bases: object

A Room object has as attributes a collection of pyroomacoustics.wall.Wall objects, a pyroomacoustics.beamforming.MicrophoneArray array, and a list of pyroomacoustics.soundsource.SoundSource. The room can be two dimensional (2D), in which case the walls are simply line segments. A factory method pyroomacoustics.room.Room.from_corners() can be used to create the room from a polygon. In three dimensions (3D), the walls are two dimensional polygons, namely a collection of points lying on a common plane. Creating rooms in 3D is more tedious and for convenience a method pyroomacoustics.room.Room.extrude() is provided to lift a 2D room into 3D space by adding vertical walls and parallel floor and ceiling.

The Room is sub-classed by pyroomacoustics.room.ShoeBox which creates a rectangular (2D) or parallelepipedic (3D) room. Such rooms benefit from an efficient algorithm for the image source method.

Attribute walls

(Wall array) list of walls forming the room

Attribute fs

(int) sampling frequency

Attribute max_order

(int) the maximum computed order for images

Attribute sources

(SoundSource array) list of sound sources

Attribute mics

(MicrophoneArray) array of microphones

Attribute corners

(numpy.ndarray 2xN or 3xN, N=number of walls) array containing a point belonging to each wall, used for calculations

Attribute absorption

(numpy.ndarray size N, N=number of walls) array containing the absorption factor for each wall, used for calculations

Attribute dim

(int) dimension of the room (2 or 3 meaning 2D or 3D)

Attribute wallsId

(int dictionary) stores the mapping “wall name -> wall id (in the array walls)”

Parameters
  • walls (list of Wall or Wall2D objects) – The walls forming the room.

  • fs (int, optional) – The sampling frequency in Hz. Default is 8000.

  • t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.

  • max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.

  • sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.

  • sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.

  • mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.

  • temperature (float, optional) – The air temperature in the room in degree Celsius. By default, set so that speed of sound is 343 m/s.

  • humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.

  • air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.

  • ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.

  • use_rand_ism (bool, optional) – If set to True, image source positions will have a small random displacement to prevent sweeping echoes.

  • max_rand_disp (float, optional) – The maximum displacement of the image sources when the randomized image source method is used.
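
As a brief illustration, here is a minimal sketch of creating and populating a room (the geometry, material, and signal are arbitrary choices for this example):

import numpy as np
import pyroomacoustics as pra

# a 5 m x 4 m x 3 m shoe box room with lightly absorbing walls
room = pra.ShoeBox([5, 4, 3], fs=16000, materials=pra.Material(0.2), max_order=10)

# one source playing a second of noise, and one microphone
room.add_source([2.0, 1.5, 1.8], signal=np.random.randn(16000))
room.add_microphone([3.5, 2.0, 1.2])

room.compute_rir()   # the RIRs are then available in room.rir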

add(obj)

Adds a sound source or microphone to a room

Parameters

obj (SoundSource or Microphone object) – The object to add

Returns

The room is returned for further tweaking.

Return type

Room

add_microphone(loc, fs=None, directivity=None)

Adds a single microphone to the room.

Parameters
  • loc (array_like or ndarray) – The location of the microphone. The length should be the same as the room dimension.

  • fs (float, optional) – The sampling frequency of the microphone, if different from that of the room.

Returns

The room is returned for further tweaking.

Return type

Room

add_microphone_array(mic_array, directivity=None)

Adds a microphone array (i.e. several microphones) to the room.

Parameters

mic_array (array_like or ndarray or MicrophoneArray object) –

The array can be provided as an array of size (dim, n_mics), where dim is the dimension of the room and n_mics is the number of microphones in the array.

As an alternative, a MicrophoneArray can be provided.

Returns

The room is returned for further tweaking.

Return type

Room

add_soundsource(sndsrc, directivity=None)

Adds a pyroomacoustics.soundsource.SoundSource object to the room.

Parameters

sndsrc (SoundSource object) – The SoundSource object to add to the room

add_source(position, signal=None, delay=0, directivity=None)

Adds a sound source given by its position in the room. Optionally a source signal and a delay can be provided.

Parameters
  • position (ndarray, shape: (2,) or (3,)) – The location of the source in the room

  • signal (ndarray, shape: (n_samples,), optional) – The signal played by the source

  • delay (float, optional) – A time delay until the source signal starts in the simulation

Returns

The room is returned for further tweaking.

Return type

Room

compute_rir()

Compute the room impulse response between every source and microphone.

direct_snr(x, source=0)

Computes the direct Signal-to-Noise Ratio

extrude(height, v_vec=None, absorption=None, materials=None)

Creates a 3D room by extruding a 2D polygon. The polygon is typically the floor of the room and will have z-coordinate zero. The ceiling is then created by translating the floor along the extrusion direction.

Parameters
  • height (float) – The extrusion height

  • v_vec (array-like 1D length 3, optional) – A unit vector giving the extrusion direction. The ceiling will be placed as a translation of the floor along this vector (the default is [0, 0, 1]).

  • absorption (float or array-like, optional) – Absorption coefficients for all the walls. If a scalar, all the walls have the same absorption. If an array is given, it should have as many elements as there will be walls, that is the number of vertices of the polygon plus two; the last two elements are for the floor and the ceiling, respectively. It is recommended to use the materials parameter instead of absorption. (Default: 1)

  • materials (dict) – Absorption coefficients for floor and ceiling. This parameter overrides absorption. (Default: {“floor”: 1, “ceiling”: 1})
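
For example, an L-shaped 2D room can be lifted into 3D as follows (a minimal sketch; the corners and materials are arbitrary, and a single Material is assumed to be shareable by all the walls of the 2D room):

import numpy as np
import pyroomacoustics as pra

# an L-shaped floor plan, corners in anti-clockwise order (2 x N array)
corners = np.array([[0, 0], [6, 0], [6, 4], [3, 4], [3, 6], [0, 6]]).T
room = pra.Room.from_corners(corners, materials=pra.Material(0.1))

# add vertical walls and a ceiling 3 m above the floor
room.extrude(3.0, materials={"floor": pra.Material(0.1), "ceiling": pra.Material(0.1)})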

classmethod from_corners(corners, absorption=None, fs=8000, t0=0.0, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, **kwargs)

Creates a 2D room by giving an array of corners.

Parameters
  • corners (np.array, dim 2xN, N>2) – Array of corner coordinates; the polygon must be oriented anti-clockwise.

  • absorption (float or array of float) – Absorption factor for each wall, or a single value for all walls (deprecated, use materials instead).

  • fs (int, optional) – The sampling frequency in Hz. Default is 8000.

  • t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.

  • max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.

  • sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.

  • sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.

  • mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.

  • kwargs (key, value mappings) – Other keyword arguments accepted by the Room class

Return type

Instance of a 2D room

get_bbox()

Returns a bounding box for the room

get_volume()

Computes the volume of the room

Returns

the volume of the room

Return type

float

get_wall_by_name(name)

Returns the Wall instance with the given name.

Parameters

name (string) – name of the wall

Returns

instance of the wall with this name

Return type

Wall

image_source_model()
is_inside(p, include_borders=True)

Checks if the given point is inside the room.

Parameters
  • p (array_like, length 2 or 3) – point to be tested

  • include_borders (bool, optional) – set true if a point on the wall must be considered inside the room

Return type

True if the given point is inside the room, False otherwise.

property is_multi_band
measure_rt60(decay_db=60, plot=False)

Measures the reverberation time (RT60) of the simulated RIR.

Parameters
  • decay_db (float) – This is the actual decay of the RIR used for the computation. The default is 60, meaning that the RT60 is exactly what we measure. In some cases, the signal may be too short to measure 60 dB decay. In this case, we can specify a lower value. For example, with 30 dB, the RT60 is twice the time measured.

  • plot (bool) – Displays a graph of the Schroeder curve and the estimated RT60.

Returns

An array that contains the measured RT60 for all the RIRs.

Return type

ndarray (n_mics, n_sources)
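
For example (a minimal sketch; the geometry and absorption are arbitrary), the measured value can be compared with the theoretical prediction:

import pyroomacoustics as pra

room = pra.ShoeBox([5, 4, 3], materials=pra.Material(0.2), max_order=20)
room.add_source([2.0, 1.5, 1.8])
room.add_microphone([3.5, 2.0, 1.2])
room.compute_rir()

rt60 = room.measure_rt60()        # ndarray of shape (n_mics, n_sources)
rt60_sabine = room.rt60_theory()  # Sabine prediction, for comparison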

property n_mics
property n_sources
plot(img_order=None, freq=None, figsize=None, no_axis=False, mic_marker_size=10, plot_directivity=True, ax=None, **kwargs)

Plots the room with its walls, microphones, sources and images

plot_rir(select=None, FD=False, kind=None)

Plot room impulse responses. Compute if not done already.

Parameters
  • select (list of tuples OR int) – List of RIR pairs (mic, src) to plot, e.g. [(0,0), (0,1)]. Or int to plot RIR from particular microphone to all sources. Note that microphones and sources are zero-indexed. Default is to plot all microphone-source pairs.

  • FD (bool, optional) – If True, the transfer function is plotted instead of the impulse response. Default is False.

  • kind (str, optional) – The value can be “ir”, “tf”, or “spec” which will plot impulse response, transfer function, and spectrogram, respectively. If this option is specified, then the value of FD is ignored. Default is “ir”.

Returns

  • fig (matplotlib figure) – Figure object for further modifications

  • axes (matplotlib list of axes objects) – Axes for further modifications

ray_tracing()
rt60_theory(formula='sabine')

Compute the theoretical reverberation time (RT60) for the room.

Parameters

formula (str) – The formula to use for the calculation, ‘sabine’ (default) or ‘eyring’

set_air_absorption(coefficients=None)

Activates or deactivates air absorption in the simulation.

Parameters

coefficients (list of float) – List of air absorption coefficients, one per octave band

set_ray_tracing(n_rays=None, receiver_radius=0.5, energy_thres=1e-07, time_thres=10.0, hist_bin_size=0.004)

Activates the ray tracer.

Parameters
  • n_rays (int, optional) – The number of rays to shoot in the simulation

  • receiver_radius (float, optional) – The radius of the sphere around the microphone in which to integrate the energy (default: 0.5 m)

  • energy_thres (float, optional) – The energy threshold at which rays are stopped (default: 1e-7)

  • time_thres (float, optional) – The maximum time of flight of rays (default: 10 s)

  • hist_bin_size (float) – The time granularity of bins in the energy histogram (default: 4 ms)
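
For instance (a sketch with arbitrary parameter values; ray tracing requires scattering coefficients on the walls):

import pyroomacoustics as pra

# Material(energy_absorption, scattering): scattering is needed for ray tracing
room = pra.ShoeBox(
    [9, 7, 4], materials=pra.Material(0.1, 0.2), max_order=3, ray_tracing=True
)
room.set_ray_tracing(n_rays=20000, receiver_radius=0.3)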

set_sound_speed(c)

Sets the speed of sound unconditionally.

simulate(snr=None, reference_mic=0, callback_mix=None, callback_mix_kwargs={}, return_premix=False, recompute_rir=False)

Simulates the microphone signal at every microphone in the array

Parameters
  • reference_mic (int, optional) – The index of the reference microphone to use for SNR computations. The default reference microphone is the first one (index 0)

  • snr (float, optional) –

    The target signal-to-noise ratio (SNR) in decibels at the reference microphone. When this option is used, the argument pyroomacoustics.room.Room.sigma2_awgn is ignored. The variance of every source at the reference microphone is normalized to one and the variance of the noise \(\sigma_n^2\) is chosen such that

    \[\mathsf{SNR} = 10 \log_{10} \frac{K}{\sigma_n^2},\]

    where \(K\) is the number of sources. The value of pyroomacoustics.room.Room.sigma2_awgn is also set to \(\sigma_n^2\) automatically.

  • callback_mix (func, optional) – A function that will perform the mix, it takes as first argument an array of shape (n_sources, n_mics, n_samples) that contains the source signals convolved with the room impulse response prior to mixture at the microphone. It should return an array of shape (n_mics, n_samples) containing the mixed microphone signals. If such a function is provided, the snr option is ignored and pyroomacoustics.room.Room.sigma2_awgn is set to None.

  • callback_mix_kwargs (dict, optional) – A dictionary that contains optional arguments for callback_mix function

  • return_premix (bool, optional) – If set to True, the function will return an array of shape (n_sources, n_mics, n_samples) containing the microphone signals with individual sources, convolved with the room impulse response but prior to mixing

  • recompute_rir (bool, optional) – If set to True, the room impulse responses will be recomputed prior to simulation

Returns

Depends on the value of return_premix option

Return type

Nothing or an array of shape (n_sources, n_mics, n_samples)
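
In the simplest case (a minimal sketch; the signal and geometry are arbitrary), the mixed microphone signals end up in room.mic_array.signals:

import numpy as np
import pyroomacoustics as pra

room = pra.ShoeBox([5, 4, 3], fs=16000, materials=pra.Material(0.2))
room.add_source([2.0, 1.5, 1.8], signal=np.random.randn(16000))
room.add_microphone([3.5, 2.0, 1.2])

room.simulate(snr=10)            # computes the RIRs if needed, then mixes
mics = room.mic_array.signals    # ndarray of shape (n_mics, n_samples)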

unset_air_absorption()

Deactivates air absorption in the simulation

unset_ray_tracing()

Deactivates the ray tracer

property volume
wall_area(wall)

Computes the area of a 3D planar wall.

Parameters

wall (Wall instance) – the wall object that is defined in 3D space

class pyroomacoustics.room.ShoeBox(p, fs=8000, t0=0.0, absorption=None, max_order=1, sigma2_awgn=None, sources=None, mics=None, materials=None, temperature=None, humidity=None, air_absorption=False, ray_tracing=False, use_rand_ism=False, max_rand_disp=0.08)

Bases: Room

This class provides an API for creating a ShoeBox room in 2D or 3D.

Parameters
  • p (array_like) – Length 2 (width, length) or 3 (width, length, height) depending on the desired dimension of the room.

  • fs (int, optional) – The sampling frequency in Hz. Default is 8000.

  • t0 (float, optional) – The global starting time of the simulation in seconds. Default is 0.

  • absorption (float) – Average amplitude absorption of walls. Note that this parameter is deprecated; use materials instead!

  • max_order (int, optional) – The maximum reflection order in the image source model. Default is 1, namely direct sound and first order reflections.

  • sigma2_awgn (float, optional) – The variance of the additive white Gaussian noise added during simulation. By default, none is added.

  • sources (list of SoundSource objects, optional) – Sources to place in the room. Sources can be added after room creation with the add_source method by providing coordinates.

  • mics (MicrophoneArray object, optional) – The microphone array to place in the room. A single microphone or microphone array can be added after room creation with the add_microphone_array method.

  • materials (Material object or dict of Material objects) – See pyroomacoustics.parameters.Material. If providing a dict, you must provide a Material object for each wall: ‘east’, ‘west’, ‘north’, ‘south’, ‘ceiling’ (3D), ‘floor’ (3D).

  • temperature (float, optional) – The air temperature in the room in degrees Celsius. By default, set so that the speed of sound is 343 m/s.

  • humidity (float, optional) – The relative humidity of the air in the room (between 0 and 100). By default set to 0.

  • air_absorption (bool, optional) – If set to True, absorption of sound energy by the air will be simulated.

  • ray_tracing (bool, optional) – If set to True, the ray tracing simulator will be used along with image source model.

  • use_rand_ism (bool, optional) – If set to True, image source positions will have a small random displacement to prevent sweeping echoes.

  • max_rand_disp (float, optional) – The maximum displacement of the image sources when the randomized image source method is used.

extrude(height)

Overloads the extrude method of the parent Room class.

get_volume()

Computes the volume of a room

Return type

the volume in cubic units

is_inside(pos)
Parameters

pos (array_like) – The position to test in an array of size 2 for a 2D room and 3 for a 3D room

Return type

True if pos is a point in the room, False otherwise.

pyroomacoustics.room.find_non_convex_walls(walls)

Finds the walls that are not in the convex hull

Parameters

walls (list of Wall objects) – The walls that compose the room

Returns

The indices of the walls not in the convex hull

Return type

list of int

pyroomacoustics.room.sequence_generation(volume, duration, c, fs, max_rate=10000)
pyroomacoustics.room.wall_factory(corners, absorption, scattering, name='')

Calls the correct constructor according to the dimension of the wall (2D or 3D).

Materials Database

Absorption Coefficients
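
Each entry in the tables below is identified by a keyword that can be passed to pyroomacoustics.parameters.Material, or to the make_materials helper, to build wall materials. A minimal sketch (the room geometry here is arbitrary):

import pyroomacoustics as pra

# look up a material by its keyword in the database
m = pra.Material("hard_surface")

# or assign a different material to each wall of a shoe box room
materials = pra.make_materials(
    ceiling="ceiling_plasterboard",
    floor="carpet_cotton",
    east="brickwork",
    west="brickwork",
    north="brickwork",
    south="brickwork",
)
room = pra.ShoeBox([5, 4, 3], materials=materials)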

Massive constructions and hard surfaces

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| hard_surface | Walls, hard surfaces average (brick walls, plaster, hard floors, etc.) | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.05 | 0.05 |
| brickwork | Walls, rendered brickwork | 0.01 | 0.02 | 0.02 | 0.03 | 0.03 | 0.04 | 0.04 |
| rough_concrete | Rough concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.04 | 0.07 | 0.07 |
| unpainted_concrete | Smooth unpainted concrete | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.05 | 0.05 |
| rough_lime_wash | Rough lime wash | 0.02 | 0.03 | 0.04 | 0.05 | 0.04 | 0.03 | 0.02 |
| smooth_brickwork_flush_pointing | Smooth brickwork with flush pointing, painted | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 |
| smooth_brickwork_10mm_pointing | Smooth brickwork, 10 mm deep pointing, pit sand mortar | 0.08 | 0.09 | 0.12 | 0.16 | 0.22 | 0.24 | 0.24 |
| brick_wall_rough | Brick wall, stuccoed with a rough finish | 0.03 | 0.03 | 0.03 | 0.04 | 0.05 | 0.07 | 0.07 |
| ceramic_tiles | Ceramic tiles with a smooth surface | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
| limestone_wall | Limestone walls | 0.02 | 0.02 | 0.03 | 0.04 | 0.05 | 0.05 | 0.05 |
| reverb_chamber | Reverberation chamber walls | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.04 | 0.04 |
| concrete_floor | Concrete floor | 0.01 | 0.03 | 0.05 | 0.02 | 0.02 | 0.02 | 0.02 |
| marble_floor | Marble floor | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |

Lightweight constructions and linings

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| plasterboard | 2 × 13 mm plasterboard on steel frame, 50 mm mineral wool in cavity, surface painted | 0.15 | 0.1 | 0.06 | 0.04 | 0.04 | 0.05 | 0.05 |
| wooden_lining | Wooden lining, 12 mm fixed on frame | 0.27 | 0.23 | 0.22 | 0.15 | 0.1 | 0.07 | 0.06 |

Glazing

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| glass_3mm | Single pane of glass, 3 mm | 0.08 | 0.04 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
| glass_window | Glass window, 0.68 kg/m2 | 0.1 | 0.05 | 0.04 | 0.03 | 0.03 | 0.03 | 0.03 |
| lead_glazing | Lead glazing | 0.3 | 0.2 | 0.14 | 0.1 | 0.05 | 0.05 | – |
| double_glazing_30mm | Double glazing, 2–3 mm glass, > 30 mm gap | 0.15 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.02 |
| double_glazing_10mm | Double glazing, 2–3 mm glass, 10 mm gap | 0.1 | 0.07 | 0.05 | 0.03 | 0.02 | 0.02 | 0.02 |
| double_glazing_inside | Double glazing, lead on the inside | 0.15 | 0.3 | 0.18 | 0.1 | 0.05 | 0.05 | – |

Wood

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| wood_1.6cm | Wood, 1.6 cm thick, on 4 cm wooden planks | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
| plywood_thin | Thin plywood panelling | 0.42 | 0.21 | 0.1 | 0.08 | 0.06 | 0.06 | – |
| wood_16mm | 16 mm wood on 40 mm studs | 0.18 | 0.12 | 0.1 | 0.09 | 0.08 | 0.07 | 0.07 |
| audience_floor | Audience floor, 2 layers, 33 mm on sleepers over concrete | 0.09 | 0.06 | 0.05 | 0.05 | 0.05 | 0.04 | – |
| stage_floor | Wood, stage floor, 2 layers, 27 mm over airspace | 0.1 | 0.07 | 0.06 | 0.06 | 0.06 | 0.06 | – |
| wooden_door | Solid wooden door | 0.14 | 0.1 | 0.06 | 0.08 | 0.1 | 0.1 | 0.10 |

Floor coverings

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| linoleum_on_concrete | Linoleum, asphalt, rubber, or cork tile on concrete | 0.02 | 0.03 | 0.03 | 0.03 | 0.03 | 0.02 | – |
| carpet_cotton | Cotton carpet | 0.07 | 0.31 | 0.49 | 0.81 | 0.66 | 0.54 | 0.48 |
| carpet_tufted_9.5mm | Loop pile tufted carpet, 1.4 kg/m2, 9.5 mm pile height, on hair pad, 3.0 kg/m2 | 0.1 | 0.4 | 0.62 | 0.7 | 0.63 | 0.88 | – |
| carpet_thin | Thin carpet, cemented to concrete | 0.02 | 0.04 | 0.08 | 0.2 | 0.35 | 0.4 | – |
| carpet_6mm_closed_cell_foam | 6 mm pile carpet bonded to closed-cell foam underlay | 0.03 | 0.09 | 0.25 | 0.31 | 0.33 | 0.44 | 0.44 |
| carpet_6mm_open_cell_foam | 6 mm pile carpet bonded to open-cell foam underlay | 0.03 | 0.09 | 0.2 | 0.54 | 0.7 | 0.72 | 0.72 |
| carpet_tufted_9m | 9 mm tufted pile carpet on felt underlay | 0.08 | 0.08 | 0.3 | 0.6 | 0.75 | 0.8 | 0.80 |
| felt_5mm | Needle felt 5 mm stuck to concrete | 0.02 | 0.02 | 0.05 | 0.15 | 0.3 | 0.4 | 0.40 |
| carpet_soft_10mm | 10 mm soft carpet on concrete | 0.09 | 0.08 | 0.21 | 0.26 | 0.27 | 0.37 | – |
| carpet_hairy | Hairy carpet on 3 mm felt | 0.11 | 0.14 | 0.37 | 0.43 | 0.27 | 0.25 | 0.25 |
| carpet_rubber_5mm | 5 mm rubber carpet on concrete | 0.04 | 0.04 | 0.08 | 0.12 | 0.1 | 0.1 | – |
| carpet_1.35_kg_m2 | Carpet 1.35 kg/m2, on hair felt or foam rubber | 0.08 | 0.24 | 0.57 | 0.69 | 0.71 | 0.73 | – |
| cocos_fibre_roll_29mm | Cocos fibre roll felt, 29 mm thick (unstressed), reverse side clad with paper, 2.2 kg/m2, 2 Rayl | 0.1 | 0.13 | 0.22 | 0.35 | 0.47 | 0.57 | – |

Curtains

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| curtains_cotton_0.5 | Cotton curtains (0.5 kg/m2) draped to 3/4 area approx. 130 mm from wall | 0.3 | 0.45 | 0.65 | 0.56 | 0.59 | 0.71 | 0.71 |
| curtains_0.2 | Curtains (0.2 kg/m2) hung 90 mm from wall | 0.05 | 0.06 | 0.39 | 0.63 | 0.7 | 0.73 | 0.73 |
| curtains_cotton_0.33 | Cotton cloth (0.33 kg/m2) folded to 7/8 area | 0.03 | 0.12 | 0.15 | 0.27 | 0.37 | 0.42 | – |
| curtains_densely_woven | Densely woven window curtains 90 mm from wall | 0.06 | 0.1 | 0.38 | 0.63 | 0.7 | 0.73 | – |
| blinds_half_open | Vertical blinds, 15 cm from wall, half opened (45°) | 0.03 | 0.09 | 0.24 | 0.46 | 0.79 | 0.76 | – |
| blinds_open | Vertical blinds, 15 cm from wall, open (90°) | 0.03 | 0.06 | 0.13 | 0.28 | 0.49 | 0.56 | – |
| curtains_velvet | Tight velvet curtains | 0.05 | 0.12 | 0.35 | 0.45 | 0.38 | 0.36 | 0.36 |
| curtains_fabric | Curtain fabric, 15 cm from wall | 0.1 | 0.38 | 0.63 | 0.52 | 0.55 | 0.65 | – |
| curtains_fabric_folded | Curtain fabric, folded, 15 cm from wall | 0.12 | 0.6 | 0.98 | 1 | 1 | 1 | 1.00 |
| curtains_glass_mat | Curtains of close-woven glass mat hung 50 mm from wall | 0.03 | 0.03 | 0.15 | 0.4 | 0.5 | 0.5 | 0.50 |
| studio_curtains | Studio curtains, 22 cm from wall | 0.36 | 0.26 | 0.51 | 0.45 | 0.62 | 0.76 | – |

Seating (2 seats per m2)

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| chairs_wooden | Wooden chairs without cushion | 0.05 | 0.08 | 0.1 | 0.12 | 0.12 | 0.12 | – |
| chairs_unoccupied_plastic | Unoccupied plastic chairs | 0.06 | 0.1 | 0.1 | 0.2 | 0.3 | 0.2 | 0.20 |
| chairs_medium_upholstered | Medium upholstered concert chairs, empty | 0.49 | 0.66 | 0.8 | 0.88 | 0.82 | 0.7 | – |
| chairs_heavy_upholstered | Heavily upholstered seats, unoccupied | 0.7 | 0.76 | 0.81 | 0.84 | 0.84 | 0.81 | – |
| chairs_upholstered_cloth | Empty chairs, upholstered with cloth cover | 0.44 | 0.6 | 0.77 | 0.89 | 0.82 | 0.7 | 0.70 |
| chairs_upholstered_leather | Empty chairs, upholstered with leather cover | 0.4 | 0.5 | 0.58 | 0.61 | 0.58 | 0.5 | 0.50 |
| chairs_upholstered_moderate | Unoccupied, moderately upholstered chairs (0.90 m × 0.55 m) | 0.44 | 0.56 | 0.67 | 0.74 | 0.83 | 0.87 | – |

Audience (2 persons per m2 unless specified otherwise)

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| audience_orchestra_choir | Areas with audience, orchestra or choir including narrow aisles | 0.6 | 0.74 | 0.88 | 0.96 | 0.93 | 0.85 | 0.85 |
| audience_wooden_chairs_1_m2 | Audience on wooden chairs, 1 per m2 | 0.16 | 0.24 | 0.56 | 0.69 | 0.81 | 0.78 | 0.78 |
| audience_wooden_chairs_2_m2 | Audience on wooden chairs, 2 per m2 | 0.24 | 0.4 | 0.78 | 0.98 | 0.96 | 0.87 | 0.87 |
| orchestra_1.5_m2 | Orchestra with instruments on podium, 1.5 m2 per person | 0.27 | 0.53 | 0.67 | 0.93 | 0.87 | 0.8 | 0.80 |
| audience_0.72_m2 | Audience area, 0.72 persons / m2 | 0.1 | 0.21 | 0.41 | 0.65 | 0.75 | 0.71 | – |
| audience_1_m2 | Audience area, 1 person / m2 | 0.16 | 0.29 | 0.55 | 0.8 | 0.92 | 0.9 | – |
| audience_1.5_m2 | Audience area, 1.5 persons / m2 | 0.22 | 0.38 | 0.71 | 0.95 | 0.99 | 0.99 | – |
| audience_2_m2 | Audience area, 2 persons / m2 | 0.26 | 0.46 | 0.87 | 0.99 | 0.99 | 0.99 | – |
| audience_upholstered_chairs_1 | Audience in moderately upholstered chairs, 0.85 m × 0.63 m | 0.72 | 0.82 | 0.91 | 0.93 | 0.94 | 0.87 | – |
| audience_upholstered_chairs_2 | Audience in moderately upholstered chairs, 0.90 m × 0.55 m | 0.55 | 0.86 | 0.83 | 0.87 | 0.9 | 0.87 | – |

Wall absorbers

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| panel_fabric_covered_6pcf | Fabric-covered panel, 6 pcf rockwool core | 0.46 | 0.93 | 1 | 1 | 1 | 1 | 1.00 |
| panel_fabric_covered_8pcf | Fabric-covered panel, 8 pcf rockwool core | 0.21 | 0.66 | 1 | 1 | 0.97 | 0.98 | 0.98 |
| facing_brick | Facing-brick brickwork, open butt joins, brick dimensions 230 × 50 × 55 mm | 0.04 | 0.14 | 0.49 | 0.35 | 0.31 | 0.36 | – |
| acoustical_plaster_25mm | Acoustical plaster, approx. 25 mm thick, 3.5 kg/m2/cm | 0.17 | 0.36 | 0.66 | 0.65 | 0.62 | 0.68 | – |
| rockwool_50mm_80kgm3 | Rockwool, thickness 50 mm, 80 kg/m3 | 0.22 | 0.6 | 0.92 | 0.9 | 0.88 | 0.88 | 0.88 |
| rockwool_50mm_40kgm3 | Rockwool, thickness 50 mm, 40 kg/m3 | 0.23 | 0.59 | 0.86 | 0.86 | 0.86 | 0.86 | 0.86 |
| mineral_wool_50mm_40kgm3 | 50 mm mineral wool (40 kg/m3), glued to wall, untreated surface | 0.15 | 0.7 | 0.6 | 0.6 | 0.85 | 0.9 | 0.90 |
| mineral_wool_50mm_70kgm3 | 50 mm mineral wool (70 kg/m3) 300 mm in front of wall | 0.7 | 0.45 | 0.65 | 0.6 | 0.75 | 0.65 | 0.65 |
| gypsum_board | Gypsum board, perforation 19.6%, hole diameter 15 mm, backed by fibrous web 12 Rayl, 100 mm cavity filled with mineral fibre mat 1.05 kg/m2, 7.5 Rayl | 0.3 | 0.69 | 1 | 0.81 | 0.66 | 0.62 | – |
| perforated_veneered_chipboard | Perforated veneered chipboard, 50 mm, 1 mm holes, 3 mm spacing, 9% hole surface ratio, 150 mm cavity filled with 30 mm mineral wool | 0.41 | 0.67 | 0.58 | 0.59 | 0.68 | 0.35 | – |
| fibre_absorber_1 | Fibre absorber, mineral fibre, 20 mm thick, 3.4 kg/m2, 50 mm cavity | 0.2 | 0.56 | 0.82 | 0.87 | 0.7 | 0.53 | – |
| fibre_absorber_2 | Fibre absorber, mats of porous flexible fibrous web fabric, self-extinguishing | 0.07 | 0.07 | 0.2 | 0.41 | 0.75 | 0.97 | – |

Ceiling absorbers

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| ceiling_plasterboard | Plasterboard ceiling on battens with large air-space above | 0.2 | 0.15 | 0.1 | 0.08 | 0.04 | 0.02 | – |
| ceiling_fibre_abosrber | Fibre absorber on perforated sheet metal cartridge, 0.5 mm zinc-plated steel, 1.5 mm hole diameter, 200 mm cavity filled with 20 mm mineral wool (20 kg/m3), inflammable | 0.48 | 0.97 | 1 | 0.97 | 1 | 1 | 1.00 |
| ceiling_fissured_tile | Fissured ceiling tile | 0.49 | 0.53 | 0.53 | 0.75 | 0.92 | 0.99 | – |
| ceiling_perforated_gypsum_board | Perforated 27 mm gypsum board (16%), d = 4.5 mm, 300 mm from ceiling | 0.45 | 0.55 | 0.6 | 0.9 | 0.86 | 0.75 | – |
| ceiling_melamine_foam | Wedge-shaped, melamine foam, ceiling tile | 0.12 | 0.33 | 0.83 | 0.97 | 0.98 | 0.95 | – |
| ceiling_metal_panel | Metal panel ceiling, backed by 20 mm Sillan acoustic tiles, panel width 85 mm, panel spacing 15 mm, cavity 35 cm | 0.59 | 0.8 | 0.82 | 0.65 | 0.27 | 0.23 | – |

Special absorbers

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| microperforated_foil_kaefer | Microperforated foil “Microsorber” (Kaefer) | 0.06 | 0.28 | 0.7 | 0.68 | 0.74 | 0.53 | – |
| microperforated_glass_sheets | Microperforated glass sheets, 5 mm cavity | 0.1 | 0.45 | 0.85 | 0.3 | 0.1 | 0.05 | – |
| hanging_absorber_panels_1 | Hanging absorber panels (foam), 400 mm depth, 400 mm distance | 0.25 | 0.45 | 0.8 | 0.9 | 0.85 | 0.8 | – |
| hanging_absorber_panels_2 | Hanging absorber panels (foam), 400 mm depth, 700 mm distance | 0.2 | 0.3 | 0.6 | 0.75 | 0.7 | 0.7 | – |

Scattering Coefficients

Diffusers

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| rpg_skyline | Diffuser RPG Skyline | 0.01 | 0.08 | 0.45 | 0.82 | 1 | – | – |
| rpg_qrd | Diffuser RPG QRD | 0.06 | 0.15 | 0.45 | 0.95 | 0.88 | 0.91 | – |

Seating and audience

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| theatre_audience | Theatre audience | 0.3 | 0.5 | 0.6 | 0.6 | 0.7 | 0.70 | 0.70 |
| classroom_tables | Rows of classroom tables and persons on chairs | 0.2 | 0.3 | 0.4 | 0.5 | 0.5 | 0.60 | 0.60 |
| amphitheatre_steps | Amphitheatre steps, length 82 cm, height 30 cm (Farnetani 2005) | 0.05 | 0.45 | 0.75 | 0.9 | 0.9 | – | – |

Round Robin III – wall and ceiling

| keyword | description | 125 Hz | 250 Hz | 500 Hz | 1 kHz | 2 kHz | 4 kHz | 8 kHz |
|---|---|---|---|---|---|---|---|---|
| rect_prism_boxes | Rectangular and prism boxes (studio wall), “Round Robin III” (after Bork 2005a) | 0.5 | 0.9 | 0.95 | 0.95 | 0.95 | 0.95 | – |
| trapezoidal_boxes | Trapezoidal boxes (studio ceiling), “Round Robin III” (after Bork 2005a) | 0.13 | 0.56 | 0.95 | 0.95 | 0.95 | 0.95 | – |

Transforms

Module contents

Block based processing

This subpackage contains routines for continuous realtime-like block processing of signals with the STFT. The routines are written to take advantage of fast FFT libraries like pyfftw or mkl when available.

DFT
A class for performing DFT of real signals
STFT
A class for continuous STFT processing and frequency domain filtering

Algorithms

DFT

Class for performing the Discrete Fourier Transform (DFT) and inverse DFT for real signals, including multichannel. It is also possible to specify an analysis or synthesis window.

When available, it is possible to use the pyfftw or mkl_fft packages. Otherwise the default is to use numpy.fft.rfft/numpy.fft.irfft.

More on pyfftw can be read here: https://pyfftw.readthedocs.io/en/latest/index.html

More on mkl_fft can be read here: https://github.com/IntelPython/mkl_fft

class pyroomacoustics.transform.dft.DFT(nfft, D=1, analysis_window=None, synthesis_window=None, transform='numpy', axis=0, precision='double', bits=None)

Bases: object

Class for performing the Discrete Fourier Transform (DFT) of real signals.

X

Real DFT computed by analysis.

Type

numpy array

x

IDFT computed by synthesis.

Type

numpy array

nfft

FFT size.

Type

int

D

Number of channels.

Type

int

transform

Which FFT package will be used.

Type

str

analysis_window

Window to be applied before DFT.

Type

numpy array

synthesis_window

Window to be applied after inverse DFT.

Type

numpy array

axis

Axis over which to compute the FFT.

Type

int

precision, bits

How many precision bits to use for the input. Twice the amount will be used for complex spectrum.

Type

string, np.float32, np.float64, np.complex64, np.complex128

Parameters
  • nfft (int) – FFT size.

  • D (int, optional) – Number of channels. Default is 1.

  • analysis_window (numpy array, optional) – Window to be applied before DFT. Default is no window.

  • synthesis_window (numpy array, optional) – Window to be applied after inverse DFT. Default is no window.

  • transform (str, optional) – which FFT package to use: numpy, pyfftw, or mkl. Default is numpy.

  • axis (int, optional) – Axis over which to compute the FFT. Default is first axis.

  • precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).

analysis(x)

Perform frequency analysis of a real input using DFT.

Parameters

x (numpy array) – Real signal in time domain.

Returns

DFT of input.

Return type

numpy array

synthesis(X=None)

Perform time synthesis of frequency domain to real signal using the inverse DFT.

Parameters

X (numpy array, optional) – Complex signal in frequency domain. Default is to use DFT computed from analysis.

Returns

IDFT of self.X or input if given.

Return type

numpy array
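
As a quick illustration, a frame can be transformed and reconstructed as follows (a minimal sketch with a random input):

import numpy as np
import pyroomacoustics as pra

dft = pra.transform.DFT(nfft=512)
x = np.random.randn(512)        # one frame of a real signal

X = dft.analysis(x)             # forward transform, nfft // 2 + 1 complex bins
x_r = dft.synthesis(X)          # inverse transform

print(np.allclose(x, x_r))      # True up to numerical precision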

STFT

Class for real-time STFT analysis and processing.

class pyroomacoustics.transform.stft.STFT(N, hop=None, analysis_window=None, synthesis_window=None, channels=1, transform='numpy', streaming=True, precision='double', **kwargs)

Bases: object

A class for STFT processing.

Parameters
  • N (int) – number of samples per frame

  • hop (int) – hop size

  • analysis_window (numpy array) – window applied to block before analysis

  • synthesis_window (numpy array) – window applied to the block before synthesis

  • channels (int) – number of signals

  • transform (str, optional) – which FFT package to use: ‘numpy’ (default), ‘pyfftw’, or ‘mkl’

  • streaming (bool, optional) – whether (True, default) or not (False) to “stitch” samples between repeated calls of ‘analysis’ and ‘synthesis’ if we are receiving a continuous stream of samples.

  • num_frames (int, optional) –

    Number of frames to be processed. If set, this will be strictly enforced as the STFT block will allocate memory accordingly. If not set, there will be no check on the number of frames sent to analysis/process/synthesis

    NOTE:

    1. num_frames == 0 corresponds to a “real-time” case in which each input block corresponds to hop samples.

    2. num_frames > 0 requires (num_frames - 1) * hop + N samples, as the last frame must contain N samples.

  • precision (string, np.float32, np.float64, np.complex64, np.complex128, optional) – How many precision bits to use for the input. If ‘single’/np.float32/np.complex64, 32 bits for real inputs or 64 for complex spectrum. Otherwise, cast to 64 bits for real inputs or 128 for complex spectrum (default).

analysis(x)
Parameters

x (2D numpy array, [samples, channels]) – Time-domain signal.

process(X=None)
Parameters

X (numpy array) – X can take on multiple shapes: 1) (N,) for a single channel and one frame; 2) (N, D) for multiple channels and one frame; 3) (F, N) for a single channel and multiple frames; 4) (F, N, D) for multiple channels and multiple frames.

Returns

x_r – Reconstructed time-domain signal.

Return type

numpy array

reset()

Reset state variables. Necessary after changing or setting the filter or zero padding.

set_filter(coeff, zb=None, zf=None, freq=False)

Set time-domain FIR filter with appropriate zero-padding. Frequency spectrum of the filter is computed and set for the object. There is also a check for sufficient zero-padding.

Parameters
  • coeff (numpy array) – Filter in time domain.

  • zb (int) – Amount of zero-padding added to back/end of frame.

  • zf (int) – Amount of zero-padding added to front/beginning of frame.

  • freq (bool) – Whether or not given coefficients (coeff) are in the frequency domain.

synthesis(X=None)
Parameters

X (numpy array of frequency content) – X can take on multiple shapes: 1) (N,) for a single channel and one frame; 2) (N, D) for multiple channels and one frame; 3) (F, N) for a single channel and multiple frames; 4) (F, N, D) for multiple channels and multiple frames, where F is the number of frames, N the number of frequency bins, and D the number of channels.

Returns

x_r – Reconstructed time-domain signal.

Return type

numpy array

zero_pad_back(zb)

Set zero-padding at end of frame.

zero_pad_front(zf)

Set zero-padding at beginning of frame.
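
Putting the pieces together, here is a minimal streaming sketch (the input signal and block size are arbitrary); in streaming mode, each call to analysis consumes hop new samples and each call to synthesis returns hop output samples:

import numpy as np
import pyroomacoustics as pra

block_size = 512
hop = block_size // 2

stft = pra.transform.STFT(
    block_size, hop=hop, analysis_window=pra.hann(block_size), streaming=True
)

x = np.random.randn(100 * hop)   # stand-in for a continuous stream
out = np.zeros_like(x)

for n in range(len(x) // hop):
    stft.analysis(x[n * hop : (n + 1) * hop])
    # (the frequency-domain frames in stft.X could be modified here)
    out[n * hop : (n + 1) * hop] = stft.synthesis()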

pyroomacoustics.transform.stft.analysis(x, L, hop, win=None, zp_back=0, zp_front=0)

Convenience function for one-shot STFT

Parameters
  • x (array_like, (n_samples) or (n_samples, n_channels)) – input signal

  • L (int) – frame size

  • hop (int) – shift size between frames

  • win (array_like) – the window to apply (default None)

  • zp_back (int) – zero padding to apply at the end of the frame

  • zp_front (int) – zero padding to apply at the beginning of the frame

Returns

X – The STFT of x

Return type

ndarray, (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)

pyroomacoustics.transform.stft.compute_synthesis_window(analysis_window, hop)

Computes the optimal synthesis window given an analysis window and hop (frame shift). The procedure is described in

D. Griffin and J. Lim, Signal estimation from modified short-time Fourier transform, IEEE Trans. Acoustics, Speech, and Signal Process., vol. 32, no. 2, pp. 236-243, 1984.

Parameters
  • analysis_window (array_like) – The analysis window

  • hop (int) – The frame shift

pyroomacoustics.transform.stft.synthesis(X, L, hop, win=None, zp_back=0, zp_front=0)

Convenience function for one-shot inverse STFT

Parameters
  • X (array_like (n_frames, n_frequencies) or (n_frames, n_frequencies, n_channels)) – The data

  • L (int) – frame size

  • hop (int) – shift size between frames

  • win (array_like) – the window to apply (default None)

  • zp_back (int) – zero padding to apply at the end of the frame

  • zp_front (int) – zero padding to apply at the beginning of the frame

Returns

x – The inverse STFT of X

Return type

ndarray, (n_samples) or (n_samples, n_channels)
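
For one-shot (non-streaming) processing, the two convenience functions can be paired with compute_synthesis_window; a minimal sketch with an arbitrary test signal:

import numpy as np
import pyroomacoustics as pra
from pyroomacoustics.transform import stft

x = np.random.randn(1600)                  # arbitrary test signal
L, hop = 256, 128
analysis_win = pra.hann(L)
synthesis_win = stft.compute_synthesis_window(analysis_win, hop)

X = stft.analysis(x, L, hop, win=analysis_win)      # (n_frames, n_frequencies)
x_r = stft.synthesis(X, L, hop, win=synthesis_win)  # reconstructed signal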

Dataset Wrappers

Module contents

The Datasets sub-package is responsible for delivering wrappers around a few popular audio datasets to make them easier to use.

Two base classes, pyroomacoustics.datasets.base.Dataset and pyroomacoustics.datasets.base.Sample, wrap together the audio samples and their metadata. The general idea is to create a sample object with an attribute containing all the metadata. Dataset objects holding a collection of samples can then be created and filtered according to the values of the metadata.

Many of the functions with match or filter in their name take an arbitrary number of keyword arguments. The keys should match some metadata of the samples. A match between a key/value pair and an attribute sharing the same key then occurs in one of three ways:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True

Example 1

from pyroomacoustics.datasets import Dataset, Sample

# Prepare a few artificial samples
samples = [
    {
        'data' : 0.99,
        'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'one' },
    },
    {
        'data' : 2.1,
        'metadata' : { 'speaker' : 'alice', 'sex' : 'female', 'age' : 37, 'number' : 'two' },
    },
    {
        'data' : 1.02,
        'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'one' },
    },
    {
        'data' : 2.07,
        'metadata' : { 'speaker' : 'bob', 'sex' : 'male', 'age' : 48, 'number' : 'two' },
    },
    ]

corpus = Dataset()
for s in samples:
    new_sample = Sample(s['data'], **s['metadata'])
    corpus.add_sample(new_sample)

# Then, it is possible to display summary info about the corpus
print(corpus)

# The number of samples in the corpus is given by ``len``
print('Number of samples:', len(corpus))

# And we can access samples with the slice operator
print('Sample #2:')
print(corpus[2])    # (shortcut for `corpus.samples[2]`)

# We can obtain a new corpus with only male subjects
corpus_male_only = corpus.filter(sex='male')
print(corpus_male_only)

# Only retain speakers above 40 years old
corpus_older = corpus.filter(age=lambda a : a > 40)
print(corpus_older)

Example 2 (CMU ARCTIC)

# This example involves the CMU ARCTIC corpus available at
# http://www.festvox.org/cmu_arctic/

import matplotlib.pyplot as plt
import pyroomacoustics as pra

# Here, the corpus for speaker bdl is automatically downloaded
# if it is not available already
corpus = pra.datasets.CMUArcticCorpus(download=True, speaker=['bdl'])

# print dataset info and 10 sentences
print(corpus)
corpus.head(n=10)

# let's extract all samples containing the word 'what'
keyword = 'what'
matches = corpus.filter(text=lambda t : keyword in t)
print('The number of sentences containing "{}": {}'.format(keyword, len(matches)))
for s in matches.sentences:
    print('  *', s)

# if the sounddevice package is available, we can play the sample
matches[0].play()

# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()

Example 3 (Google’s Speech Commands Dataset)

# This example involves Google's Speech Commands Dataset available at
# https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

import matplotlib.pyplot as plt
import pyroomacoustics as pra

# The dataset is automatically downloaded if not available and 10 samples of each word are selected
dataset = pra.datasets.GoogleSpeechCommands(download=True, subset=10, seed=0)

# print dataset info, first 10 entries, and all sounds
print(dataset)
dataset.head(n=10)
print("All sounds in the dataset:")
print(dataset.classes)

# filter by specific word
selected_word = 'yes'
matches = dataset.filter(word=selected_word)
print("Number of '%s' samples : %d" % (selected_word, len(matches)))

# if the sounddevice package is available, we can play the sample
matches[0].play()

# show the spectrogram
plt.figure()
matches[0].plot()
plt.show()

Datasets Available

CMU ARCTIC Corpus

The CMU ARCTIC Dataset

The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English single speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the database, the recording environment, etc. is available as Carnegie Mellon University Language Technologies Institute Tech Report CMU-LTI-03-177, and is also available here.

The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

The 1132 sentence prompt list is available from cmuarctic.data

The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox-based labelling scripts. Complete runnable Festival Voices are included with the database distributions as examples, though better voices can be made by improving the labelling, etc.

License: Permissive, attribution required

Price: Free

URL: http://www.festvox.org/cmu_arctic/

class pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus(basedir=None, download=False, build=True, **kwargs)

Bases: Dataset

This class will load the CMU ARCTIC corpus in a structure amenable to processing.

basedir

The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.

Type

str, optional

info

A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.

Type

dict

sentences

The list of all utterances in the corpus

Type

list of CMUArcticSentence

Parameters
  • basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.

  • download (bool, optional) – If the corpus does not exist, download it.

  • speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.

  • sex (str or list of str, optional) – Can be ‘female’ or ‘male’

  • lang (str or list of str, optional) – The language, only ‘English’ is available here

  • accent (str or list of str, optional) – The accent of the speaker

build_corpus(**kwargs)

Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)

filter(**kwargs)

Filters the corpus and selects samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True

class pyroomacoustics.datasets.cmu_arctic.CMUArcticSentence(path, **kwargs)

Bases: AudioSample

Create the sentence object

Parameters
  • path (str) – the path to the audio file

  • **kwargs – metadata as a list of keyword arguments

data

The actual audio signal

Type

array_like

fs

sampling frequency

Type

int

plot(**kwargs)

Plot the spectrogram

Google Speech Commands

Google’s Speech Commands Dataset

The Speech Commands Dataset has 65,000 one-second long utterances of 30 short words, by thousands of different people, contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.

More info about the dataset can be found at the link below:

https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

AIY website for contributing recordings:

https://aiyprojects.withgoogle.com/open_speech_recording

Tutorial on creating a word classifier:

https://www.tensorflow.org/versions/master/tutorials/audio_recognition

class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)

Bases: AudioSample

Create the sound object.

Parameters
  • path (str) – the path to the audio file

  • **kwargs – metadata as a list of keyword arguments

data

the actual audio signal

Type

array_like

fs

sampling frequency

Type

int

plot(**kwargs)

Plot the spectrogram

class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)

Bases: Dataset

This class will load the Google Speech Commands Dataset in a structure that is convenient to process.

basedir

The directory where the Speech Commands Dataset is located/downloaded.

Type

str

size_by_samples

A dictionary whose keys are the words in the dataset. The values are the number of occurrences for that particular word.

Type

dict

subdirs

The list of subdirectories in basedir, where each sound type is the name of a subdirectory.

Type

list

classes

The list of all sounds, same as the keys of size_by_samples.

Type

list

Parameters
  • basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.

  • download (bool, optional) – If the corpus does not exist, download it.

  • build (bool, optional) – Whether or not to build the dataset. By default, it is.

  • subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word. By default, the dataset will be built with all samples.

  • seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.

build_corpus(subset=None, **kwargs)

Build the corpus with some filters (speech or not speech, sound type).

filter(**kwargs)

Filters the dataset and selects samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True

TIMIT Corpus

The TIMIT Dataset

The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).

The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.

Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250 depending on your status (academic or otherwise).

Deprecation Warning: The interface of TimitCorpus will change in the near future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus

URL: https://catalog.ldc.upenn.edu/ldc93s1

class pyroomacoustics.datasets.timit.Sentence(path)

Bases: object

Create the sentence object

Parameters

path ((string)) – the path to the particular sample

speaker

Speaker initials

Type

str

id

a digit to disambiguate identical initials

Type

str

sex

Speaker gender (M or F)

Type

str

dialect

Speaker dialect region number:

  1. New England

  2. Northern

  3. North Midland

  4. South Midland

  5. Southern

  6. New York City

  7. Western

  8. Army Brat (moved around)

Type

str

fs

sampling frequency

Type

int

samples

the audio track

Type

array_like (n_samples,)

text

the text of the sentence

Type

str

words

list of Word objects forming the sentence

Type

list

phonems

List of phonemes contained in the sentence. Each element is a dictionary containing ‘bnd’, the boundaries of the phoneme, and ‘name’, the phoneme transcription.

Type

list

play()

Play the sound sample

plot(L=512, hop=128, zpb=0, phonems=False, **kwargs)
class pyroomacoustics.datasets.timit.TimitCorpus(basedir)

Bases: object

TimitCorpus class

Parameters
  • basedir ((string)) – The location of the TIMIT database

  • directories ((list of strings)) – The subdirectories containing the data ([‘TEST’,’TRAIN’])

  • sentence_corpus ((dict)) – A dictionary that contains a list of Sentence objects for each sub-directory

  • word_corpus ((dict)) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus

build_corpus(sentences=None, dialect_region=None, speakers=None, sex=None)

Build the corpus

The TIMIT database structure is encoded in the directory structure:

basedir
  TEST/TRAIN
    Regional accent index (1 to 8)
      Speakers (one directory per speaker)
        Sentences (one file per sentence)

Parameters
  • sentences ((list)) – A list of the sentences to which we want to restrict the corpus. Example: sentences=[‘SA1’, ‘SA2’]

  • dialect_region ((list of int)) – A list of dialect regions to which we restrict the corpus. Example: dialect_region=[1, 4, 5]

  • speakers ((list)) – A list of speaker acronyms to which we want to restrict the corpus. Example: speakers=[‘AKS0’]

  • sex ((string)) – Restrict to a single sex: ‘F’ for female, ‘M’ for male

get_word(d, w, index=0)

Returns the instance index of word w from group d (test or train)

class pyroomacoustics.datasets.timit.Word(word, boundaries, data, fs, phonems=None)

Bases: object

A class used for words of the TIMIT corpus

word

The spelling of the word

Type

str

boundaries

The limits of the word within the sentence

Type

list

samples

A view on the sentence samples containing the word

Type

array_like

fs

The sampling frequency

Type

int

phonems

A list of phones contained in the word

Type

list

features

A feature array (e.g. MFCC coefficients)

Type

array_like

Parameters
  • word (str) – The spelling of the word

  • boundaries (list) – The limits of the word within the sentence

  • data (array_like) – The nd-array that contains all the samples of the sentence

  • fs (int) – The sampling frequency

  • phonems (list, optional) – A list of phones contained in the word

mfcc(frame_length=1024, hop=512)

compute the mel-frequency cepstrum coefficients of the word samples

play()

Play the sound sample

plot()

Tools and Helpers

Base Class

Base class for some data corpus and the samples it contains.

class pyroomacoustics.datasets.base.AudioSample(data, fs, **kwargs)

Bases: Sample

We add some methods specific to displaying and listening to audio samples. The sampling frequency of the samples is an extra parameter.

For multichannel audio, we assume the same format used by scipy.io.wavfile (https://docs.scipy.org/doc/scipy-0.14.0/reference/io.html#module-scipy.io.wavfile), that is, data is a 2D array with each column being a channel.

data

The actual data

Type

array_like

fs

The sampling frequency of the input signal

Type

int

meta

An object containing the sample metadata. They can be accessed using the dot operator

Type

pyroomacoustics.datasets.Meta

play(**kwargs)

Play the sound sample. This function uses the sounddevice package for playback.

It takes the same keyword arguments as sounddevice.play.

plot(NFFT=512, noverlap=384, **kwargs)

Plot the spectrogram of the audio sample.

It takes the same keyword arguments as matplotlib.pyplot.specgram.

class pyroomacoustics.datasets.base.Dataset

Bases: object

The base class for a data corpus. It is essentially a list of samples together with a filter function.

samples

A list of all the Samples in the dataset

Type

list

info

This dictionary keeps track of all the fields in the metadata. The keys of the dictionary are the metadata field names. The values are again dictionaries, but with the keys being the possible values taken by the metadata and the associated value, the number of samples with this value in the corpus.

Type

dict

add_sample(sample)

Add a sample to the Dataset and keep track of the metadata.

add_sample_matching(sample, **kwargs)

The sample is added to the corpus only if all the keyword arguments match the metadata of the sample. The match is operated by pyroomacoustics.datasets.Meta.match.

filter(**kwargs)

Filters the corpus and selects samples that match the criteria provided. The argument to a keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True

head(n=5)

Print n samples from the dataset

class pyroomacoustics.datasets.base.Meta(**attr)

Bases: object

A simple class that will take a dictionary as input and put the values in attributes named after the keys. We use it to store metadata for the samples.

The parameters can be any set of keyword arguments. They will all be transformed into attributes of the object.

as_dict()

Returns all the attribute/value pairs of the object as a dictionary

match(**kwargs)

The key/value pairs given by the keyword arguments are compared to the attribute/value pairs of the object. If the values all match, True is returned. Otherwise False is returned. If a keyword argument has no attribute counterpart, an error is raised. Attributes that do not have a keyword argument counterpart are ignored.

There are three ways to match an attribute with keyword=value:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True
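
For instance, a small sketch of the three match modes (the metadata values here are arbitrary):

from pyroomacoustics.datasets import Meta

m = Meta(speaker='alice', sex='female', age=37)

m.match(speaker='alice')             # True: exact value
m.match(speaker=['alice', 'bob'])    # True: attribute is in the list
m.match(age=lambda a: a > 30)        # True: the callable returns True
m.match(speaker='bob')               # False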

class pyroomacoustics.datasets.base.Sample(data, **kwargs)

Bases: object

The base class for a dataset sample. The idea is that different corpora will have different attributes for their samples. They should at least have a data attribute.

data

The actual data

Type

array_like

meta

An object containing the sample metadata. They can be accessed using the dot operator

Type

pyroomacoustics.datasets.Meta

Dataset Utilities

pyroomacoustics.datasets.utils.download_uncompress(url, path='.', compression=None, context=None)

This function downloads and uncompresses on the fly a file of type tar, tar.gz, or tar.bz2.

Parameters
  • url (str) – The URL of the file

  • path (str, optional) – The path where to uncompress the file

  • compression (str, optional) – The compression type (one of ‘bz2’, ‘gz’, ‘tar’), inferred from the url if not provided

  • context (SSL context, optional) – Default is to use none.
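
For example (a sketch with a hypothetical URL; the archive type is inferred from the extension):

from pyroomacoustics.datasets.utils import download_uncompress

# download and unpack a (hypothetical) tar.gz archive into ./corpus
download_uncompress("https://example.com/corpus.tar.gz", path="./corpus")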

Adaptive Filtering

Module contents

Adaptive Filter Algorithms

This sub-package provides implementations of popular adaptive filter algorithms.

RLS
Recursive Least Squares
LMS
Least Mean Squares and Normalized Least Mean Squares

All these classes derive from the base class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter, which offers a generic way of running an adaptive filter.

The above classes are applicable to time domain processing. For frequency domain adaptive filtering, there is the SubbandLMS class. After a DFT or STFT block, the SubbandLMS class can be used to apply LMS or NLMS to each frequency band. A shorter adaptive filter can be used on each band, as opposed to the long filter required in the time domain version. Roughly, a filter of M taps applied to each of B bands corresponds to a time domain filter with N = M × B taps.

How to use the adaptive filter module

First, an adaptive filter object is created and all the relevant options can be set (step size, regularization, etc). Then, the update function is repeatedly called to provide new samples to the algorithm.

import pyroomacoustics

# initialize the filter
rls = pyroomacoustics.adaptive.RLS(30)

# run the filter on a stream of samples
# (x is the input signal, d the noisy reference signal)
for i in range(100):
    rls.update(x[i], d[i])

# the reconstructed filter is available
print('Reconstructed filter:', rls.w)

The SubbandLMS class has the same methods as the time domain approaches. However, the signal must be in the frequency domain. This can be done with the STFT block in the transform sub-package of pyroomacoustics.

import numpy as np
import pyroomacoustics as pra

# initialize STFT and SubbandLMS blocks
block_size = 128
stft_x = pra.transform.STFT(N=block_size,
    hop=block_size//2,
    analysis_window=pra.hann(block_size))
stft_d = pra.transform.STFT(N=block_size,
    hop=block_size//2,
    analysis_window=pra.hann(block_size))
nlms = pra.adaptive.SubbandLMS(num_taps=6,
    num_bands=block_size//2+1, mu=0.5, nlms=True)

# preparing input and reference signals
...

# apply block-by-block
for n in range(num_blocks):

    # obtain block
    ...

    # to frequency domain
    stft_x.analysis(x_block)
    stft_d.analysis(d_block)
    nlms.update(stft_x.X, stft_d.X)

    # estimating input convolved with unknown response
    y_hat = stft_d.synthesis(np.diag(np.dot(nlms.W.conj().T, stft_x.X)))

    # AEC output
    E = stft_d.X - np.diag(np.dot(nlms.W.conj().T, stft_x.X))
    out = stft_d.synthesis(E)
Other Available Subpackages
pyroomacoustics.adaptive.data_structures

this provides data structures, such as a sample buffer, used internally by the adaptive filtering algorithms

pyroomacoustics.adaptive.util

a few methods mainly to efficiently manipulate Toeplitz and Hankel matrices

Utilities
pyroomacoustics.adaptive.algorithms

a dictionary containing all the adaptive filter object subclasses available, indexed by the keys ['RLS', 'BlockRLS', 'BlockLMS', 'NLMS', 'SubbandLMS']

Algorithms

Adaptive Filter (Base)

class pyroomacoustics.adaptive.adaptive_filter.AdaptiveFilter(length)

Bases: object

The dummy base class of an adaptive filter. This class doesn’t compute anything. It merely stores values in a buffer. It is used as a template for all other algorithms.

name()
reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters
  • x_n (float) – the new input sample

  • d_n (float) – the new noisy reference signal

Least Mean Squares

Least Mean Squares Family

Implementations of adaptive filters from the LMS class. These algorithms have a low complexity and reliable behavior with a somewhat slower convergence.

class pyroomacoustics.adaptive.lms.BlockLMS(length, mu=0.01, L=1, nlms=False)

Bases: NLMS

Implementation of the least mean squares algorithm (LMS) in its block form

Parameters
  • length (int) – the length of the filter

  • mu (float, optional) – the step size (default 0.01)

  • L (int, optional) – block size (default is 1)

  • nlms (bool, optional) – whether or not to normalize as in NLMS (default is False)

reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters
  • x_n (float) – the new input sample

  • d_n (float) – the new noisy reference signal

class pyroomacoustics.adaptive.lms.NLMS(length, mu=0.5)

Bases: AdaptiveFilter

Implementation of the normalized least mean squares algorithm (NLMS)

Parameters
  • length (int) – the length of the filter

  • mu (float, optional) – the step size (default 0.5)

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters
  • x_n (float) – the new input sample

  • d_n (float) – the new noisy reference signal
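
As a small system-identification sketch (the unknown filter and signals here are synthetic, not part of the library):

import numpy as np
import pyroomacoustics as pra

w_true = np.random.randn(16)                # unknown system to identify
x = np.random.randn(4000)                   # input signal
d = np.convolve(x, w_true)[: len(x)]        # reference observed at the system output

nlms = pra.adaptive.NLMS(length=16, mu=0.5)
for n in range(len(x)):
    nlms.update(x[n], d[n])

print(np.linalg.norm(nlms.w - w_true))      # should be small after adaptation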

Recursive Least Squares

Recursive Least Squares Family

Implementations of adaptive filters from the RLS class. These algorithms typically have a higher computational complexity, but a faster convergence.

class pyroomacoustics.adaptive.rls.BlockRLS(length, lmbd=0.999, delta=10, dtype=np.float32, L=None)

Bases: RLS

Block implementation of the recursive least-squares (RLS) algorithm. The difference with the vanilla implementation is that chunks of the input signals are processed in batch and some savings can be made there.

Parameters
  • length (int) – the length of the filter

  • lmbd (float, optional) – the exponential forgetting factor (default 0.999)

  • delta (float, optional) – the regularization term (default 10)

  • dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)

  • L (int, optional) – the block size (default to length)

reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters
  • x_n (float) – the new input sample

  • d_n (float) – the new noisy reference signal

class pyroomacoustics.adaptive.rls.RLS(length, lmbd=0.999, delta=10, dtype=np.float32)

Bases: AdaptiveFilter

Implementation of the exponentially weighted Recursive Least Squares (RLS) adaptive filter algorithm.

Parameters
  • length (int) – the length of the filter

  • lmbd (float, optional) – the exponential forgetting factor (default 0.999)

  • delta (float, optional) – the regularization term (default 10)

  • dtype (numpy type) – the bit depth of the numpy arrays to use (default np.float32)

reset()

Reset the state of the adaptive filter

update(x_n, d_n)

Updates the adaptive filter with a new sample

Parameters
  • x_n (float) – the new input sample

  • d_n (float) – the new noisy reference signal

Subband LMS

class pyroomacoustics.adaptive.subband_lms.SubbandLMS(num_taps, num_bands, mu=0.5, nlms=True)

Bases: object

Frequency-domain implementation of LMS, with one adaptive filter per subband.

Parameters
  • num_taps (int) – length of the filter

  • num_bands (int) – number of frequency bands, i.e. number of filters

  • mu (float, optional) – step size for each subband (default 0.5)

  • nlms (bool, optional) – whether or not to normalize as in NLMS (default is True)

reset()
update(X_n, D_n)

Updates the adaptive filters for each subband with the new block of input data.

Parameters
  • X_n (numpy array, float) – new input signal (to unknown system) in frequency domain

  • D_n (numpy array, float) – new noisy reference signal in frequency domain

pyroomacoustics.adaptive.subband_lms.hermitian(X)

Compute and return the Hermitian transpose of X

Tools and Helpers

Data Structures

class pyroomacoustics.adaptive.data_structures.Buffer(length=20, dtype=np.float32)

Bases: object

A simple buffer class with amortized cost

Parameters
  • length (int) – buffer length

  • dtype (numpy.type) – data type

flush(n)

Removes the n oldest elements in the buffer

push(val)

Add one element at the front of the buffer

size()

Returns the number of elements in the buffer

top(n)

Returns the n elements at the front of the buffer from newest to oldest
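
Example

A short usage sketch; the values are arbitrary and the dtype is passed explicitly.

import numpy as np
from pyroomacoustics.adaptive.data_structures import Buffer

buf = Buffer(length=10, dtype=np.float64)
for v in [1.0, 2.0, 3.0]:
    buf.push(v)

print(buf.size())  # 3 elements currently stored
print(buf.top(2))  # the two newest elements, from newest to oldest
buf.flush(1)       # discard the oldest element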

class pyroomacoustics.adaptive.data_structures.CoinFlipper(p, length=10000)

Bases: object

This class efficiently generates a large number of coin flips. Because each call to numpy.random.rand is a little costly, it is more efficient to generate many values at once. This class does this and stores them in advance, generating fresh values when the precomputed ones are used up.

Parameters
  • p (float, 0 < p < 1) – probability to output a 1

  • length (int) – the number of flips to precompute

flip(n)

Get n random binary values from the buffer

flip_all()

Regenerates all the used up values

fresh_flips(n)

Generates n binary random values now
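
Example

A quick sketch: draw 1000 flips with success probability 0.3 and check the empirical frequency.

from pyroomacoustics.adaptive.data_structures import CoinFlipper

flipper = CoinFlipper(0.3, length=10000)
flips = flipper.flip(1000)    # 1000 binary values taken from the precomputed pool
print(flips.sum() / 1000.0)   # empirical frequency, close to 0.3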

class pyroomacoustics.adaptive.data_structures.Powers(a, length=20, dtype=np.float32)

Bases: object

This class stores all the powers of a small number and lets one retrieve them ‘à la numpy’ with the bracket operator. The table grows automatically when new values are requested.

Parameters
  • a (float) – the number

  • length (int) – the number of integer powers

  • dtype (numpy.type, optional) – the data type (typically np.float32 or np.float64)

Example

>>> an = Powers(0.5)
>>> print(an[4])
0.0625

Utilities

pyroomacoustics.adaptive.util.autocorr(x)

Fast autocorrelation computation using the FFT

pyroomacoustics.adaptive.util.hankel_multiplication(c, r, A, mkl=True, **kwargs)

Compute numpy.dot(scipy.linalg.hankel(c,r=r), A) using the FFT.

Parameters
  • c (ndarray) – the first column of the Hankel matrix

  • r (ndarray) – the last row of the Hankel matrix

  • A (ndarray) – the matrix to multiply on the right

  • mkl (bool, optional) – if True, use the mkl_fft package if available
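
For instance, the result can be checked against the direct product with scipy.linalg.hankel; this sketch passes mkl=False so that the optional mkl_fft package is not required.

import numpy as np
from scipy.linalg import hankel
from pyroomacoustics.adaptive.util import hankel_multiplication

c = np.random.randn(8)  # first column
r = np.random.randn(8)  # last row
r[0] = c[-1]            # lower left corner value shared by c and r
A = np.random.randn(8, 3)

out = hankel_multiplication(c, r, A, mkl=False)
print(np.allclose(out, hankel(c, r=r) @ A))  # expect True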

pyroomacoustics.adaptive.util.hankel_stride_trick(x, shape)

Make a Hankel matrix from a vector using stride tricks

Parameters
  • x (ndarray) – a vector that contains the concatenation of the first column and first row of the Hankel matrix to build without repetition of the lower left corner value of the matrix

  • shape (tuple) – the shape of the Hankel matrix to build, it must satisfy x.shape[0] == shape[0] + shape[1] - 1
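
For example, a 4 x 3 Hankel matrix requires 4 + 3 - 1 == 6 values (a minimal sketch):

import numpy as np
from pyroomacoustics.adaptive.util import hankel_stride_trick

x = np.arange(6.0)  # first column and last row, corner value not repeated
H = hankel_stride_trick(x, (4, 3))
print(H.shape)  # (4, 3)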

pyroomacoustics.adaptive.util.mkl_toeplitz_multiplication(c, r, A, A_padded=False, out=None, fft_len=None)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT from the mkl_fft package.

Parameters
  • c (ndarray) – the first column of the Toeplitz matrix

  • r (ndarray) – the first row of the Toeplitz matrix

  • A (ndarray) – the matrix to multiply on the right

  • A_padded (bool, optional) – the A matrix can be pre-padded with zeros by the user, if this is the case set to True

  • out (ndarray, optional) – an ndarray to store the output of the multiplication

  • fft_len (int, optional) – specify the length of the FFT to use

pyroomacoustics.adaptive.util.naive_toeplitz_multiplication(c, r, A)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A)

Parameters
  • c (ndarray) – the first column of the Toeplitz matrix

  • r (ndarray) – the first row of the Toeplitz matrix

  • A (ndarray) – the matrix to multiply on the right

pyroomacoustics.adaptive.util.toeplitz_multiplication(c, r, A, **kwargs)

Compute numpy.dot(scipy.linalg.toeplitz(c,r), A) using the FFT.

Parameters
  • c (ndarray) – the first column of the Toeplitz matrix

  • r (ndarray) – the first row of the Toeplitz matrix

  • A (ndarray) – the matrix to multiply on the right
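
As with the Hankel helper above, the FFT-based product can be checked against scipy.linalg.toeplitz in a few lines (a sketch):

import numpy as np
from scipy.linalg import toeplitz
from pyroomacoustics.adaptive.util import (
    naive_toeplitz_multiplication,
    toeplitz_multiplication,
)

c = np.random.randn(8)  # first column
r = np.random.randn(8)  # first row
r[0] = c[0]             # shared top left entry
A = np.random.randn(8, 3)

T = toeplitz(c, r)
print(np.allclose(toeplitz_multiplication(c, r, A), T @ A))        # FFT-based
print(np.allclose(naive_toeplitz_multiplication(c, r, A), T @ A))  # direct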

pyroomacoustics.adaptive.util.toeplitz_opt_circ_approx(r, matrix=False)

Optimal circulant approximation of a symmetric Toeplitz matrix by Tony F. Chan

Parameters
  • r (ndarray) – the first row of the symmetric Toeplitz matrix

  • matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned, otherwise, only the first column

pyroomacoustics.adaptive.util.toeplitz_strang_circ_approx(r, matrix=False)

Circulant approximation to a symmetric Toeplitz matrix by Gil Strang

Parameters
  • r (ndarray) – the first row of the symmetric Toeplitz matrix

  • matrix (bool, optional) – if True, the full symmetric Toeplitz matrix is returned, otherwise, only the first column

Blind Source Separation

Module contents

Blind Source Separation

Implementations of a few blind source separation (BSS) algorithms.

AuxIVA
Independent Vector Analysis 1
Trinicon
ILRMA
Independent Low-Rank Matrix Analysis 3
SparseAuxIVA
Sparse Independent Vector Analysis 4
FastMNMF
Fast Multichannel Nonnegative Matrix Factorization 5
FastMNMF2
Fast Multichannel Nonnegative Matrix Factorization 2 6

A few commonly used functions, such as projection back, can be found in pyroomacoustics.bss.common.

References

1

N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011.

2

R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006.

3

D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016

4

J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016.

5

K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019.

6

K. Sekiguchi, Y. Bando, A. A. Nugraha, K. Yoshii, T. Kawahara, Fast Multichannel Nonnegative Matrix Factorization with Directivity-Aware Jointly-Diagonalizable Spatial Covariance Matrices for Blind Source Separation, IEEE/ACM Trans. ASLP, vol. 28, pp. 2610-2625, 2020.

Algorithms

Independent Vector Analysis (AuxIVA)

AuxIVA

Blind Source Separation using independent vector analysis based on auxiliary function. This function will separate the input signal into statistically independent sources without using any prior information.

The algorithm in the determined case, i.e., when the number of sources is equal to the number of microphones, is AuxIVA 1. When there are more microphones (the overdetermined case), a computationally cheaper variant (OverIVA) is used 2.

Example

from scipy.io import wavfile
import pyroomacoustics as pra

# read multichannel wav file
# audio.shape == (nsamples, nchannels)
fs, audio = wavfile.read("my_multichannel_audio.wav")

# STFT analysis parameters
fft_size = 4096  # `fft_size / fs` should be ~RT60
hop = fft_size // 2  # half-overlap
win_a = pra.hann(fft_size)  # analysis window
# optimal synthesis window
win_s = pra.transform.compute_synthesis_window(win_a, hop)

# STFT
# X.shape == (nframes, nfrequencies, nchannels)
X = pra.transform.analysis(audio, fft_size, hop, win=win_a)

# Separation
Y = pra.bss.auxiva(X, n_iter=20)

# iSTFT (introduces an offset of `hop` samples)
# y contains the time domain separated signals
# y.shape == (new_nsamples, nchannels)
y = pra.transform.synthesis(Y, fft_size, hop, win=win_s)

References

1

N. Ono, Stable and fast update rules for independent vector analysis based on auxiliary function technique, Proc. IEEE, WASPAA, pp. 189-192, Oct. 2011.

2

R. Scheibler and N. Ono, Independent Vector Analysis with more Microphones than Sources, arXiv, 2019. https://arxiv.org/abs/1905.07880

pyroomacoustics.bss.auxiva.auxiva(X, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', init_eig=False, return_filters=False, callback=None)

This is an implementation of AuxIVA/OverIVA that separates the input signal into statistically independent sources. The separation is done in the time-frequency domain and the FFT length should be approximately equal to the reverberation time.

Two different statistical models (Laplace or time-varying Gauss) can be selected with the keyword argument model. The Gauss model performs better in good conditions (few sources, low noise), but Laplace (the default) is more robust in general.

Parameters
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal

  • n_src (int, optional) – The number of sources or independent components. When n_src==nchannels, the algorithm is identical to AuxIVA. When n_src==1, it performs independent vector extraction.

  • n_iter (int, optional) – The number of iterations (default 20)

  • proj_back (bool, optional) – Scaling on first mic by back projection (default True)

  • W0 (ndarray (nfrequencies, nsrc, nchannels), optional) – Initial value for demixing matrix

  • model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)

  • init_eig (bool, optional (default False)) – If True, and if W0 is None, then the weights are initialized using the principal eigenvectors of the covariance matrix of the input data. When False, the demixing matrices are initialized with the identity matrix.

  • return_filters (bool) – If true, the function will return the demixing matrix too

  • callback (func) – A callback function called every 10 iterations; can be used to monitor convergence

Returns

Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.

Trinicon

pyroomacoustics.bss.trinicon.trinicon(signals, w0=None, filter_length=2048, block_length=None, n_blocks=8, alpha_on=None, j_max=10, delta_max=0.0001, sigma2_0=1e-07, mu=0.001, lambd_a=0.2, return_filters=False)

Implementation of the TRINICON Blind Source Separation algorithm as described in

R. Aichner, H. Buchner, F. Yan, and W. Kellermann, A real-time blind source separation scheme and its application to reverberant and noisy acoustic environments, Signal Processing, 86(6), 1260-1277. doi:10.1016/j.sigpro.2005.06.022, 2006.

Specifically, adaptation of the pseudo-code from Table 1.

The implementation is hard-coded for 2 output channels.

Parameters
  • signals (ndarray (nchannels, nsamples)) – The microphone input signals (time domain)

  • w0 (ndarray (nchannels, nsources, nsamples), optional) – Optional initial value for the demixing filters

  • filter_length (int, optional) – The length of the demixing filters, if w0 is provided, this option is ignored

  • block_length (int, optional) – Block length (default 2x filter_length)

  • n_blocks (int, optional) – Number of blocks processed at once (default 8)

  • alpha_on (int, optional) – Online overlap factor (default n_blocks)

  • j_max (int, optional) – Number of offline iterations (default 10)

  • delta_max (float, optional) – Regularization parameter, this sets the maximum value of the regularization term (default 1e-4)

  • sigma2_0 (float, optional) – Regularization parameter, this sets the reference (machine?) noise level in the regularization (default 1e-7)

  • mu (float, optional) – Offline update step size (default 0.001)

  • lambd_a (float, optional) – Online forgetting factor (default 0.2)

  • return_filters (bool) – If true, the function will return the demixing matrix too (default False)

Returns

Returns an (nsources, nsamples) array. Also returns the demixing matrix (nchannels, nsources, nsamples) if return_filters keyword is True.

Return type

ndarray
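
Example

A minimal usage sketch, assuming signals is an (nchannels, nsamples) array of time-domain microphone inputs.

import pyroomacoustics as pra

# run with default parameters; the implementation produces two output channels
y = pra.bss.trinicon(signals)  # y.shape == (2, nsamples)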

Independent Low-Rank Matrix Analysis (ILRMA)

ILRMA

Blind Source Separation using Independent Low-Rank Matrix Analysis (ILRMA).

pyroomacoustics.bss.ilrma.ilrma(X, n_src=None, n_iter=20, proj_back=True, W0=None, n_components=2, return_filters=False, callback=None)

Implementation of ILRMA algorithm without partitioning function for BSS presented in

D. Kitamura, N. Ono, H. Sawada, H. Kameoka, H. Saruwatari, Determined blind source separation unifying independent vector analysis and nonnegative matrix factorization, IEEE/ACM Trans. ASLP, vol. 24, no. 9, pp. 1626-1641, Sept. 2016

D. Kitamura, N. Ono, H. Sawada, H. Kameoka, and H. Saruwatari Determined Blind Source Separation with Independent Low-Rank Matrix Analysis, in Audio Source Separation, S. Makino, Ed. Springer, 2018, pp. 125-156.

Parameters
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal

  • n_src (int, optional) – The number of sources or independent components

  • n_iter (int, optional) – The number of iterations (default 20)

  • proj_back (bool, optional) – Scaling on first mic by back projection (default True)

  • W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix

  • n_components (int) – Number of components in the non-negative spectrum

  • return_filters (bool) – If true, the function will return the demixing matrix too

  • callback (func) – A callback function called every 10 iterations; can be used to monitor convergence

Returns

Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix W (nfrequencies, nchannels, nsources) if the return_filters keyword is True.
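
Example

ILRMA uses the same STFT conventions as AuxIVA, so it can replace the separation step of the AuxIVA example above (the imports and the computation of X are the same; the parameter values here are illustrative).

# X.shape == (nframes, nfrequencies, nchannels)
Y = pra.bss.ilrma(X, n_iter=30, n_components=4)

# optionally also retrieve the demixing matrices
Y, W = pra.bss.ilrma(X, n_iter=30, n_components=4, return_filters=True)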

Sparse Independent Vector Analysis (SparseAuxIVA)

pyroomacoustics.bss.sparseauxiva.sparseauxiva(X, S=None, n_src=None, n_iter=20, proj_back=True, W0=None, model='laplace', return_filters=False, callback=None)

Implementation of sparse AuxIVA algorithm for BSS presented in

J. Janský, Z. Koldovský, and N. Ono, A computationally cheaper method for blind speech separation based on AuxIVA and incomplete demixing transform, Proc. IEEE, IWAENC, pp. 1-5, Sept. 2016.

Parameters
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the signal

  • n_src (int, optional) – The number of sources or independent components

  • S (ndarray (k_freq)) – Indexes of active frequency bins for sparse AuxIVA

  • n_iter (int, optional) – The number of iterations (default 20)

  • proj_back (bool, optional) – Scaling on first mic by back projection (default True)

  • W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for demixing matrix

  • model (str) – The model of source distribution ‘gauss’ or ‘laplace’ (default)

  • return_filters (bool) – If true, the function will return the demixing matrix too

  • callback (func) – A callback function called every 10 iterations; can be used to monitor convergence

Returns

Returns an (nframes, nfrequencies, nsources) array. Also returns the demixing matrix (nfrequencies, nchannels, nsources) if the return_filters keyword is True.

Common Tools

pyroomacoustics.bss.common.projection_back(Y, ref, clip_up=None, clip_down=None)

This function computes the frequency-domain filter that minimizes the squared error to a reference signal. This is commonly used to solve the scale ambiguity in BSS.

Here is the derivation of the projection. The optimal filter z minimizes the squared error.

\[\min E[|z^* y - x|^2]\]

It should thus satisfy the orthogonality condition and can be derived as follows

\[\begin{aligned}
0 &= E[y^*\, (z^* y - x)]\\
0 &= z^*\, E[|y|^2] - E[y^* x]\\
z^* &= \frac{E[y^* x]}{E[|y|^2]}\\
z &= \frac{E[y x^*]}{E[|y|^2]}
\end{aligned}\]

In practice, the expectations are replaced by the sample mean, as in the sketch after the parameter list.

Parameters
  • Y (array_like (n_frames, n_bins, n_channels)) – The STFT data to project back on the reference signal

  • ref (array_like (n_frames, n_bins)) – The reference signal

  • clip_up (float, optional) – Limits the maximum value of the gain (default no limit)

  • clip_down (float, optional) – Limits the minimum value of the gain (default no limit)
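
In sample-mean form, the gain reduces to two reductions over the frame axis. The following is a minimal numpy sketch of the formula above, not the library implementation; the function name and the eps guard are made up.

import numpy as np

def projection_back_sketch(Y, ref, eps=1e-15):
    # Y: (n_frames, n_bins, n_channels), ref: (n_frames, n_bins)
    num = np.sum(Y * np.conj(ref[:, :, None]), axis=0)  # ~ E[y x*], common 1/n cancels
    denom = np.sum(np.abs(Y) ** 2, axis=0)              # ~ E[|y|^2]
    z = num / np.maximum(denom, eps)                    # shape (n_bins, n_channels)
    # the rescaled signals are then np.conj(z[None, :, :]) * Y
    return z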

pyroomacoustics.bss.common.sparir(G, S, weights=np.array([]), gini=0, maxiter=50, tol=10, alpha=10, alphamax=100000.0, alphamin=1e-07)

Fast proximal algorithm implementation for sparse approximation of relative impulse responses from incomplete measurements of the corresponding relative transfer function based on

Z. Koldovsky, J. Malek, and S. Gannot, “Spatial Source Subtraction based on Incomplete Measurements of Relative Transfer Function”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, TASLP 2015.

The original Matlab implementation can be found at http://itakura.ite.tul.cz/zbynek/dwnld/SpaRIR.m

and it is referred to in

Z. Koldovsky, F. Nesta, P. Tichavsky, and N. Ono, Frequency-domain blind speech separation using incomplete de-mixing transform, EUSIPCO 2016.

Parameters
  • G (ndarray (nfrequencies, 1)) – Frequency representation of the (incomplete) transfer function

  • S (ndarray (kfrequencies)) – Indexes of active frequency bins for sparse AuxIVA

  • weights (ndarray (kfrequencies) or int, optional) – The higher the value of weights(i), the higher the probability that g(i) is zero; if scalar, all weights are the same; if empty, default value is used

  • gini (ndarray (nfrequencies)) – Initialization for the computation of g

  • maxiter (int) – Maximum number of iterations before achieving convergence (default 50)

  • tol (float) – Minimum convergence criteria based on the gradient difference between adjacent updates (default 10)

  • alpha (float) – Inverse of the decreasing speed of the gradient at each iteration. This parameter is updated at every iteration (default 10)

  • alphamax (float) – Upper bound for alpha (default 1e5)

  • alphamin (float) – Lower bound for alpha (default 1e-7)

Returns

Returns the sparse approximation of the impulse response in the time domain (real-valued) as an (nfrequencies) array.

Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)

FastMNMF

Blind Source Separation using Fast Multichannel Nonnegative Matrix Factorization (FastMNMF)

pyroomacoustics.bss.fastmnmf.fastmnmf(X, n_src=None, n_iter=30, n_components=8, mic_index=0, W0=None, accelerate=True, callback=None)

Implementation of FastMNMF algorithm presented in

K. Sekiguchi, A. A. Nugraha, Y. Bando, K. Yoshii, Fast Multichannel Source Separation Based on Jointly Diagonalizable Spatial Covariance Matrices, EUSIPCO, 2019.

The code of FastMNMF with GPU support, and of FastMNMF-DP, which integrates a DNN-based source model into FastMNMF, is available at https://github.com/sekiguchi92/SoundSourceSeparation

Parameters
  • X (ndarray (nframes, nfrequencies, nchannels)) – STFT representation of the observed signal

  • n_src (int, optional) – The number of sound sources (default None). If None, n_src is set to the number of microphones

  • n_iter (int, optional) – The number of iterations (default 30)

  • n_components (int, optional) – Number of components in the non-negative spectrum (default 8)

  • mic_index (int or 'all', optional) – The index of the microphone for which to obtain the source images (default 0). If ‘all’, return the source images at all microphones

  • W0 (ndarray (nfrequencies, nchannels, nchannels), optional) – Initial value for diagonalizer Q (default None). If None, identity matrices are used for all frequency bins.

  • accelerate (bool, optional) – If true, the basis and activation of NMF are updated simultaneously (default True)

  • callback (func, optional) – A callback function called every 10 iterations; can be used to monitor convergence

Returns

  • If mic_index is int, returns an (nframes, nfrequencies, nsources) array.

  • If mic_index is ‘all’, returns an (nchannels, nframes, nfrequencies, nsources) array.
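
Example

A usage sketch with illustrative parameter values, reusing the imports and STFT conventions of the AuxIVA example above.

# X.shape == (nframes, nfrequencies, nchannels)
# with the default mic_index=0, Y contains the source images at microphone 0
Y = pra.bss.fastmnmf(X, n_src=2, n_iter=30, n_components=8)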

Direction of Arrival

Module contents

Direction of Arrival Finding

This sub-package provides implementations of popular direction of arrival (DOA) finding algorithms.

MUSIC
Multiple Signal Classification 1
NormMUSIC
MUSIC with frequency normalization 2
SRP-PHAT
Steered Response Power – Phase Transform 3
CSSM
Coherent Signal Subspace Method 4
WAVES
Weighted Average of Signal Subspaces 5
TOPS
Test of Orthogonality of Projected Subspaces 6
FRIDA
Finite Rate of Innovation Direction of Arrival 7

All these classes derive from the abstract base class pyroomacoustics.doa.doa.DOA that offers generic methods for finding and visualizing the locations of acoustic sources.

The constructor can be called once to build the DOA finding object. Then, the method pyroomacoustics.doa.doa.DOA.locate_sources performs DOA finding based on the time-frequency snapshots passed to it as an argument. Extra arguments can be supplied to select the frequency bands used for localization.

How to use the DOA module

Here R is a 2xQ ndarray that contains the locations of the Q microphones in the columns, fs is the sampling frequency of the input signal, and nfft the length of the FFT used.

The STFT snapshots are passed to the localization methods in the X ndarray of shape Q x (nfft // 2 + 1) x n_snapshots, where n_snapshots is the number of STFT frames to use for the localization. The option freq_bins can be provided to specify which frequency bins to use for the localization.

>>> import numpy as np
>>> import pyroomacoustics
>>> doa = pyroomacoustics.doa.MUSIC(R, fs, nfft)
>>> doa.locate_sources(X, freq_bins=np.arange(20, 40))
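
A fuller sketch of the pipeline, assuming signals is an (nsamples, nchannels) recording, R and fs are as above, and the estimated angles are read from the azimuth_recon attribute:

import numpy as np
import pyroomacoustics as pra

nfft = 256
# STFT frames, shape (n_frames, nfft // 2 + 1, nchannels)
frames = pra.transform.analysis(signals, nfft, nfft // 2)
# rearrange to Q x (nfft // 2 + 1) x n_snapshots
X = frames.transpose([2, 1, 0])

doa = pra.doa.MUSIC(R, fs, nfft, num_src=1)
doa.locate_sources(X, freq_range=[500.0, 4000.0])
print(doa.azimuth_recon)  # estimated azimuth(s), in radians
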
Other Available Subpackages
pyroomacoustics.doa.grid

this provides abstractions for computing functions on regular or irregular grids defined on circles and spheres with peak finding methods

pyroomacoustics.doa.plotters

a few methods to plot functions and points on circles or spheres

pyroomacoustics.doa.detect_peaks

1D peak detection routine from Marcos Duarte

pyroomacoustics.doa.tools_frid_doa_plane

routines implementing FRIDA algorithm

Utilities
pyroomacoustics.doa.algorithms

a dictionary containing all the available DOA object subclasses, indexed by the keys ['MUSIC', 'NormMUSIC', 'SRP', 'CSSM', 'WAVES', 'TOPS', 'FRIDA']

Note on MUSIC

Since NormMUSIC is more robust, we recommend using NormMUSIC over MUSIC. When MUSIC is used as a baseline for publications, we recommend using both NormMUSIC and MUSIC. For more information, have a look at our jupyter notebook at https://github.com/LCAV/pyroomacoustics/tree/master/notebooks/norm_music_demo.ipynb

References

1

R. Schmidt, Multiple emitter location and signal parameter estimation, IEEE Trans. Antennas Propag., Vol. 34, Num. 3, pp 276–280, 1986

2

D. Salvati, C. Drioli, G. L. Foresti, Incoherent Frequency Fusion for Broadband Steered Response Power Algorithms in Noisy Environments, IEEE Signal Process. Lett., Vol. 21, Num. 5, pp 581-585, 2014

3

J. H. DiBiase, A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays, PHD Thesis, Brown University, 2000

4

H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985

5

E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001

6

Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006

7

H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017

Algorithms

CSSM

class pyroomacoustics.doa.cssm.CSSM(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)

Bases: MUSIC

Class to apply the Coherent Signal-Subspace method [CSSM] for Direction of Arrival (DoA) estimation.

Note

Run locate_sources() to apply the CSSM algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate colatitude angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

  • num_iter (int) – Number of iterations for CSSM. Default: 5

References

CSSM

H. Wang, M. Kaveh, Coherent signal-subspace processing for the detection and estimation of angles of arrival of multiple wide-band sources, IEEE Trans. Acoust., Speech, Signal Process., Vol. 33, Num. 4, pp 823–831, 1985

FRIDA

class pyroomacoustics.doa.frida.FRIDA(L, fs, nfft, max_four=None, c=343.0, num_src=1, G_iter=None, max_ini=5, n_rot=1, max_iter=50, noise_level=1e-10, low_rank_cleaning=False, stopping='max_iter', stft_noise_floor=0.0, stft_noise_margin=1.5, signal_type='visibility', use_lu=True, verbose=False, symb=True, use_cache=False, **kwargs)

Bases: DOA

Implements the FRI-based direction of arrival finding algorithm [FRIDA].

Note

Run locate_sources() to apply the FRIDA algorithm.

Parameters
  • L (ndarray) – Contains the locations of the microphones in the columns

  • fs (int or float) – Sampling frequency

  • nfft (int) – FFT size

  • max_four (int) – Maximum order of the Fourier or spherical harmonics expansion

  • c (float, optional) – Speed of sound

  • num_src (int, optional) – The number of sources to recover (default 1)

  • G_iter (int) – Number of mapping matrix refinement iterations in recovery algorithm (default 1)

  • max_ini (int, optional) – Number of random initializations to use in recovery algorithm (default 5)

  • n_rot (int, optional) – Number of random rotations to apply before recovery algorithm (default 10)

  • noise_level (float, optional) – Noise level in the visibility measurements, if available (default 1e-10)

  • stopping (str, optional) – Stopping criteria for the recovery algorithm. Can be max iterations or noise level (default max_iter)

  • stft_noise_floor (float) – The noise floor in the STFT measurements, if available (default 0)

  • stft_noise_margin (float) – When this and stft_noise_floor are set, only frames with at least stft_noise_floor * stft_noise_margin power are picked

  • signal_type (str) –

    Which type of measurements to use:

    • ’visibility’: Cross correlation measurements

    • ’raw’: Microphone signals

  • use_lu (bool, optional) – Whether to use LU decomposition for efficiency

  • verbose (bool, optional) – Whether to output intermediate result for debugging purposes

  • symb (bool, optional) – Whether to enforce the symmetry on the reconstructed uniform samples of sinusoids b

References

FRIDA

H. Pan, R. Scheibler, E. Bezzam, I. Dokmanic, and M. Vetterli, FRIDA: FRI-based DOA estimation for arbitrary array layouts, Proc. ICASSP, pp 3186-3190, 2017

MUSIC

class pyroomacoustics.doa.music.MUSIC(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, frequency_normalization=False, **kwargs)

Bases: DOA

Class to apply MUltiple SIgnal Classification (MUSIC) direction-of-arrival (DoA) for a particular microphone array.

Note

Run locate_source() to apply the MUSIC algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

  • frequency_normalization (bool) – If True, the MUSIC pseudo-spectra are normalized before averaging across the frequency axis, default:False

plot_individual_spectrum()

Plot the steered response for each frequency.

SRP-PHAT

class pyroomacoustics.doa.srp.SRP(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)

Bases: DOA

Class to apply Steered Response Power (SRP) direction-of-arrival (DoA) for a particular microphone array.

Note

Run locate_source() to apply the SRP-PHAT algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

TOPS

class pyroomacoustics.doa.tops.TOPS(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, **kwargs)

Bases: MUSIC

Class to apply Test of Orthogonality of Projected Subspaces [TOPS] for Direction of Arrival (DoA) estimation.

Note

Run locate_source() to apply the TOPS algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

References

TOPS

Y. Yeo-Sun, L. M. Kaplan, J. H. McClellan, TOPS: New DOA estimator for wideband signals, IEEE Trans. Signal Process., Vol. 54, Num 6., pp 1977–1989, 2006

WAVES

class pyroomacoustics.doa.waves.WAVES(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, num_iter=5, **kwargs)

Bases: MUSIC

Class to apply Weighted Average of Signal Subspaces [WAVES] for Direction of Arrival (DoA) estimation.

Note

Run locate_sources() to apply the WAVES algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

  • num_iter (int) – Number of iterations for WAVES. Default: 5

References

WAVES

E. D. di Claudio, R. Parisi, WAVES: Weighted average of signal subspaces for robust wideband direction finding, IEEE Trans. Signal Process., Vol. 49, Num. 10, 2179–2191, 2001

Tools and Helpers

DOA (Base)

class pyroomacoustics.doa.doa.DOA(L, fs, nfft, c=343.0, num_src=1, mode='far', r=None, azimuth=None, colatitude=None, n_grid=None, dim=2, *args, **kwargs)

Bases: object

Abstract parent class for Direction of Arrival (DoA) algorithms. After creating an object (SRP, MUSIC, CSSM, WAVES, or TOPS), run locate_source to apply the corresponding algorithm.

Parameters
  • L (numpy array) – Microphone array positions. Each column should correspond to the cartesian coordinates of a single microphone.

  • fs (float) – Sampling frequency.

  • nfft (int) – FFT length.

  • c (float) – Speed of sound. Default: 343 m/s

  • num_src (int) – Number of sources to detect. Default: 1

  • mode (str) – ‘far’ or ‘near’ for far-field or near-field detection respectively. Default: ‘far’

  • r (numpy array) – Candidate distances from the origin. Default: np.ones(1)

  • azimuth (numpy array) – Candidate azimuth angles (in radians) with respect to x-axis. Default: np.linspace(-180.,180.,30)*np.pi/180

  • colatitude (numpy array) – Candidate elevation angles (in radians) with respect to z-axis. Default is x-y plane search: np.pi/2*np.ones(1)

  • n_grid (int) – If azimuth and colatitude are not specified, a grid with this many points is created. Default is 360.

  • dim (int) – The dimension of the problem. Set dim=2 to find sources on the circle (x-y plane). Set dim=3 to search on the whole sphere.

locate_sources(X, num_src=None, freq_range=[500.0, 4000.0], freq_bins=None, freq_hz=None)

Locate source(s) using corresponding algorithm.

Parameters
  • X (numpy array) – Set of signals in the frequency (RFFT) domain for current frame. Size should be M x F x S, where M should correspond to the number of microphones, F to nfft/2+1, and S to the number of snapshots (user-defined). It is recommended to have S >> M.

  • num_src (int) – Number of sources to detect. Default is value given to object constructor.

  • freq_range (list of floats, length 2) – Frequency range on which to run DoA: [fmin, fmax].

  • freq_bins (list of int) – freq_bins: List of individual frequency bins on which to run DoA. If defined by user, it will not take into consideration freq_range or freq_hz.

  • freq_hz (list of floats) – List of individual frequencies on which to run DoA. If defined by user, it will not take into consideration freq_range.

polar_plt_dirac(azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)

Generate polar plot of DoA results.

Parameters
  • azimuth_ref (numpy array) – True direction of sources (in radians).

  • alpha_ref (numpy array) – Estimated amplitude of sources.

  • save_fig (bool) – Whether or not to save figure as pdf.

  • file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’

  • plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.

class pyroomacoustics.doa.doa.ModeVector(L, fs, nfft, c, grid, mode='far', precompute=False)

Bases: object

This is a class for look-up tables of mode vectors. The look-up table is an outer product of three vectors running along candidate locations, time, and frequency. When the grid becomes large, the look-up table might be too large to store in memory. In that case, this class allows computing the outer product elements only when needed, keeping just the three vectors in memory. When the table is small, the precompute option can be set to True to compute the whole table in advance.

Tools for FRIDA (azimuth only)

pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri(coef_ri, K, D, L)

Split T matrix in real/imaginary representation

pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half(coef_half, K, D, L, D_coef)

Split T matrix in real/imaginary conjugate symmetric representation

pyroomacoustics.doa.tools_fri_doa_plane.Rmtx_ri_half_out_half(coef_half, K, D, L, D_coef, mtx_shrink)

If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so effectively the output is half the size

pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri(b_ri, K, D, L)

build convolution matrix associated with b_ri

Parameters
  • b_ri – a real-valued vector

  • K – number of Diracs

  • D1 – expansion matrix for the real-part

  • D2 – expansion matrix for the imaginary-part

pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half(b_ri, K, D, L, D_coef)

Split T matrix in conjugate symmetric representation

pyroomacoustics.doa.tools_fri_doa_plane.Tmtx_ri_half_out_half(b_ri, K, D, L, D_coef, mtx_shrink)

If both b and the annihilating filter coefficients are Hermitian symmetric, then the output will also be Hermitian symmetric, so effectively the output is half the size

pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp(phi_k, p_mic_x, p_mic_y)

the matrix that maps Diracs’ amplitudes to the visibility

Parameters
  • phi_k – Diracs’ location (azimuth)

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_amp_ri(p_mic_x, p_mic_y, phi_k)

builds real/imaginary amplitude matrix

pyroomacoustics.doa.tools_fri_doa_plane.build_mtx_raw_amp(p_mic_x, p_mic_y, phi_k)

the matrix that maps Diracs’ amplitudes to the visibility

Parameters
  • phi_k – Diracs’ location (azimuth)

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

pyroomacoustics.doa.tools_fri_doa_plane.coef_expan_mtx(K)

expansion matrix for an annihilating filter of size K + 1

Parameters

K – number of Diracs. The filter size is K + 1

pyroomacoustics.doa.tools_fri_doa_plane.compute_b(G_lst, GtG_lst, beta_lst, Rc0, num_bands, a_ri, use_lu=False, GtG_inv_lst=None)

compute the uniform sinusoidal samples b from the updated annihilating filter coefficients.

Parameters
  • GtG_lst – list of G^H G for different subbands

  • beta_lst – list of beta-s for different subbands

  • Rc0 – right-dual matrix, here it is the convolution matrix associated with c

  • num_bands – number of bands

  • a_ri – a 2D numpy array. each column corresponds to the measurements within a subband

pyroomacoustics.doa.tools_fri_doa_plane.compute_mtx_obj(GtG_lst, Tbeta_lst, Rc0, num_bands, K)
compute the matrix (M) in the objective function:

min c^H M c s.t. c0^H c = 1

Parameters
  • GtG_lst – list of G^H * G

  • Tbeta_lst – list of Toeplitz matrices for beta-s

  • Rc0 – right dual matrix for the annihilating filter (the same for each block, hence not a list)

pyroomacoustics.doa.tools_fri_doa_plane.compute_obj_val(GtG_inv_lst, Tbeta_lst, Rc0, c_ri_half, num_bands, K)

compute the fitting error. CAUTION: Here we assume use_lu = True

pyroomacoustics.doa.tools_fri_doa_plane.cov_mtx_est(y_mic)

estimate covariance matrix

Parameters

y_mic – received signal (complex based band representation) at microphones

pyroomacoustics.doa.tools_fri_doa_plane.cpx_mtx2real(mtx)

represent a complex-valued matrix by an extended matrix of real values only

Parameters

mtx – input complex valued matrix

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')

Reconstruct point sources’ locations (azimuth) from the visibility measurements

Parameters
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half(G, a_ri, K, M, noise_level, max_ini=100, stop_cri='mse')

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband(G_lst, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters
  • G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_lu(G_lst, GtG_lst, GtG_inv_lst, a_ri, K, M, max_ini=100, max_iter=50)

Here we use LU decomposition to precompute a few entries. Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged.

Parameters
  • G_lst – a list of the linear transformation matrices that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_multiband_parallel(G, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’

Parameters
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_half_parallel(G, a_ri, K, M, max_ini=100)

Reconstruct point sources’ locations (azimuth) from the visibility measurements. Here we enforce hermitian symmetry in the annihilating filter coefficients so that roots on the unit circle are encouraged. We use parallel implementation when stop_cri == ‘max_iter’

Parameters
  • G – the linear transformation matrix that links the visibilities to uniformly sampled sinusoids

  • a_ri – the visibility measurements

  • K – number of Diracs

  • M – the Fourier series expansion is between -M and M

  • noise_level – level of noise (ell_2 norm) in the measurements

  • max_ini – maximum number of initialisations

  • stop_cri – stopping criterion, either ‘mse’ or ‘max_iter’

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_inner(c_ri_half, a_ri, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, max_iter)

inner loop of the dirac_recon_ri_half_parallel function

pyroomacoustics.doa.tools_fri_doa_plane.dirac_recon_ri_multiband_inner(c_ri_half, a_ri, num_bands, rhs, rhs_bl, K, M, D1, D2, D_coef, mtx_shrink, Tbeta_ri, G, GtG, max_iter)

Inner loop of the dirac_recon_ri_multiband function

pyroomacoustics.doa.tools_fri_doa_plane.extract_off_diag(mtx)

extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner.

Parameters

mtx – input matrix to extract the off diagonal entries

pyroomacoustics.doa.tools_fri_doa_plane.hermitian_expan(half_vec_len)

expand a real-valued vector to a Hermitian symmetric vector. The input vector is a concatenation of the real parts with NON-POSITIVE indices and the imaginary parts with STRICTLY-NEGATIVE indices.

Parameters

half_vec_len – length of the first half vector

pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj(Tbeta_lst, num_bands, K, lu_R_GtGinv_Rt_lst)
compute the matrix (M) in the objective function:

min c^H M c s.t. c0^H c = 1

Parameters
  • GtG_lst – list of G^H * G

  • Tbeta_lst – list of Toeplitz matrices for beta-s

  • Rc0 – right dual matrix for the annihilating filter (the same for each block, hence not a list)

pyroomacoustics.doa.tools_fri_doa_plane.lu_compute_mtx_obj_initial(GtG_inv_lst, Tbeta_lst, Rc0, num_bands, K)
compute the matrix (M) in the objective function:

min c^H M c s.t. c0^H c = 1

Parameters
  • GtG_lst – list of G^H * G

  • Tbeta_lst – list of Toeplitz matrices for beta-s

  • Rc0 – right dual matrix for the annihilating filter (the same for each block, hence not a list)

pyroomacoustics.doa.tools_fri_doa_plane.make_G(p_mic_x, p_mic_y, omega_bands, sound_speed, M, signal_type='visibility')

build the list of matrices that map the multi-band visibility measurements to uniformly sampled sinusoids.

Parameters
  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • omega_bands – mid-band (ANGULAR) frequencies [radian/sec]

  • sound_speed – speed of sound

  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs

Return type

The list of mapping matrices from measurements to sinusoids

pyroomacoustics.doa.tools_fri_doa_plane.make_GtG_and_inv(G_lst)
pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2raw(M, p_mic_x, p_mic_y)

build the matrix that maps the Fourier series to the raw microphone signals

Parameters
  • M – the Fourier series expansion is limited from -M to M

  • p_mic_x – a vector that contains microphones x coordinates

  • p_mic_y – a vector that contains microphones y coordinates

pyroomacoustics.doa.tools_fri_doa_plane.mtx_freq2visi(M, p_mic_x, p_mic_y)

build the matrix that maps the Fourier series to the visibility

Parameters
  • M – the Fourier series expansion is limited from -M to M

  • p_mic_x – a vector that contains microphones x coordinates

  • p_mic_y – a vector that contains microphones y coordinates

pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri(M, p_mic_x, p_mic_y, D1, D2, signal='visibility')

build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled).

Parameters
  • M – the Fourier series expansion is limited from -M to M

  • p_mic_x – a vector that contains microphones x coordinates

  • p_mic_y – a vector that contains microphones y coordinates

  • D1 – expansion matrix for the real-part

  • D2 – expansion matrix for the imaginary-part

  • signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)

pyroomacoustics.doa.tools_fri_doa_plane.mtx_fri2signal_ri_multiband(M, p_mic_x_all, p_mic_y_all, D1, D2, aslist=False, signal='visibility')

build the matrix that maps the Fourier series to the visibility in terms of REAL-VALUED entries only (the matrix size is doubled).

Parameters
  • M – the Fourier series expansion is limited from -M to M

  • p_mic_x_all – a matrix that contains microphones x coordinates

  • p_mic_y_all – a matrix that contains microphones y coordinates

  • D1 – expansion matrix for the real-part

  • D2 – expansion matrix for the imaginary-part

  • aslist – whether the linear mapping for each subband is returned as a list or a block diagonal matrix

  • signal – The type of signal considered (‘visibility’ for covariance matrix, ‘raw’ for microphone inputs)

pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters
  • phi_recon – the reconstructed Dirac locations (azimuths)

  • M – the Fourier series expansion is between -M and M

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • mtx_freq2visi – the linear mapping from Fourier series to visibilities

pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband(phi_recon, M, mtx_amp2visi_ri, mtx_fri2visi_ri, num_bands)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters
  • phi_recon – the reconstructed Dirac locations (azimuths)

  • M – the Fourier series expansion is between -M and M

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • mtx_fri2visi – the linear mapping from Fourier series to visibilities

pyroomacoustics.doa.tools_fri_doa_plane.mtx_updated_G_multiband_new(phi_opt, M, p_x, p_y, G0_lst, num_bands)

Update the linear transformation matrix that links the FRI sequence to the visibilities by using the reconstructed Dirac locations.

Parameters
  • phi_opt – the reconstructed Dirac locations (azimuths)

  • M – the Fourier series expansion is between -M and M

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • G0_lst – the original linear mapping from Fourier series to visibilities

  • num_bands – number of subbands

pyroomacoustics.doa.tools_fri_doa_plane.multiband_cov_mtx_est(y_mic)

estimate covariance matrix based on the received signals at microphones

Parameters

y_mic – received signal (complex base-band representation) at microphones

pyroomacoustics.doa.tools_fri_doa_plane.multiband_extract_off_diag(mtx)

extract off-diagonal entries in mtx. The output vector is ordered in a column-major manner.

Parameters

mtx (input matrix to extract the off-diagonal entries) –

pyroomacoustics.doa.tools_fri_doa_plane.output_shrink(K, L)

shrink the convolution output to half the size. Used when both the annihilating filter and the uniform samples of sinusoids are Hermitian symmetric.

Parameters
  • K – the annihilating filter size: K + 1

  • L – length of the (complex-valued) b vector

pyroomacoustics.doa.tools_fri_doa_plane.polar2cart(rho, phi)

convert from polar to cartesian coordinates

Parameters
  • rho – radius

  • phi – azimuth

pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon(a, p_mic_x, p_mic_y, omega_band, sound_speed, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, verbose=False, signal_type='visibility', **kwargs)

reconstruct point sources on the circle from the visibility measurements

Parameters
  • a – the measured visibilities

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • omega_band – mid-band (ANGULAR) frequency [radian/sec]

  • sound_speed – speed of sound

  • K – number of point sources

  • M – the Fourier series expansion is between -M and M

  • noise_level – noise level in the measured visibilities

  • max_ini – maximum number of random initialisation used

  • stop_cri – either ‘mse’ or ‘max_iter’

  • update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.

  • verbose – whether output intermediate results for debugging or not

  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs

  • kwargs (possible optional input: G_iter: number of iterations for the G updates) –

pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_multiband(a, p_mic_x, p_mic_y, omega_bands, sound_speed, K, M, noise_level, max_ini=50, update_G=False, verbose=False, signal_type='visibility', max_iter=50, G_lst=None, GtG_lst=None, GtG_inv_lst=None, **kwargs)

reconstruct point sources on the circle from the visibility measurements from multiple subbands.

Parameters
  • a – the measured visibilities in a matrix form, where the second dimension corresponds to different subbands

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • omega_bands – mid-band (ANGULAR) frequencies [radian/sec]

  • sound_speed – speed of sound

  • K – number of point sources

  • M – the Fourier series expansion is between -M and M

  • noise_level – noise level in the measured visibilities

  • max_ini – maximum number of random initialisation used

  • update_G – update the linear mapping that links the uniformly sampled sinusoids to the visibility or not.

  • verbose – whether output intermediate results for debugging or not

  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs

  • kwargs – possible optional input: G_iter: number of iterations for the G updates

pyroomacoustics.doa.tools_fri_doa_plane.pt_src_recon_rotate(a, p_mic_x, p_mic_y, K, M, noise_level, max_ini=50, stop_cri='mse', update_G=False, num_rotation=1, verbose=False, signal_type='visibility', **kwargs)

reconstruct point sources on the circle from the visibility measurements. Here we apply random rotations to the coordinates.

Parameters
  • a – the measured visibilities

  • p_mic_x – a vector that contains microphones’ x-coordinates

  • p_mic_y – a vector that contains microphones’ y-coordinates

  • K – number of point sources

  • M – the Fourier series expansion is between -M to M

  • noise_level – noise level in the measured visibilities

  • max_ini – maximum number of random initialisations used

  • stop_cri – either ‘mse’ or ‘max_iter’

  • update_G – whether to update the linear mapping that links the uniformly sampled sinusoids to the visibilities

  • num_rotation – number of random rotations

  • verbose – whether to output intermediate results for debugging

  • signal_type – The type of the signal a, possible values are ‘visibility’ for covariance matrix and ‘raw’ for microphone inputs

  • kwargs – possible optional input: G_iter: number of iterations for the G updates

Grid Objects

Routines to perform grid search on the sphere

class pyroomacoustics.doa.grid.Grid(n_points)

Bases: object

This is an abstract class with attributes and methods for grids

Parameters

n_points (int) – the number of points on the grid

abstract apply(func, spherical=False)
abstract find_peaks(k=1)
set_values(vals)
class pyroomacoustics.doa.grid.GridCircle(n_points=360, azimuth=None)

Bases: Grid

Creates a grid on the circle.

Parameters
  • n_points (int, optional) – The number of uniformly spaced points in the grid.

  • azimuth (ndarray, optional) – An array of azimuth (in radians) to use for grid locations. Overrides n_points.

apply(func, spherical=False)
find_peaks(k=1)
plot(mark_peaks=0)
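
A minimal usage sketch (not from the original docs; the azimuth attribute and the return value of find_peaks are assumptions based on the descriptions above):

import numpy as np
import pyroomacoustics as pra

grid = pra.doa.GridCircle(n_points=360)

# assign a value to every grid point, here a function with two lobes
grid.set_values(np.cos(grid.azimuth) ** 2)

# indices of the two largest peaks
peaks = grid.find_peaks(k=2)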
class pyroomacoustics.doa.grid.GridSphere(n_points=1000, spherical_points=None)

Bases: Grid

This class computes nearly equidistant points on the sphere using the Fibonacci method

Parameters
  • n_points (int) – The number of points to sample

  • spherical_points (ndarray, optional) – A 2 x n_points array of spherical coordinates with azimuth in the top row and colatitude in the second row. Overrides n_points.

References

http://lgdv.cs.fau.de/uploads/publications/spherical_fibonacci_mapping.pdf
http://stackoverflow.com/questions/9600801/evenly-distributing-n-points-on-a-sphere

apply(func, spherical=False)

Apply a function to every grid point

find_peaks(k=1)

Find the largest peaks on the grid

min_max_distance()

Compute some statistics on the distribution of the points

plot(colatitude_ref=None, azimuth_ref=None, colatitude_recon=None, azimuth_recon=None, plotly=True, projection=True, points_only=False)
plot_old(plot_points=False, mark_peaks=0)

Plot the points on the sphere with their values

regrid()

Regrid the non-uniform data on a regular mesh
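
Similarly, a minimal sketch for the spherical grid (the colatitude attribute is an assumption based on the descriptions above):

import numpy as np
import pyroomacoustics as pra

grid = pra.doa.GridSphere(n_points=500)

# evaluate a smooth map at every point and locate its maximum
grid.set_values(np.cos(grid.colatitude) ** 2)
peak = grid.find_peaks(k=1)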

Plot Helpers

A collection of functions to plot maps and points on circles and spheres.

pyroomacoustics.doa.plotters.polar_plt_dirac(self, azimuth_ref=None, alpha_ref=None, save_fig=False, file_name=None, plt_dirty_img=True)

Generate polar plot of DoA results.

Parameters
  • azimuth_ref (numpy array) – True direction of sources (in radians).

  • alpha_ref (numpy array) – Estimated amplitude of sources.

  • save_fig (bool) – Whether or not to save figure as pdf.

  • file_name (str) – Name of file (if saved). Default is ‘polar_recon_dirac.pdf’

  • plt_dirty_img (bool) – Whether or not to plot spatial spectrum or ‘dirty image’ in the case of FRI.

pyroomacoustics.doa.plotters.sph_plot_diracs(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, colatitude_grid=None, azimuth_grid=None, file_name='sph_recon_2d_dirac.pdf', **kwargs)

This function plots the dirty image with the source locations on a flat projection of the sphere

Parameters
  • colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points

  • azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs

  • colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize

  • azimuth (ndarray, optional) – The azimuths of the collection of points to visualize

  • dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points

  • azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map

  • colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map

pyroomacoustics.doa.plotters.sph_plot_diracs_plotly(colatitude_ref=None, azimuth_ref=None, colatitude=None, azimuth=None, dirty_img=None, azimuth_grid=None, colatitude_grid=None, surface_base=1, surface_height=0.0)

Plots a 2D map on a sphere as well as a collection of diracs using the plotly library

Parameters
  • colatitude_ref (ndarray, optional) – The colatitudes of a collection of reference points

  • azimuth_ref (ndarray, optional) – The azimuths of a collection of reference points for the Diracs

  • colatitude (ndarray, optional) – The colatitudes of the collection of points to visualize

  • azimuth (ndarray, optional) – The azimuths of the collection of points to visualize

  • dirty_img (ndarray) – A 2D map for displaying a pattern on the sphere under the points

  • azimuth_grid (ndarray) – The azimuths indexing the dirty_img 2D map

  • colatitude_grid (ndarray) – The colatitudes indexing the dirty_img 2D map

  • surface_base – radius corresponding to lowest height on the map

  • surface_height – radius difference between the lowest and highest point on the map

Peak Detection

Detect peaks in data based on their amplitude and other features.

Author: Marcos Duarte, https://github.com/demotu/BMC Version: 1.0.4 License: MIT

pyroomacoustics.doa.detect_peaks.detect_peaks(x, mph=None, mpd=1, threshold=0, edge='rising', kpsh=False, valley=False, show=False, ax=None)

Detect peaks in data based on their amplitude and other features.

Parameters
  • x (1D array_like) – data.

  • mph ({None, number}, optional (default = None)) – detect peaks that are greater than minimum peak height.

  • mpd (positive integer, optional (default = 1)) – detect peaks that are at least separated by minimum peak distance (in number of data).

  • threshold (positive number, optional (default = 0)) – detect peaks (valleys) that are greater (smaller) than threshold in relation to their immediate neighbors.

  • edge ({None, 'rising', 'falling', 'both'}, optional (default = 'rising')) – for a flat peak, keep only the rising edge (‘rising’), only the falling edge (‘falling’), both edges (‘both’), or don’t detect a flat peak (None).

  • kpsh (bool, optional (default = False)) – keep peaks with same height even if they are closer than mpd.

  • valley (bool, optional (default = False)) – if True (1), detect valleys (local minima) instead of peaks.

  • show (bool, optional (default = False)) – if True (1), plot data in matplotlib figure.

  • ax (a matplotlib.axes.Axes instance, optional (default = None).) –

Returns

ind – indices of the peaks in x.

Return type

1D array_like

Notes

The detection of valleys instead of peaks is performed internally by simply negating the data: ind_valleys = detect_peaks(-x)

The function can handle NaNs.

See the IPython Notebook in [1].

References

[1] http://nbviewer.ipython.org/github/demotu/BMC/blob/master/notebooks/DetectPeaks.ipynb

Examples

>>> from pyroomacoustics.doa.detect_peaks import detect_peaks
>>> import numpy as np
>>> x = np.random.randn(100)
>>> x[60:81] = np.nan
>>> # detect all peaks and plot data
>>> ind = detect_peaks(x, show=True)
>>> print(ind)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # set minimum peak height = 0 and minimum peak distance = 20
>>> detect_peaks(x, mph=0, mpd=20, show=True)
>>> x = [0, 1, 0, 2, 0, 3, 0, 2, 0, 1, 0]
>>> # set minimum peak distance = 2
>>> detect_peaks(x, mpd=2, show=True)
>>> x = np.sin(2*np.pi*5*np.linspace(0, 1, 200)) + np.random.randn(200)/5
>>> # detection of valleys instead of peaks
>>> detect_peaks(x, mph=0, mpd=20, valley=True, show=True)
>>> x = [0, 1, 1, 0, 1, 1, 0]
>>> # detect both edges
>>> detect_peaks(x, edge='both', show=True)
>>> x = [-2, 1, -2, 2, 1, 1, 3, 0]
>>> # set threshold = 2
>>> detect_peaks(x, threshold = 2, show=True)

DOA Utilities

This module contains useful functions to compute distances and errors on circles and spheres.

pyroomacoustics.doa.utils.cart2spher(vectors)
Parameters

vectors (array_like, shape (3, n_vectors)) – The vectors to transform

Returns

  • azimuth (numpy.ndarray, shape (n_vectors,)) – The azimuth of the vectors

  • colatitude (numpy.ndarray, shape (n_vectors,)) – The colatitude of the vectors

  • r (numpy.ndarray, shape (n_vectors,)) – The length of the vectors

pyroomacoustics.doa.utils.circ_dist(azimuth1, azimuth2, r=1.0)

Returns the shortest distance between two points on a circle

Parameters
  • azimuth1 – azimuth of point 1

  • azimuth2 – azimuth of point 2

  • r (optional) – radius of the circle (Default 1)

pyroomacoustics.doa.utils.great_circ_dist(r, colatitude1, azimuth1, colatitude2, azimuth2)

calculate great circle distance for points located on a sphere

Parameters
  • r (radius of the sphere) –

  • colatitude1 (colatitude of point 1) –

  • azimuth1 (azimuth of point 1) –

  • colatitude2 (colatitude of point 2) –

  • azimuth2 (azimuth of point 2) –

Returns

great-circle distance

Return type

float or ndarray

pyroomacoustics.doa.utils.polar_distance(x1, x2)

Given two arrays of numbers x1 and x2, pairs the elements that are closest to each other and provides the pairing index matrix: x1[index[0, :]] should be as close as possible to x2[index[1, :]]. The function outputs the average of the absolute differences abs(x1[index[0, :]] - x2[index[1, :]]).

Parameters
  • x1 – vector 1

  • x2 – vector 2

Returns

  • d – the minimum average distance between the paired elements

  • index – the permutation matrix

pyroomacoustics.doa.utils.spher2cart(azimuth, colatitude=None, r=1, degrees=False)

Convert a spherical point to Cartesian coordinates.

Parameters
  • azimuth – azimuth

  • colatitude – colatitude

  • r – radius

  • degrees – if set to True, azimuth and colatitude are interpreted as degrees instead of radians

Returns

An ndarray containing the Cartesian coordinates of the points as its columns.

Return type

ndarray
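
A short round-trip sketch combining these utilities (only the documented signatures are used):

import numpy as np
from pyroomacoustics.doa.utils import cart2spher, spher2cart, circ_dist

# a single vector pointing along the y-axis
v = np.array([[0.0], [1.0], [0.0]])
azimuth, colatitude, r = cart2spher(v)

# back to Cartesian coordinates
v_rec = spher2cart(azimuth, colatitude, r)

# shortest distance between two azimuths on the unit circle
d = circ_dist(0.1, 2 * np.pi - 0.1)  # ~ 0.2 rad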

Single Channel Denoising

Module contents

Single Channel Noise Reduction

Collection of single channel noise reduction (SCNR) algorithms for speech:

  • Spectral Subtraction [1]

  • Subspace Approach [2]

  • Iterative Wiener Filtering [3]

At this repository, a deep learning approach in Python can be found.

References

[1] M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of speech corrupted by acoustic noise, ICASSP ‘79. IEEE International Conference on Acoustics, Speech, and Signal Processing, 1979, pp. 208-211.

[2] Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.

[3] J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

Algorithms

pyroomacoustics.denoise.spectral_subtraction module

class pyroomacoustics.denoise.spectral_subtraction.SpectralSub(nfft, db_reduc, lookback, beta, alpha=1)

Bases: object

Here we have a class for performing single channel noise reduction via spectral subtraction. The instantaneous signal energy and noise floor are estimated at each time instant (for each frequency bin), and this is used to compute a gain filter with which to perform spectral subtraction.

For a given frame n, the gain for frequency bin k is given by:

\[G[k, n] = \max \left\{ \left( \dfrac{P[k, n] - \beta P_N[k, n]}{P[k, n]} \right)^\alpha, G_{min} \right\},\]

where \(G_{min} = 10^{-(db\_reduc/20)}\) and \(db\_reduc\) is the maximum reduction (in dB) that we are willing to perform for each bin (a high value can actually be detrimental, see below). The instantaneous energy \(P[k,n]\) is computed by simply squaring the frequency amplitude at the bin k. The time-frequency decomposition of the input signal is typically done with the STFT and overlapping frames. The noise estimate \(P_N[k, n]\) for frequency bin k is given by looking back a certain number of frames \(L\) and selecting the bin with the lowest energy:

\[P_N[k, n] = \min_{[n-L, n]} P[k, n]\]

This approach works best when the SNR is positive and the noise is rather stationary. An alternative approach for the noise estimate (also in the case of stationary noise) would be to apply a lowpass filter for each frequency bin.

With large suppression, i.e. large values for \(db\_reduc\), we can observe a typical artefact of such spectral subtraction approaches, namely “musical noise”. Here is a nice article about noise reduction and musical noise.

Adjusting the constants \(\beta\) and \(\alpha\) also presents a trade-off between suppression and undesirable artefacts, i.e. more noticeable musical noise.

Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.

# initialize STFT and SpectralSub objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2,
                          analysis_window=pra.hann(nfft))
scnr = pra.denoise.SpectralSub(nfft, db_reduc=10, lookback=5,
                               beta=20, alpha=3)

# apply block-by-block
for n in range(num_blocks):

    # go to frequency domain for noise reduction
    stft.analysis(mono_noisy)
    gain_filt = scnr.compute_gain_filter(stft.X)

    # estimating input convolved with unknown response
    mono_denoised = stft.synthesis(gain_filt*stft.X)

There also exists a “one-shot” function.

# import or create `noisy_signal`
denoised_signal = apply_spectral_sub(noisy_signal, nfft=512,
                                     db_reduc=10, lookback=5,
                                     beta=20, alpha=3)
Parameters
  • nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.

  • db_reduc (float) – Maximum reduction in dB for each bin.

  • lookback (int) – How many frames to look back for the noise estimate.

  • beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.

  • alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.

compute_gain_filter(X)
Parameters

X (numpy array) – Complex spectrum of length nfft//2+1.

Returns

Gain filter to multiply given spectrum with.

Return type

numpy array

pyroomacoustics.denoise.spectral_subtraction.apply_spectral_sub(noisy_signal, nfft=512, db_reduc=25, lookback=12, beta=30, alpha=1)

One-shot function to apply spectral subtraction approach.

Parameters
  • noisy_signal (numpy array) – Real signal in time domain.

  • nfft (int) – FFT size. Length of gain filter, i.e. the number of frequency bins, is given by nfft//2+1.

  • db_reduc (float) – Maximum reduction in dB for each bin.

  • lookback (int) – How many frames to look back for the noise estimate.

  • beta (float) – Overestimation factor to “push” the gain filter value (at each frequency) closer to the dB reduction specified by db_reduc.

  • alpha (float, optional) – Exponent factor to modify transition behavior towards the dB reduction specified by db_reduc. Default is 1.

Returns

Enhanced/denoised signal.

Return type

numpy array

pyroomacoustics.denoise.subspace module

class pyroomacoustics.denoise.subspace.Subspace(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')

Bases: object

A class for performing single channel noise reduction in the time domain via the subspace approach. This implementation is based on the approach presented in:

Y. Ephraim and H. L. Van Trees, A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, vol. 3, no. 4, pp. 251-266, Jul 1995.

Moreover, an adaptation of the subspace approach is implemented here, as presented in:

Y. Hu and P. C. Loizou, A subspace approach for enhancing speech corrupted by colored noise, IEEE Signal Processing Letters, vol. 9, no. 7, pp. 204-206, Jul 2002.

Namely, an eigendecomposition is performed on the matrix:

\[\Sigma = R_n^{-1} R_y - I,\]

where \(R_n\) is the noise covariance matrix, \(R_y\) is the covariance matrix of the input noisy signal, and \(I\) is the identity matrix. The covariance matrices are estimated from past samples; the number of past samples/frames used for estimation can be set with the parameters lookback and skip. A simple energy threshold (thresh parameter) is used to identify noisy frames.

The eigenvectors corresponding to the positive eigenvalues of \(\Sigma\) are used to create a linear operator \(H_{opt}\) in order to enhance the noisy input signal \(\mathbf{y}\):

\[\mathbf{\hat{x}} = H_{opt} \cdot \mathbf{y}.\]

The length of \(\mathbf{y}\) is specified by the parameter frame_len; \(50\%\) overlap and add with a Hanning window is used to reconstruct the output. A great summary of the approach can be found in the paper by Y. Hu and P. C. Loizou under Section III, B.

Adjusting the factor \(\mu\) presents a trade-off between noise suppression and distortion.

Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here. Depending on your choice for frame_len, lookback, and skip, the approach may not be suitable for real-time processing.

# initialize Subspace object
scnr = Subspace(frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01)

# apply block-by-block
for n in range(num_blocks):
    denoised_signal = scnr.apply(noisy_input)

There also exists a “one-shot” function.

# import or create `noisy_signal`
denoised_signal = apply_subspace(noisy_signal, frame_len=256, mu=10,
                                 lookback=10, skip=2, thresh=0.01)
Parameters
  • frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive.

  • mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.

  • lookback (int) – How many frames to look back for covariance matrix estimation.

  • skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!

  • data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.

apply(new_samples)
Parameters

new_samples (numpy array) – New array of samples of length self.hop in the time domain.

Returns

Denoised samples.

Return type

numpy array

compute_signal_projection()
update_cov_matrices(new_samples)
pyroomacoustics.denoise.subspace.apply_subspace(noisy_signal, frame_len=256, mu=10, lookback=10, skip=2, thresh=0.01, data_type='float32')

One-shot function to apply subspace denoising approach.

Parameters
  • noisy_signal (numpy array) – Real signal in time domain.

  • frame_len (int) – Frame length in samples. Note that large values (above 256) will make the eigendecompositions very expensive. 50% overlap is used with hanning window.

  • mu (float) – Enhancement factor, larger values suppress more noise but could lead to more distortion.

  • lookback (int) – How many frames to look back for covariance matrix estimation.

  • skip (int) – How many samples to skip when estimating the covariance matrices with past samples. skip=1 will use all possible frames in the estimation.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise and might even remove desired signal!

  • data_type ('float32' or 'float64') – Data type to use in the enhancement procedure. Default is ‘float32’.

Returns

Enhanced/denoised signal.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener module

class pyroomacoustics.denoise.iterative_wiener.IterativeWiener(frame_len, lpc_order, iterations, alpha=0.8, thresh=0.01)

Bases: object

A class for performing single channel noise reduction in the frequency domain with a Wiener filter that is iteratively computed. This implementation is based on the approach presented in:

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

For each frame, a Wiener filter of the following form is computed and applied to the noisy samples in the frequency domain:

\[H(\omega) = \dfrac{P_S(\omega)}{P_S(\omega) + \sigma_d^2},\]

where \(P_S(\omega)\) is the speech power spectral density and \(\sigma_d^2\) is the noise variance.

The following assumptions are made in order to arrive at the above filter as the optimal solution:

  • The noisy samples \(y[n]\) can be written as:

    \[y[n] = s[n] + d[n],\]

    where \(s[n]\) is the desired signal and \(d[n]\) is the background noise.

  • The signal and noise are uncorrelated.

  • The noise is white Gaussian, i.e. it has a flat power spectrum with amplitude \(\sigma_d^2\).

Under these assumptions, the above Wiener filter minimizes the mean-square error between the true samples \(s[0:N-1]\) and the estimate \(\hat{s}[0:N-1]\) obtained by filtering \(y[0:N-1]\) with the above filter (with \(N\) being the frame length).

The fundamental part of this approach is correctly (or as well as possible) estimating the speech power spectral density \(P_S(\omega)\) and the noise variance \(\sigma_d^2\). For this, we need a voice activity detector in order to determine when we have incoming speech. In this implementation, we use a simple energy threshold on the input frame, which is set with the thresh input parameter.

When no speech is identified, the input frame is used to update the noise variance \(\sigma_d^2\). We could simply set \(\sigma_d^2\) to the energy of the input frame. However, we employ a simple IIR filter in order to avoid abrupt changes in the noise level (thus adding an assumption of stationarity):

\[\sigma_d^2[k] = \alpha \cdot \sigma_d^2[k-1] + (1-\alpha) \cdot \sigma_y^2,\]

where \(\alpha\) is the smoothing parameter and \(\sigma_y^2\) is the energy of the input frame. A high value of \(\alpha\) will update the noise level very slowly, while a low value will make it very sensitive to changes at the input. The value for \(\alpha\) can be set with the alpha parameter.

When speech is identified in the input frame, an iterative procedure is employed in order to estimate \(P_S(\omega)\) (and therefore the Wiener filter \(H\) as well). This procedure consists of computing \(p\) linear predictive coding (LPC) coefficients of the input frame. The number of LPC coefficients is set with the parameter lpc_order. These LPC coefficients form an all-pole filter that models the vocal tract as described in the above paper (Eq. 1). With these coefficients, we can then obtain an estimate of the speech power spectral density (Eq. 41b) and thus the corresponding Wiener filter (Eq. 41a). This Wiener filter is used to denoise the input frame. Moreover, with this denoised frame, we can compute new LPC coefficients and therefore a new Wiener filter. The idea behind this approach is that by iteratively computing the LPC coefficients as such, we can obtain a better estimate of the speech power spectral density. The number of iterations can be set with the iterations parameter.

Below is an example of how to use this class to emulate a streaming/online input. A full example can be found here.

# initialize STFT and IterativeWiener objects
nfft = 512
stft = pra.transform.STFT(nfft, hop=nfft//2,
                          analysis_window=pra.hann(nfft))
scnr = IterativeWiener(frame_len=nfft, lpc_order=20, iterations=2,
                       alpha=0.8, thresh=0.01)

# apply block-by-block
for n in range(num_blocks):

    # go to frequency domain, 50% overlap
    stft.analysis(mono_noisy)

    # compute wiener output
    X = scnr.compute_filtered_output(
            current_frame=stft.fft_in_buffer,
            frame_dft=stft.X)

    # back to time domain
    mono_denoised = stft.synthesis(X)

There also exists a “one-shot” function.

# import or create `noisy_signal`
denoised_signal = apply_iterative_wiener(noisy_signal, frame_len=512,
                                         lpc_order=20, iterations=2,
                                         alpha=0.8, thresh=0.01)
Parameters
  • frame_len (int) – Frame length in samples.

  • lpc_order (int) – Number of LPC coefficients to compute

  • iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.

  • alpha (float) – Smoothing factor within [0,1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!

compute_filtered_output(current_frame, frame_dft=None)

Compute Wiener filter in the frequency domain.

Parameters
  • current_frame (numpy array) – Noisy samples.

  • frame_dft (numpy array) – DFT of input samples. If not provided, it will be computed.

Returns

Output of denoising in the frequency domain.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.apply_iterative_wiener(noisy_signal, frame_len=512, lpc_order=20, iterations=2, alpha=0.8, thresh=0.01)

One-shot function to apply iterative Wiener filtering for denoising.

Parameters
  • noisy_signal (numpy array) – Real signal in time domain.

  • frame_len (int) – Frame length in samples. 50% overlap is used with hanning window.

  • lpc_order (int) – Number of LPC coefficients to compute

  • iterations (int) – How many iterations to perform in updating the Wiener filter for each signal frame.

  • alpha (float) – Smoothing factor within [0,1] for updating noise level. Closer to 1 gives more weight to the previous noise level, while closer to 0 gives more weight to the current frame’s level. Closer to 0 can track more rapid changes in the noise level. However, if a speech frame is incorrectly identified as noise, you can end up removing desired speech.

  • thresh (float) – Threshold to distinguish between (signal+noise) and (noise) frames. A high value will classify more frames as noise but might remove desired signal!

Returns

Enhanced/denoised signal.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.compute_speech_psd(a, g2, nfft)

Compute power spectral density of speech as specified in equation (41b) of

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

Namely:

\[P_S(\omega) = \dfrac{g^2}{\left\| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right\|^2},\]

where \(p\) is the LPC order, \(a_k\) are the LPC coefficients, and \(g\) is an estimated gain factor.

The power spectral density is computed at the frequencies corresponding to a DFT of length nfft.

Parameters
  • a (numpy array) – LPC coefficients.

  • g2 (float) – Squared gain.

  • nfft (int) – FFT length.

Returns

Power spectral density from LPC coefficients.

Return type

numpy array

pyroomacoustics.denoise.iterative_wiener.compute_squared_gain(a, noise_psd, y)

Estimate the squared gain of the speech power spectral density as done on p. 204 of

J. Lim and A. Oppenheim, All-Pole Modeling of Degraded Speech, IEEE Transactions on Acoustics, Speech, and Signal Processing 26.3 (1978): 197-210.

Namely solving for \(g^2\) such that the following expression is satisfied:

\[\dfrac{N}{2\pi} \int_{-\pi}^{\pi} \dfrac{g^2}{\left\| 1 - \sum_{k=1}^p a_k \cdot e^{-jk\omega} \right\|^2} d\omega = \sum_{n=0}^{N-1} y^2(n) - N \cdot \sigma_d^2,\]

where \(N\) is the number of noisy samples \(y\), \(a_k\) are the \(p\) LPC coefficients, and \(\sigma_d^2\) is the noise variance.

Parameters
  • a (numpy array) – LPC coefficients.

  • noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.

  • y (numpy array) – Noisy time domain samples.

Returns

Squared gain.

Return type

float

pyroomacoustics.denoise.iterative_wiener.compute_wiener_filter(speech_psd, noise_psd)

Compute Wiener filter in the frequency domain.

Parameters
  • speech_psd (numpy array) – Speech power spectral density.

  • noise_psd (float or numpy array) – Noise variance if white noise, numpy array otherwise.

Returns

Frequency domain filter, computed at the same frequency values as speech_psd.

Return type

numpy array

Phase Processing

Submodules

Griffin-Lim Phase Reconstruction

Implementation of the classic phase reconstruction from Griffin and Lim [1]. The input to the algorithm is the magnitude from STFT measurements.

The algorithm starts by assigning a (possibly random) initial phase to the measurements, and then iteratively:

  1. Reconstruct the time-domain signal

  2. Re-apply STFT

  3. Enforce the known magnitude of the measurements

The implementation supports different types of initialization via the keyword argument ini.

  1. If omitted, the initial phase is uniformly zero

  2. If ini="random", a random phase is used

  3. If ini=A for a numpy.ndarray of the same shape as the input magnitude, A / numpy.abs(A) is used for initialization

Example

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
import pyroomacoustics as pra

# We open a speech sample
filename = "examples/input_samples/cmu_arctic_us_axb_a0004.wav"
fs, audio = wavfile.read(filename)

# These are the parameters of the STFT
fft_size = 512
hop = fft_size // 4
win_a = np.hamming(fft_size)
win_s = pra.transform.stft.compute_synthesis_window(win_a, hop)
n_iter = 200

engine = pra.transform.STFT(
    fft_size, hop=hop, analysis_window=win_a, synthesis_window=win_s
)
X = engine.analysis(audio)
X_mag = np.abs(X)
X_mag_norm = np.linalg.norm(X_mag) ** 2

# monitor convergence
errors = []

# the callback to track the spectral distance convergence
def cb(epoch, Y, y):
    # we measure convergence via spectral distance
    Y_2 = engine.analysis(y)
    sd = np.linalg.norm(X_mag - np.abs(Y_2)) ** 2 / X_mag_norm
    # save in the list every 10 iterations
    if epoch % 10 == 0:
        errors.append(sd)

pra.phase.griffin_lim(X_mag, hop, win_a, n_iter=n_iter, callback=cb)

plt.semilogy(np.arange(len(errors)) * 10, errors)
plt.show()

References

[1] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984.

pyroomacoustics.phase.gl.griffin_lim(X, hop, analysis_window, fft_size=None, stft_kwargs={}, n_iter=100, ini=None, callback=None)

Implementation of the Griffin-Lim phase reconstruction algorithm from STFT magnitude measurements.

Parameters
  • X (array_like, shape (n_frames, n_freq)) – The STFT magnitude measurements

  • hop (int) – The frame shift of the STFT

  • analysis_window (array_like, shape (fft_size,)) – The window used for the STFT analysis

  • fft_size (int, optional) – The FFT size for the STFT, if omitted it is computed from the dimension of X

  • stft_kwargs (dict, optional) – Dictionary of extra parameters for the STFT

  • n_iter (int, optional) – The number of iterations

  • ini (str or array_like, complex, shape (n_frames, n_freq), optional) – The initial value of the phase estimate. If “random”, uses a random guess. If None, uses 0 phase.

  • callback (func, optional) – A callable taking as arguments an int and the reconstructed STFT and time-domain signals

Module contents

Phase Processing

This sub-package contains algorithms related to phase-specific processing, such as phase reconstruction.

Griffin-Lim
Phase reconstruction by fixed-point iterations [1]

References

[1] D. Griffin and J. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 32, no. 2, pp. 236–243, 1984.

Pyroomacoustics API

Subpackages

pyroomacoustics.experimental package

Submodules
pyroomacoustics.experimental.deconvolution module
pyroomacoustics.experimental.deconvolution.deconvolve(y, s, length=None, thresh=0.0)

Deconvolve an excitation signal from an impulse response

Parameters
  • y (ndarray) – The recording

  • s (ndarray) – The excitation signal

  • length (int, optional) – the length of the impulse response to deconvolve

  • thresh (float, optional) – ignore frequency bins with power lower than this

pyroomacoustics.experimental.deconvolution.wiener_deconvolve(y, x, length=None, noise_variance=1.0, let_n_points=15, let_div_base=2)

Deconvolve an excitation signal from an impulse response

We use a Wiener filter

Parameters
  • y (ndarray) – The recording

  • x (ndarray) – The excitation signal

  • length (int, optional) – the length of the impulse response to deconvolve

  • noise_variance (float, optional) – estimate of the noise variance

  • let_n_points (int) – number of points to use in the LET approximation

  • let_div_base (float) – the divider used for the LET grid
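
For illustration, a minimal offline sketch with synthetic signals (only the documented deconvolve signature is assumed):

import numpy as np
from pyroomacoustics.experimental.deconvolution import deconvolve

fs = 16000
rng = np.random.default_rng(0)

# a hypothetical impulse response: an exponentially decaying noise burst
h = rng.standard_normal(2048) * np.exp(-np.arange(2048) / 4000.0)

# white-noise excitation and the corresponding noise-free recording
s = rng.standard_normal(fs)
y = np.convolve(s, h)

# recover the impulse response from the recording and the excitation
h_hat = deconvolve(y, s, length=len(h))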

pyroomacoustics.experimental.delay_calibration module
class pyroomacoustics.experimental.delay_calibration.DelayCalibration(fs, pad_time=0.0, mls_bits=16, repeat=1, temperature=25.0, humidity=50.0, pressure=1000.0)

Bases: object

run(distance=0.0, ch_in=None, ch_out=None, oversampling=1)

Run the calibration. Plays a maximum length sequence and cross-correlates the signals to find the time delay.

Parameters
  • distance (float, optional) – Distance between the speaker and microphone

  • ch_in (int, optional) – The input channel to use. If not specified, all channels are calibrated

  • ch_out (int, optional) – The output channel to use. If not specified, all channels are calibrated

pyroomacoustics.experimental.localization module

We have a number of points with known locations and TDOA measurements from an unknown location to the known points. We perform an EDM line search to find the unknown offset that turns the TDOA into TOA.

Parameters
  • R (ndarray) – An ndarray of 3xN where each column is the location of a point

  • tdoa (ndarray) – A length N vector containing the TDOA measurements from the unknown location to the known ones

  • bounds (ndarray) – Bounds for the line search

  • step (float) – Step size for the line search

pyroomacoustics.experimental.localization.tdoa(x1, x2, interp=1, fs=1, phat=True)

This function computes the time difference of arrival (TDOA) of the signal at the two microphones. This in turn is used to infer the direction of arrival (DOA) of the signal.

Specifically if s(k) is the signal at the reference microphone and s_2(k) at the second microphone, then for signal arriving with DOA theta we have

s_2(k) = s(k - tau)

with

tau = fs*d*sin(theta)/c

where d is the distance between the two microphones and c the speed of sound.

We recover tau using the Generalized Cross Correlation - Phase Transform (GCC-PHAT) method. The reference is

Knapp, C., & Carter, G. C. (1976). The generalized correlation method for estimation of time delay.

Parameters
  • x1 (nd-array) – The signal of the reference microphone

  • x2 (nd-array) – The signal of the second microphone

  • interp (int, optional (default 1)) – The interpolation value for the cross-correlation, it can improve the time resolution (and hence DOA resolution)

  • fs (int, optional (default 1)) – The sampling frequency of the input signal

Returns

  • theta (float) – the angle of arrival (in radians)

  • pwr (float) – the magnitude of the maximum cross correlation coefficient

  • delay (float) – the delay between the two microphones (in seconds)
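
A small synthetic sketch (the signals are made up; the outputs are as documented in the Returns section above):

import numpy as np
from pyroomacoustics.experimental.localization import tdoa

fs = 16000
rng = np.random.default_rng(0)
x1 = rng.standard_normal(fs)
x2 = np.roll(x1, 8)  # simulate an 8-sample inter-microphone delay

out = tdoa(x1, x2, interp=4, fs=fs)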

pyroomacoustics.experimental.localization.tdoa_loc(R, tdoa, c, x0=None)

TDOA based localization

Parameters
  • R (ndarray) – A 3xN array of 3D points

  • tdoa (ndarray) – A length N array of TDOA measurements

  • c (float) – The speed of sound

References

Steven Li, TDOA localization.

pyroomacoustics.experimental.measure_ir module
pyroomacoustics.experimental.measure_ir.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)

Measures an impulse response by playing a sweep and recording it using the sounddevice package.

Parameters
  • sweep_length (float, optional) – length of the sweep in seconds

  • sweep_type (SweepType, optional) – type of sweep to use: linear or exponential (default)

  • fs (int, optional) – sampling frequency (default 48 kHz)

  • f_lo (float, optional) – lowest frequency in the sweep

  • f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2

  • volume (float, optional) – multiply the sweep by this number before playing (default 0.9)

  • pre_delay (float, optional) – delay in seconds before playing the sweep

  • post_delay (float, optional) – delay in seconds before stopping the recording after playing the sweep

  • fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)

  • dev_in (int, optional) – input device number

  • dev_out (int, optional) – output device number

  • channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.

  • channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.

  • ascending (bool, optional) – whether the sweep is from high to low (default) or from low to high frequencies

  • deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)

  • plot (bool, optional) – plot the resulting signal

Return type

Returns the impulse response if deconvolution == True and the recorded signal if not
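
A typical call might look as follows (a sketch; it requires the sounddevice package and actual audio hardware):

from pyroomacoustics.experimental import measure_ir

# play a 1 s exponential sweep and record the response
ir = measure_ir(sweep_length=1.0, fs=48000, volume=0.8,
                post_delay=0.2, deconvolution=True, plot=False)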

pyroomacoustics.experimental.physics module
pyroomacoustics.experimental.physics.calculate_speed_of_sound(t, h, p)

Compute the speed of sound as a function of temperature, humidity and pressure

Parameters
  • t (temperature [Celsius]) –

  • h (relative humidity [%]) –

  • p (atmospheric pressure [kPa]) –

Return type

Speed of sound in [m/s]
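
For example (a one-line sketch using the documented signature):

from pyroomacoustics.experimental.physics import calculate_speed_of_sound

# roughly 346 m/s at 25 degrees Celsius, 50% humidity, 100 kPa
c = calculate_speed_of_sound(t=25.0, h=50.0, p=100.0)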

pyroomacoustics.experimental.point_cloud module
Point Clouds

Contains PointCloud class.

Given a number of points and their relative distances, this class aims at reconstructing their relative coordinates.

class pyroomacoustics.experimental.point_cloud.PointCloud(m=1, dim=3, diameter=0.0, X=None, labels=None, EDM=None)

Bases: object

EDM()

Computes the EDM corresponding to the marker set

align(marker, axis)

Rotate the marker set around the given axis until it is aligned onto the given marker

Parameters
  • marker (int or str) – the index or label of the marker onto which to align the set

  • axis (int) – the axis around which the rotation happens

center(marker)

Translate the marker set so that the argument is the origin.

classical_mds(D)

Classical multidimensional scaling

Parameters

D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)

copy()

Return a deep copy of this marker set object

correct(corr_dic)

correct a marker location by a given vector

doa(receiver, source)

Computes the direction of arrival wrt a source and receiver

flatten(ind)

Transform the set of points so that the subset of markers given as argument is as flat (with respect to the z-axis) as possible.

Parameters

ind (list of bools) – List of marker indices that should all lie in the same subspace

fromEDM(D, labels=None, method='mds')

Compute the position of markers from their Euclidean Distance Matrix

Parameters
  • D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)

  • labels (list, optional) – A list of human friendly labels for the markers (e.g. ‘east’, ‘west’, etc)

  • method (str, optional) – The method to use: ‘mds’ for multidimensional scaling (default) or ‘tri’ for trilateration

key2ind(ref)

Get the index location from a label

normalize(refs=None)

Reposition points such that x0 is at the origin, x1 lies on the x-axis and x2 lies above the x-axis, keeping the relative positions to each other. The z-axis is defined according to the right-hand rule by default.

Parameters
  • refs (list of 3 ints or str) – The index or label of three markers used to define (origin, x-axis, y-axis)

  • left_hand (bool, optional (default False)) – Normally the z-axis is defined using right-hand rule, this flag allows to override this behavior

plot(axes=None, show_labels=True, **kwargs)
trilateration(D)

Find the location of points based on their distance matrix using trilateration

Parameters

D (square 2D ndarray) – Euclidean Distance Matrix (matrix containing squared distances between points)

trilateration_single_point(c, Dx, Dy)

Given x at the origin (0,0), y at (0,c), and the distances Dx and Dy from a point at an unknown location to x and y, respectively, finds the position of the point.
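
A minimal sketch of the EDM round trip (the marker positions are made up for illustration):

import numpy as np
from pyroomacoustics.experimental.point_cloud import PointCloud

# three markers forming a right triangle in the plane
X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
pc = PointCloud(X=X)

# squared-distance matrix, then reconstruction up to a rigid motion
D = pc.EDM()
pc.fromEDM(D, method="mds")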

RT60 Measurement Routine

Automatically determines the reverberation time of an impulse response using the Schroeder method [1].

References

[1] M. R. Schroeder, “New Method of Measuring Reverberation Time,” J. Acoust. Soc. Am., vol. 37, no. 3, pp. 409-412, 1965.

pyroomacoustics.experimental.rt60.measure_rt60(h, fs=1, decay_db=60, plot=False, rt60_tgt=None)

Analyze the RT60 of an impulse response. Optionally plots some useful information.

Parameters
  • h (array_like) – The impulse response.

  • fs (float or int, optional) – The sampling frequency of h (default to 1, i.e., samples).

  • decay_db (float or int, optional) – The decay in decibels for which we actually estimate the time. Although we would like to estimate the RT60, it might not be practical. Instead, we measure the RT20 or RT30 and extrapolate to RT60.

  • plot (bool, optional) – If set to True, the power decay and different estimated values will be plotted (default False).

  • rt60_tgt (float) – This parameter can be used to indicate a target RT60 to which we want to compare the estimated value.
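
For instance, measuring the decay of a synthetic impulse response (a sketch; the decaying noise stands in for a measured response):

import numpy as np
from pyroomacoustics.experimental.rt60 import measure_rt60

fs = 16000
t = np.arange(fs) / fs
# exponential envelope with an energy decay of about 0.69 s to -60 dB
ir = np.random.randn(fs) * np.exp(-t / 0.1)

# measure the RT30 and extrapolate to RT60
rt60 = measure_rt60(ir, fs=fs, decay_db=30)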

pyroomacoustics.experimental.signals module

A few test signals like sweeps and stuff.

pyroomacoustics.experimental.signals.exponential_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)

Exponential sine sweep

Parameters
  • T (float) – length in seconds

  • fs – sampling frequency

  • f_lo (float) – lowest frequency in fraction of fs (default 0)

  • f_hi (float) – highest frequency in fraction of fs (default 1)

  • fade (float, optional) – length of fade in and out in seconds (default 0)

  • ascending (bool, optional) –

pyroomacoustics.experimental.signals.linear_sweep(T, fs, f_lo=0.0, f_hi=None, fade=None, ascending=False)

Linear sine sweep

Parameters
  • T (float) – length in seconds

  • fs – sampling frequency

  • f_lo (float) – lowest frequency in fraction of fs (default 0)

  • f_hi (float) – highest frequency in fraction of fs (default 1)

  • fade (float, optional) – length of fade in and out in seconds (default 0)

  • ascending (bool, optional) –

pyroomacoustics.experimental.signals.window(signal, n_win)

Window the signal at the beginning and the end with windows of size n_win/2.

Module contents
Experimental

A bunch of routines useful when doing measurements and experiments.

pyroomacoustics.experimental.measure_ir(sweep_length=1.0, sweep_type='exponential', fs=48000, f_lo=0.0, f_hi=None, volume=0.9, pre_delay=0.0, post_delay=0.1, fade_in_out=0.0, dev_in=None, dev_out=None, channels_input_mapping=None, channels_output_mapping=None, ascending=False, deconvolution=True, plot=True)

Measures an impulse response by playing a sweep and recording it using the sounddevice package.

Parameters
  • sweep_length (float, optional) – length of the sweep in seconds

  • sweep_type (SweepType, optional) – type of sweep to use: linear or exponential (default)

  • fs (int, optional) – sampling frequency (default 48 kHz)

  • f_lo (float, optional) – lowest frequency in the sweep

  • f_hi (float, optional) – highest frequency in the sweep, can be a negative offset from fs/2

  • volume (float, optional) – multiply the sweep by this number before playing (default 0.9)

  • pre_delay (float, optional) – delay in seconds before playing the sweep

  • post_delay (float, optional) – delay in seconds before stopping the recording after playing the sweep

  • fade_in_out (float, optional) – length in seconds of the fade in and out of the sweep (default 0.)

  • dev_in (int, optional) – input device number

  • dev_out (int, optional) – output device number

  • channels_input_mapping (array_like, optional) – List of channel numbers (starting with 1) to record. If mapping is given, channels is silently ignored.

  • channels_output_mapping (array_like, optional) – List of channel numbers (starting with 1) where the columns of data shall be played back on. Must have the same length as number of channels in data (except if data is mono, in which case the signal is played back on all given output channels). Each channel number may only appear once in mapping.

  • ascending (bool, optional) – whether the sweep is from high to low (default) or from low to high frequencies

  • deconvolution (bool, optional) – if True, apply deconvolution to the recorded signal to remove the sweep (default True)

  • plot (bool, optional) – plot the resulting signal

Return type

Returns the impulse response if deconvolution == True and the recorded signal if not

Submodules

pyroomacoustics.acoustics module

class pyroomacoustics.acoustics.OctaveBandsFactory(base_frequency=125.0, fs=16000, n_fft=512)

Bases: object

A class to process uniformly all properties that are defined on octave bands.

Each property is stored for an octave band.

base_freq

The center frequency of the first octave band

Type

float

fs

The target sampling frequency

Type

float

n_bands

The number of octave bands needed to cover from base_freq to fs / 2 (i.e. floor(log2(fs / base_freq)))

Type

int

bands

The list of bin boundaries for the octave bands

Type

list of tuple

centers

The list of band centers

all_materials

The list of all Material objects created by the factory

Type

list of Material

Parameters
  • base_frequency (float, optional) – The center frequency of the first octave band (default: 125 Hz)

  • fs (float, optional) – The sampling frequency used (default: 16000 Hz)

  • third_octave (bool, optional) – Use third octave bands if True (default: False)

analysis(x, band=None)

Process a signal x through the filter bank

Parameters

x (ndarray (n_samples)) – The input signal

Returns

The input signal filtered through all the bands

Return type

ndarray (n_samples, n_bands)

get_bw()

Returns the bandwidth of the bands
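
A minimal usage sketch of the factory (only the constructor and analysis signatures documented above are assumed):

import numpy as np
import pyroomacoustics as pra

octave_bands = pra.acoustics.OctaveBandsFactory(base_frequency=125.0, fs=16000)

# filter one second of noise through all the bands
x = np.random.randn(16000)
x_bands = octave_bands.analysis(x)  # shape (n_samples, n_bands)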

pyroomacoustics.acoustics.bandpass_filterbank(bands, fs=1.0, order=8, output='sos')

Create a bank of Butterworth bandpass filters

Parameters
  • bands (array_like, shape == (n, 2)) – The list of bands [[flo1, fup1], [flo2, fup2], ...]

  • fs (float, optional) – Sampling frequency (default 1.)

  • order (int, optional) – The order of the IIR filters (default: 8)

  • output ({'ba', 'zpk', 'sos'}) – Type of output: numerator/denominator (‘ba’), pole-zero (‘zpk’), or second-order sections (‘sos’). Default is ‘sos’.

Returns

  • b, a (ndarray, ndarray) – Numerator (b) and denominator (a) polynomials of the IIR filter. Only returned if output=’ba’.

  • z, p, k (ndarray, ndarray, float) – Zeros, poles, and system gain of the IIR filter transfer function. Only returned if output=’zpk’.

  • sos (ndarray) – Second-order sections representation of the IIR filter. Only returned if output==’sos’.
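
For example, splitting a signal into three bands (a sketch assuming the ‘sos’ output yields one second-order-sections array per band):

import numpy as np
from scipy.signal import sosfiltfilt
from pyroomacoustics.acoustics import bandpass_filterbank

fs = 16000
bands = [[300.0, 600.0], [600.0, 1200.0], [1200.0, 2400.0]]
sos = bandpass_filterbank(bands, fs=fs, order=8, output="sos")

x = np.random.randn(fs)
x_bands = [sosfiltfilt(s, x) for s in sos]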

pyroomacoustics.acoustics.bands_hz2s(bands_hz, Fs, N, transform='dft')

Converts bands given in Hertz to samples with respect to a given sampling frequency Fs and a transform size N. An optional transform type is used to handle the DCT case.

pyroomacoustics.acoustics.binning(S, bands)

This function computes the sum of all columns of S in the subbands enumerated in bands

pyroomacoustics.acoustics.critical_bands()

Compute the Critical bands as defined in the book: Psychoacoustics by Zwicker and Fastl. Table 6.1 p. 159

pyroomacoustics.acoustics.inverse_sabine(rt60, room_dim, c=None)

Given the desired reverberation time (RT60, i.e. the time for the energy to drop by 60 dB), the dimensions of a rectangular room (shoebox), and optionally the sound speed, computes the energy absorption coefficient and the maximum image source order needed. If no speed of sound is provided, the package-wide default (in constants) is used.

Parameters
  • rt60 (float) – desired RT60 (time it takes to go from full amplitude to 60 dB decay) in seconds

  • room_dim (list of floats) – list of length 2 or 3 of the room side lengths

  • c (float) – speed of sound

Returns

  • absorption (float) – the energy absorption coefficient to be passed to room constructor

  • max_order (int) – the maximum image source order necessary to achieve the desired RT60
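
This pairs naturally with the room constructor; for example, a sketch of building a shoebox with a target RT60 of 0.5 s:

import pyroomacoustics as pra

rt60_tgt = 0.5              # seconds
room_dim = [5.0, 4.0, 3.0]  # meters

e_absorption, max_order = pra.inverse_sabine(rt60_tgt, room_dim)
room = pra.ShoeBox(
    room_dim, fs=16000, materials=pra.Material(e_absorption), max_order=max_order
)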

pyroomacoustics.acoustics.invmelscale(b)

Converts from melscale to frequency in Hertz according to Huang-Acero-Hon (6.143)

pyroomacoustics.acoustics.melfilterbank(M, N, fs=1, fl=0.0, fh=0.5)

Returns a filter bank of triangular filters spaced according to mel scale

We follow Huang-Acero-Hon 6.5.2

Parameters
  • M ((int)) – The number of filters in the bank

  • N ((int)) – The length of the DFT

  • fs ((float) optional) – The sampling frequency (default 1)

  • fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)

  • fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)

Return type

An M times int(N/2)+1 ndarray that contains one filter per row

pyroomacoustics.acoustics.melscale(f)

Converts f (in Hertz) to the melscale defined according to Huang-Acero-Hon (2.6)

pyroomacoustics.acoustics.mfcc(x, L=128, hop=64, M=14, fs=8000, fl=0.0, fh=0.5)

Computes the Mel-Frequency Cepstrum Coefficients (MFCC) according to the description by Huang-Acero-Hon 6.5.2 (2001). The MFCC are features mimicking human perception, usually used for some learning task.

This function will first split the signal into frames, overlapping or not, and then compute the MFCC for each frame.

Parameters
  • x ((nd-array)) – Input signal

  • L ((int)) – Frame size (default 128)

  • hop ((int)) – Number of samples to skip between two frames (default 64)

  • M ((int)) – Number of mel-frequency filters (default 14)

  • fs ((int)) – Sampling frequency (default 8000)

  • fl ((float)) – Lowest frequency in filter bank as a fraction of fs (default 0.)

  • fh ((float)) – Highest frequency in filter bank as a fraction of fs (default 0.5)

Return type

The MFCC of the input signal

pyroomacoustics.acoustics.octave_bands(fc=1000, third=False, start=0.0, n=8)

Create a bank of octave bands

Parameters
  • fc (float, optional) – The center frequency

  • third (bool, optional) – Use third octave bands (default False)

  • start (float, optional) – Starting frequency for octave bands in Hz (default 0.)

  • n (int, optional) – Number of frequency bands (default 8)

pyroomacoustics.acoustics.rt60_eyring(S, V, a, m, c)

This is the Eyring formula for estimation of the reverberation time.

Parameters
  • S – the total surface of the room walls in m^2

  • V – the volume of the room in m^3

  • a (float) – the equivalent absorption coefficient sum(a_w * S_w) / S where a_w and S_w are the absorption and surface of wall w, respectively.

  • m (float) – attenuation constant of air

  • c (float) – speed of sound in m/s

Returns

The estimated reverberation time (RT60)

Return type

float

pyroomacoustics.acoustics.rt60_sabine(S, V, a, m, c)

This is the Sabine formula for estimation of the reverberation time.

Parameters
  • S – the total surface of the room walls in m^2

  • V – the volume of the room in m^3

  • a (float) – the equivalent absorption coefficient sum(a_w * S_w) / S where a_w and S_w are the absorption and surface of wall w, respectively.

  • m (float) – attenuation constant of air

  • c (float) – speed of sound in m/s

Returns

The estimated reverberation time (RT60)

Return type

float

pyroomacoustics.beamforming module

class pyroomacoustics.beamforming.Beamformer(R, fs, N=1024, Lg=None, hop=None, zpf=0, zpb=0)

Bases: MicrophoneArray

At some point, in some nice way, the design methods should also go here. Probably with generic arguments.

Parameters
  • R (numpy.ndarray) – Mics positions

  • fs (int) – Sampling frequency

  • N (int, optional) – Length of FFT, i.e. number of FD beamforming weights, equally spaced. Defaults to 1024.

  • Lg (int, optional) – Length of time-domain filters. Defaults to N.

  • hop (int, optional) – Hop length for frequency domain processing. Defaults to N/2.

  • zpf (int, optional) – Front zero padding length for frequency domain processing. Default is 0.

  • zpb (int, optional) – Zero padding length for frequency domain processing. Default is 0.

far_field_weights(phi)

This method computes the weights for a far-field source at infinity in direction phi (the direction of the beam).

filters_from_weights(non_causal=0.0)

Compute time-domain filters from frequency-domain weights.

Parameters

non_causal (float, optional) – ratio of filter coefficients used for the non-causal part

plot(sum_ir=False, FD=True)
plot_beam_response()
plot_response_from_point(x, legend=None)
process(FD=False)
rake_delay_and_sum_weights(source, interferer=None, R_n=None, attn=True, ff=False)
rake_distortionless_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)

Compute time-domain filters of a beamformer minimizing noise and interference while forcing a distortionless response towards the source.

rake_max_sinr_filters(source, interferer, R_n, epsilon=0.005, delay=0.0)

Compute the time-domain filters of SINR maximizing beamformer.

rake_max_sinr_weights(source, interferer=None, R_n=None, rcond=0.0, ff=False, attn=True)

This method computes a beamformer focusing on a number of specific sources and ignoring a number of interferers.

Parameters
  • source (array_like) – source locations

  • interferer (array_like) – interferer locations

rake_max_udr_filters(source, interferer=None, R_n=None, delay=0.03, epsilon=0.005)

Compute directly the time-domain filters maximizing the Useful-to-Detrimental Ratio (UDR). This beamformer is not practical. It maximizes the UDR ratio in the time domain directly without imposing flat response towards the source of interest. This results in severe distortion of the desired signal.

Parameters
  • source (pyroomacoustics.SoundSource) – the desired source

  • interferer (pyroomacoustics.SoundSource, optional) – the interfering source

  • R_n (ndarray, optional) – the noise covariance matrix, it should be (M * Lg)x(M * Lg) where M is the number of sensors and Lg the filter length

  • delay (float, optional) – the signal delay introduced by the beamformer (default 0.03 s)

  • epsilon (float) –

rake_max_udr_weights(source, interferer=None, R_n=None, ff=False, attn=True)
rake_mvdr_filters(source, interferer, R_n, delay=0.03, epsilon=0.005)

Compute the time-domain filters of the minimum variance distortionless response beamformer.

rake_one_forcing_filters(sources, interferers, R_n, epsilon=0.005)

Compute the time-domain filters of a beamformer with unit response towards multiple sources.

rake_one_forcing_weights(source, interferer=None, R_n=None, ff=False, attn=True)
rake_perceptual_filters(source, interferer=None, R_n=None, delay=0.03, d_relax=0.035, epsilon=0.005)

Compute directly the time-domain filters for a perceptually motivated beamformer. The beamformer minimizes noise and interference, but relaxes the response of the filter within the 30 ms following the delay.

response(phi_list, frequency)
response_from_point(x, frequency)
snr(source, interferer, f, R_n=None, dB=False)
steering_vector_2D(frequency, phi, dist, attn=False)
steering_vector_2D_from_point(frequency, source, attn=True, ff=False)

Creates a steering vector for a particular frequency and source.

Parameters
  • frequency –

  • source – location in Cartesian coordinates

  • attn – include attenuation factor if True

  • ff – uses far-field distance if True

Returns

A 2x1 ndarray containing the steering vector.

udr(source, interferer, f, R_n=None, dB=False)
weights_from_filters()
pyroomacoustics.beamforming.H(A, **kwargs)

Returns the conjugate (Hermitian) transpose of a matrix.

class pyroomacoustics.beamforming.MicrophoneArray(R, fs, directivity=None)

Bases: object

Microphone array class.

property M
append(locs)

Add some microphones to the array.

Parameters

locs (numpy.ndarray (2 or 3, n_mics)) – Adds n_mics microphones to the array. The coordinates are passed as a numpy.ndarray with each column containing the coordinates of a microphone.

record(signals, fs)

This simulates the recording of the signals by the microphones. In particular, if the microphones and the room simulation do not use the same sampling frequency, down/up-sampling is done here.

Parameters
  • signals – An ndarray with as many lines as there are microphones.

  • fs – the sampling frequency of the signals.

set_directivity(directivities)

This function sets self.directivity as a list of directivities with n_mics entries, where n_mics is the number of microphones.

Parameters

directivities – a single directivity for all microphones, or a list of directivities, one per microphone

to_wav(filename, mono=False, norm=False, bitdepth=<class 'float'>)

Save all the signals to wav files.

Parameters
  • filename (str) – the name of the file

  • mono (bool, optional) – if true, records only the center channel floor(M / 2) (default False)

  • norm (bool, optional) – if true, normalize the signal to fit in the dynamic range (default False)

  • bitdepth (int, optional) – the format of output samples [np.int8/16/32/64 or np.float (default)]
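
As a quick usage sketch (the geometry and sampling rate below are arbitrary, illustrative values):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> R = np.array([[0.0, 0.1, 0.2], [0.0, 0.0, 0.0]])  # three mics on a line, in 2D
>>> mics = pra.beamforming.MicrophoneArray(R, fs=16000)
>>> mics.M
3
>>> mics.append(np.array([[0.3], [0.0]]))  # add a fourth microphone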

pyroomacoustics.beamforming.circular_2D_array(center, M, phi0, radius)

Create an array of uniformly spaced circular points in 2D.

Parameters
  • center (array_like) – The center of the array

  • M (int) – The number of points

  • phi0 (float) – The counterclockwise rotation of the first element in the array (from the x-axis)

  • radius (float) – The radius of the array

Returns

The array of points

Return type

ndarray (2, M)
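
For example, a minimal sketch placing six points on a circle of radius 10 cm (the values are illustrative):

>>> import pyroomacoustics as pra
>>> R = pra.beamforming.circular_2D_array(center=[2.0, 2.0], M=6, phi0=0.0, radius=0.1)
>>> R.shape
(2, 6)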

pyroomacoustics.beamforming.circular_microphone_array_xyplane(center, M, phi0, radius, fs, directivity=None, ax=None)

Create a microphone array with directivities pointing outwards (if provided).

Parameters
  • center (array_like) – The center of the microphone array. 2D or 3D.

  • M (int) – The number of microphones.

  • phi0 (float) – The counterclockwise rotation (in degrees) of the first element in the microphone array (from the x-axis).

  • radius (float) – The radius of the microphone array.

  • fs (int) – The sampling frequency.

  • directivity (Directivity object, optional.) – Directivity pattern for each microphone which will be re-oriented to face outward. If not provided, microphones are omnidirectional.

  • ax (axes object, optional) – Axes on which to plot microphone array with its directivities.

Return type

MicrophoneArray object

pyroomacoustics.beamforming.distance(x, y)

Computes the distance matrix E. E[i,j] = sqrt(sum((x[:,i]-y[:,j])**2)). x and y are DxN ndarray containing N D-dimensional vectors.

pyroomacoustics.beamforming.fir_approximation_ls(weights, T, n1, n2)
pyroomacoustics.beamforming.linear_2D_array(center, M, phi, d)

Creates an array of uniformly spaced linear points in 2D.

Parameters
  • center (array_like) – The center of the array

  • M (int) – The number of points

  • phi (float) – The counterclockwise rotation of the array (from the x-axis)

  • d (float) – The distance between neighboring points

Returns

The array of points

Return type

ndarray (2, M)

pyroomacoustics.beamforming.mdot(*args)

Left-to-right associative matrix multiplication of multiple 2D ndarrays.

pyroomacoustics.beamforming.poisson_2D_array(center, M, d)

Create an array of 2D positions drawn from a Poisson process.

Parameters
  • center (array_like) – The center of the array

  • M (int) – The number of points

  • d (float) – The average distance between points

Returns

The array of points

Return type

ndarray (2, M)

pyroomacoustics.beamforming.spiral_2D_array(center, M, radius=1.0, divi=3, angle=None)

Generate an array of points placed on a spiral.

Parameters
  • center (array_like) – location of the center of the array

  • M (int) – number of microphones

  • radius (float) – microphones are contained within a circle of this radius (default 1)

  • divi (int) – number of rotations of the spiral (default 3)

  • angle (float) – the angle offset of the spiral (default random)

Returns

The array of points

Return type

ndarray (2, M)

pyroomacoustics.beamforming.square_2D_array(center, M, N, phi, d)

Creates an array of uniformly spaced grid points in 2D.

Parameters
  • center (array_like) – The center of the array

  • M (int) – The number of points in the first dimension

  • N (int) – The number of points in the second dimension

  • phi (float) – The counterclockwise rotation of the array (from the x-axis)

  • d (float) – The distance between neighboring points

Returns

The array of points

Return type

ndarray (2, M * N)

pyroomacoustics.beamforming.sumcols(A)

Sums the columns of a matrix (np.array). The output is a 2D np.array of dimensions M x 1.

pyroomacoustics.beamforming.unit_vec2D(phi)

pyroomacoustics.build_rir module

pyroomacoustics.directivities module

class pyroomacoustics.directivities.CardioidFamily(orientation, pattern_enum, gain=1.0)

Bases: Directivity

Object for directivities coming from the cardioid family.

Parameters
  • orientation (DirectionVector) – Indicates the direction of the pattern.

  • pattern_enum (DirectivityPattern) – The desired directivity pattern.

  • gain (float, optional) – The linear gain of the directivity pattern (default 1.0).

property directivity_pattern

Name of cardioid directivity pattern.

get_response(azimuth, colatitude=None, magnitude=False, frequency=None, degrees=True)

Get response for provided angles.

Parameters
  • azimuth (array_like) – Azimuth in degrees

  • colatitude (array_like, optional) – Colatitude in degrees. Default is to be on XY plane.

  • magnitude (bool, optional) – Whether to return magnitude of response.

  • frequency (float, optional) – For which frequency to compute the response. Cardioid patterns are frequency-independent, so this value has no effect.

  • degrees (bool, optional) – Whether provided angles are in degrees.

Returns

resp – Response at provided angles.

Return type

ndarray

plot_response(**kwargs)
class pyroomacoustics.directivities.DirectionVector(azimuth, colatitude=None, degrees=True)

Bases: object

Object for representing direction vectors in 3D, parameterized by an azimuth and colatitude angle.

Parameters
  • azimuth (float) –

  • colatitude (float, optional) – Defaults to π/2, i.e. the XY plane.

  • degrees (bool) – Whether provided values are in degrees (True) or radians (False).

get_azimuth(degrees=False)
get_colatitude(degrees=False)
property unit_vector

Direction vector in cartesian coordinates.

class pyroomacoustics.directivities.Directivity(orientation)

Bases: ABC

Abstract class for directivity patterns.

get_azimuth(degrees=True)
get_colatitude(degrees=True)
abstract get_response(azimuth, colatitude=None, magnitude=False, frequency=None, degrees=True)

Get response for provided angles and frequency.

set_orientation(orientation)

Set orientation of directivity pattern.

Parameters

orientation (DirectionVector) – New direction for the directivity pattern.

class pyroomacoustics.directivities.DirectivityPattern(value)

Bases: Enum

Common cardioid patterns and their corresponding coefficient for the expression:

\[r = a (1 + \cos \theta),\]

where \(a\) is the coefficient that determines the cardioid pattern and \(r\) yields the gain at angle \(\theta\).

CARDIOID = 0.5
FIGURE_EIGHT = 0
HYPERCARDIOID = 0.25
OMNI = 1.0
SUBCARDIOID = 0.75
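
Combining the classes above, here is a hedged sketch of building a hypercardioid microphone directivity (the angles are illustrative):

>>> from pyroomacoustics.directivities import (
...     CardioidFamily, DirectionVector, DirectivityPattern)
>>> orientation = DirectionVector(azimuth=90, colatitude=90, degrees=True)
>>> mic_dir = CardioidFamily(orientation, DirectivityPattern.HYPERCARDIOID)
>>> resp = mic_dir.get_response(azimuth=[0, 45, 90], degrees=True)
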
pyroomacoustics.directivities.cardioid_func(x, direction, coef, gain=1.0, normalize=True, magnitude=False)

One-shot function for computing cardioid response.

Parameters
  • x (array_like, shape (..., n_dim)) – Cartesian coordinates

  • direction (array_like, shape (n_dim)) – Direction vector, should be normalized.

  • coef (float) – Parameter for the cardioid function.

  • gain (float) – The gain.

  • normalize (bool) – Whether to normalize coordinates and direction vector.

  • magnitude (bool) – Whether to return magnitude, default is False.

Returns

resp – Response at provided angles for the specified cardioid function.

Return type

ndarray

pyroomacoustics.directivities.source_angle_shoebox(image_source_loc, wall_flips, mic_loc)

Determine outgoing angle for each image source for a ShoeBox configuration.

Implementation of the method described in the paper: https://www2.ak.tu-berlin.de/~akgroup/ak_pub/2018/000458.pdf

Parameters
  • image_source_loc (array_like) – Locations of image sources.

  • wall_flips (array_like) – Number of x, y, z flips for each image source.

  • mic_loc (array_like) – Microphone location.

Returns

  • azimuth (ndarray) – Azimuth for each image source, in radians.

  • colatitude (ndarray) – Colatitude for each image source, in radians.

pyroomacoustics.metrics module

pyroomacoustics.metrics.itakura_saito(x1, x2, sigma2_n, stft_L=128, stft_hop=128)
pyroomacoustics.metrics.median(x, alpha=None, axis=-1, keepdims=False)

Computes the median of the data, and optionally a confidence interval for it.

Parameters
  • x (array_like) – the data array

  • alpha (float, optional) – the confidence level of the interval, confidence intervals are only computed when this argument is provided

  • axis (int, optional) – the axis of the data on which to operate, by default the last axis

Returns

This function returns (m, [le, ue]) and the confidence interval is [m-le, m+ue].

Return type

tuple (float, [float, float])
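
A minimal sketch (random data for illustration; without alpha, only the median is returned):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> x = np.random.randn(200)
>>> m = pra.metrics.median(x)                  # median only
>>> m, ci = pra.metrics.median(x, alpha=0.05)  # median and 95% confidence interval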

pyroomacoustics.metrics.mse(x1, x2)

A shorthand to compute the mean-squared error of two signals.

\[\mathrm{MSE} = \frac{1}{n}\sum_{i=0}^{n-1} (x_i - y_i)^2\]
Parameters
  • x1 – (ndarray)

  • x2 – (ndarray)

Returns

(float) The mean of the squared differences of x1 and x2.

pyroomacoustics.metrics.pesq(ref_file, deg_files, Fs=8000, swap=False, wb=False, bin='./bin/pesq')

Computes the perceptual evaluation of speech quality (PESQ) metric of one or more degraded files with respect to a reference file. Uses the utility obtained from ITU P.862 http://www.itu.int/rec/T-REC-P.862-200511-I!Amd2/en

Parameters
  • ref_file – The filename of the reference file.

  • deg_files – A list of degraded sound files names.

  • Fs – Sample rate of the sound files [8 kHz or 16 kHz, default 8 kHz].

  • swap – Swap the byte order of the samples [default: False].

  • wb – Use wideband algorithm [default: False].

  • bin – Location of pesq executable [default: ./bin/pesq].

Returns

(ndarray of size 2×N) – Raw MOS and MOS LQO in rows 0 and 1, respectively, with one column per degraded file in deg_files.

pyroomacoustics.metrics.snr(ref, deg)
pyroomacoustics.metrics.sweeping_echo_measure(rir, fs, t_min=0, t_max=0.5, fb=400)

Measure of sweeping echo in an RIR obtained from the image-source method. A higher value indicates fewer sweeping echoes.

For details see : De Sena et al. “On the modeling of rectangular geometries in room acoustic simulations”, IEEE TASLP, 2015

Parameters
  • rir (ndarray) – RIR signal from the ISM (mono).

  • fs (int) – Sampling frequency.

  • t_min (float, optional) – Minimum time window. The default is 0.

  • t_max (float, optional) – Maximum time window. The default is 0.5.

  • fb (float, optional) – Mask bandwidth. The default is 400.

Return type

sweeping spectrum flatness (ssf)

pyroomacoustics.multirate module

pyroomacoustics.multirate.frac_delay(delta, N, w_max=0.9, C=4)

Compute the optimal fractional delay filter according to

Design of Fractional Delay Filters Using Convex Optimization William Putnam and Julius Smith

Parameters
  • delta – delay of filter in (fractional) samples

  • N – number of taps

  • w_max – Bandwidth of the filter (in fraction of pi) (default 0.9)

  • C – sets the number of constraints to C*N (default 4)

pyroomacoustics.multirate.low_pass(numtaps, B, epsilon=0.1)
pyroomacoustics.multirate.resample(x, p, q)

pyroomacoustics.parameters module

This file defines the main physical constants of the system:
  • Speed of sound

  • Absorption of materials

  • Scattering coefficients

  • Air absorption

class pyroomacoustics.parameters.Constants

Bases: object

A class that provides easy package-wide access to user-settable constants.

get(name)
set(name, val)
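
For instance, a short sketch using the package-level constants instance (frac_delay_length is the module-wide constant referenced under pyroomacoustics.utilities.fractional_delay below):

>>> import pyroomacoustics as pra
>>> L = pra.constants.get('frac_delay_length')   # current fractional delay filter length
>>> pra.constants.set('frac_delay_length', 129)  # override it package-wide
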
class pyroomacoustics.parameters.Material(energy_absorption, scattering=None)

Bases: object

A class that describes the energy absorption and scattering properties of walls.

energy_absorption

A dictionary containing keys description, coeffs, and center_freqs.

Type

dict

scattering

A dictionary containing keys description, coeffs, and center_freqs.

Type

dict

Parameters
  • energy_absorption (float, str, or dict) –

    • float: The material created will be equally absorbing at all frequencies (i.e. flat).

    • str: The absorption values will be obtained from the database.

    • dict: A dictionary containing keys description, coeffs, and center_freqs.

  • scattering (float, str, or dict) –

    • float: The material created will be equally scattering at all frequencies (i.e. flat).

    • str: The scattering values will be obtained from the database.

    • dict: A dictionary containing keys description, coeffs, and center_freqs.
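
For instance, a hedged sketch of the float and dict forms (the numeric values and the "hypothetical material" entry are purely illustrative, not database values):

>>> import pyroomacoustics as pra
>>> flat_wall = pra.Material(energy_absorption=0.1, scattering=0.05)
>>> banded_wall = pra.Material(
...     energy_absorption={
...         "description": "hypothetical material",
...         "coeffs": [0.1, 0.2, 0.3],
...         "center_freqs": [250, 1000, 4000],
...     }
... )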

property absorption_coeffs

Shorthand for the energy absorption coefficients.

classmethod all_flat(materials)

Checks if all materials in a list are frequency flat

Parameters

materials (list or dict of Material objects) – The list of materials to check

Return type

True if all materials have a single parameter, else False

is_freq_flat()

Returns True if the material has flat characteristics over frequency, False otherwise.

resample(octave_bands)

Resample the coefficients at the given octave bands.

property scattering_coeffs

Shorthand for the scattering coefficients.

class pyroomacoustics.parameters.Physics(temperature=None, humidity=None)

Bases: object

A Physics object allows one to compute the physical properties of the room depending on temperature and humidity.

Parameters
  • temperature (float, optional) – The room temperature

  • humidity (float in range (0, 100), optional) – The room relative humidity in %. Default is 0.

classmethod from_speed(c)

Choose a temperature and humidity matching a desired speed of sound

get_air_absorption()
Returns

(air_absorption, center_freqs), where air_absorption is a list of coefficients corresponding to the center frequencies in center_freqs

get_sound_speed()
Return type

the speed of sound
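
A short sketch (the temperature and humidity values are illustrative):

>>> import pyroomacoustics as pra
>>> phys = pra.parameters.Physics(temperature=25.0, humidity=50.0)
>>> c = phys.get_sound_speed()
>>> air_abs, center_freqs = phys.get_air_absorption()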

pyroomacoustics.parameters.calculate_speed_of_sound(t, h, p)

Compute the speed of sound as a function of temperature, humidity and pressure

Parameters
  • t (float) – temperature [Celsius]

  • h (float) – relative humidity [%]

  • p (float) – atmospheric pressure [kPa]

Return type

Speed of sound in [m/s]

pyroomacoustics.parameters.get_num_threads()

Returns the number of threads available for pyroomacoustics

The number of threads can be set by:

  1. Setting the PRA_NUM_THREADS environment variable

  2. Calling pra.constants.set("num_threads", new_num_threads)

  3. Setting OMP_NUM_THREADS or MKL_NUM_THREADS

pyroomacoustics.parameters.make_materials(*args, **kwargs)

Helper method to conveniently create multiple materials.

Each positional and keyword argument should be a valid input for the Material class. Then, for each of the arguments, a Material will be created by calling the constructor.

If positional arguments are provided, a list of Material objects constructed from them is returned. If keyword arguments are provided, a dict with keys corresponding to the keywords and containing Material objects constructed from the keyword values is returned. If both are provided, both are returned; if no argument is provided, an empty list is returned.

# energy absorption parameters
floor_eabs = {
    "description": "Example floor material",
    "coeffs": [0.1, 0.2, 0.1, 0.1, 0.1, 0.05],
    "center_freqs": [125, 250, 500, 1000, 2000, 4000],
}

# scattering parameters
audience_scat = {
    "description": "Theatre Audience",
    "coeffs": [0.3, 0.5, 0.6, 0.6, 0.7, 0.7, 0.7],
    "center_freqs": [125, 250, 500, 1000, 2000, 4000, 8000],
}

# create a list of materials
my_mat_list = pra.make_materials((floor_eabs, audience_scat))

# create a dict of materials
my_mat_dict = pra.make_materials(floor=(floor_eabs, audience_scat))

pyroomacoustics.recognition module

class pyroomacoustics.recognition.CircularGaussianEmission(nstates, odim=1, examples=None)

Bases: object

get_pdfs()

Return the pdf of all the emission probabilities

prob_x_given_state(examples)

Recompute the probability of the observation given the state of the latent variables

update_parameters(examples, gamma)
class pyroomacoustics.recognition.GaussianEmission(nstates, odim=1, examples=None)

Bases: object

get_pdfs()

Return the pdf of all the emission probabilities

prob_x_given_state(examples)

Recompute the probability of the observation given the state of the latent variables

update_parameters(examples, gamma)
class pyroomacoustics.recognition.HMM(nstates, emission, model='full', leftright_jump_max=3)

Bases: object

Hidden Markov Model with Gaussian emissions

K

Number of states in the model

Type

int

O

Number of dimensions of the Gaussian emission distribution

Type

int

A

KxK transition matrix of the Markov chain

Type

ndarray

pi

K dim vector of the initial probabilities of the Markov chain

Type

ndarray

emission

An instance of emission_class

Type

(GaussianEmission or CircularGaussianEmission)

model

The model used for the chain, can be ‘full’ or ‘left-right’

Type

string, optional

leftright_jump_max

The number of non-zero upper diagonals in a ‘left-right’ model

Type

int, optional

backward(X, p_x_given_z, c)

The backward recursion for HMM as described in Bishop Ch. 13

fit(examples, tol=0.1, max_iter=10, verbose=False)

Training of the HMM using the EM algorithm

Parameters
  • examples (list) – A list of examples used to train the model. Each example is an array of feature vectors; each row is a feature vector, and the sequence runs along axis 0.

  • tol (float) – The training stops when the progress between two steps is less than this number (default 0.1).

  • max_iter (int) – Alternatively, the algorithm stops when a maximum number of iterations is reached (default 10).

  • verbose (bool, optional) – When True, prints extra information about convergence

forward(X, p_x_given_z)

The forward recursion for HMM as described in Bishop Ch. 13

generate(N)

Generate a random sample of length N using the model

loglikelihood(X)

Compute the log-likelihood of a sample vector using the sum-product algorithm

update_parameters(examples, gamma, xhi)

Update the parameters of the Markov Chain

viterbi()
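
As a hedged end-to-end sketch with synthetic data (the shapes, number of states, and hyperparameters are all illustrative):

>>> import numpy as np
>>> from pyroomacoustics.recognition import HMM, CircularGaussianEmission
>>> examples = [np.random.randn(50, 2) for _ in range(10)]  # ten sequences of 2-D features
>>> emission = CircularGaussianEmission(nstates=3, odim=2, examples=examples)
>>> hmm = HMM(nstates=3, emission=emission, model='left-right')
>>> hmm.fit(examples, max_iter=20)
>>> sample = hmm.generate(100)  # draw a random length-100 sequence from the model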

pyroomacoustics.soundsource module

class pyroomacoustics.soundsource.SoundSource(position, images=None, damping=None, generators=None, walls=None, orders=None, signal=None, delay=0, directivity=None)

Bases: object

A class to represent sound sources.

This object represents a sound source in a room by a list containing the original source position as well as all the image sources, up to some maximum order.

It also keeps track of the sequence of generated images and the index of the walls (in the original room) that generated the reflection.

add_signal(signal)

Sets signal attribute

Parameters

signal (ndarray) – an N-length ndarray representing a sequence of samples generated by the source.

distance(ref_point)
get_damping(max_order=None)
get_images(max_order=None, max_distance=None, n_nearest=None, ref_point=None)

Kept for compatibility; now replaced by the bracket operator and the set_ordering method.

set_directivity(directivity)

Sets self.directivity as a list of directivities with 1 entry

set_ordering(ordering, ref_point=None)

Set the order in which the image sources are retrieved. Can be: ‘nearest’, ‘strongest’, or ‘order’. Optional argument: ref_point.

wall_sequence(i)

Print the wall sequence for the image source indexed by i

pyroomacoustics.soundsource.build_rir_matrix(mics, sources, Lg, Fs, epsilon=0.005, unit_damping=False)

A function to build the channel matrix for many sources and microphones

Parameters
  • mics (ndarray) – a dim-by-M ndarray where each column is the position of a microphone

  • sources (list of pyroomacoustics.SoundSource) – list of sound sources for which we want to build the matrix

  • Lg (int) – the length of the beamforming filters

  • Fs (int) – the sampling frequency

  • epsilon (float, optional) – minimum decay of the sinc before truncation. Defaults to epsilon=5e-3

  • unit_damping (bool, optional) – determines if the wall damping parameters are used or not. Defaults to False.

Returns

The function returns the RIR matrix

\[H = \begin{bmatrix} H_{11} & H_{12} & \cdots \\ H_{21} & H_{22} & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix},\]

where \(H_{ij}\) is the channel matrix between microphone \(i\) and source \(j\). \(H\) is of size \((M L_g) \times ((L_g + L_h - 1) S)\), where \(L_h\) is the channel length (determined by epsilon), and \(M\), \(S\) are the numbers of microphones and sources, respectively.

pyroomacoustics.stft module

Class for real-time STFT analysis and processing.

pyroomacoustics.sync module

pyroomacoustics.sync.correlate(x1, x2, interp=1, phat=False)

Compute the cross-correlation between x1 and x2

Parameters
  • x1 (array_like) – The first data array

  • x2 (array_like) – The second data array

  • interp (int, optional) – The interpolation factor for the output array, default 1.

  • phat (bool, optional) – Apply the PHAT weighting (default False)

Return type

The cross-correlation between the two arrays

pyroomacoustics.sync.delay_estimation(x1, x2, L)

Estimate the delay between x1 and x2. L is the block length used for PHAT.

pyroomacoustics.sync.tdoa(signal, reference, interp=1, phat=False, fs=1, t_max=None)

Estimates the shift of array signal with respect to reference using generalized cross-correlation

Parameters
  • signal (array_like) – The array whose tdoa is measured

  • reference (array_like) – The reference array

  • interp (int, optional) – The interpolation factor for the output array, default 1.

  • phat (bool, optional) – Apply the PHAT weighting (default False)

  • fs (int or float, optional) – The sampling frequency of the input arrays, default=1

Return type

The estimated delay between the two arrays
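
A hedged sketch recovering a known 10 ms shift from synthetic signals (the values are illustrative):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> fs = 16000
>>> reference = np.random.randn(fs)                      # one second of noise
>>> signal = np.concatenate([np.zeros(160), reference])  # same signal, delayed by 160 samples
>>> delay = pra.sync.tdoa(signal, reference, phat=True, fs=fs)  # should be close to 0.01 s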

pyroomacoustics.sync.time_align(ref, deg, L=4096)

Return a copy of deg, time-aligned with and of the same length as ref. L is the block length used for the correlations.

pyroomacoustics.utilities module

pyroomacoustics.utilities.all_combinations(lst1, lst2)

Return all combinations between two arrays.

pyroomacoustics.utilities.angle_function(s1, v2)

Compute azimuth and colatitude angles in radians for a given set of points s1 and a single point v2.

Parameters
  • s1 (numpy array) – 3×N for a set of N 3-D points, 2×N for a set of N 2-D points.

  • v2 (numpy array) – 3×1 for a 3-D point, 2×1 for a 2-D point.

Returns

2×N numpy array with azimuth and colatitude angles in radians.

Return type

numpy array

pyroomacoustics.utilities.autocorr(x, p, biased=True, method='numpy')

Compute the autocorrelation for real signal x up to lag p.

Parameters
  • x (numpy array) – Real signal in time domain.

  • p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.

  • biased (bool) – Whether to return biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.

  • method (‘numpy’, ‘fft’, ‘time’, or ‘pra’) – Method for computing the autocorrelation: in the frequency domain with ‘fft’ or ‘pra’ (np.fft.rfft is used so only real signals are supported), in the time domain with ‘time’, or with numpy’s built-in function np.correlate (‘numpy’, the default). For p < log2(len(x)), the time domain approach may be more efficient.

Returns

Autocorrelation for x up to lag p.

Return type

numpy array

pyroomacoustics.utilities.clip(signal, high, low)

Clip a signal from above at high and from below at low.

pyroomacoustics.utilities.compare_plot(signal1, signal2, Fs, fft_size=512, norm=False, equal=False, title1=None, title2=None)
pyroomacoustics.utilities.convmtx(x, n)

Create a convolution matrix H for the vector x of size len(x) times n. Then, the result of np.dot(H,v) where v is a vector of length n is the same as np.convolve(x, v).
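
A quick check of that equivalence (the vectors are illustrative):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> x = np.array([1.0, 2.0, 3.0])
>>> v = np.array([1.0, -1.0])
>>> H = pra.utilities.convmtx(x, len(v))
>>> np.allclose(np.dot(H, v), np.convolve(x, v))
True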

pyroomacoustics.utilities.create_noisy_signal(signal_fp, snr, noise_fp=None, offset=None)

Create a noisy signal of a specified SNR.

Parameters
  • signal_fp (string) – File path to clean input.

  • snr (float) – SNR in dB.

  • noise_fp (string, optional) – File path to noise. Default is to use randomly generated white noise.

  • offset (float, optional) – Offset in seconds before starting the signal.

Returns

  • numpy array – Noisy signal with specified SNR, between [-1,1] and zero mean.

  • numpy array – Clean signal, untouched from WAV file.

  • numpy array – Added noise such that specified SNR is met.

  • int – Sampling rate in Hz.

pyroomacoustics.utilities.dB(signal, power=False)
pyroomacoustics.utilities.fractional_delay(t0)

Creates a fractional delay filter using a windowed sinc function. The length of the filter is fixed by the module-wide constant frac_delay_length (default 81).

Parameters

t0 (float) – The delay in fraction of sample. Typically between -1 and 1.

Returns

A fractional delay filter with specified delay.

Return type

numpy array
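
For example (the delay value is illustrative):

>>> import pyroomacoustics as pra
>>> h = pra.utilities.fractional_delay(0.3)  # filter implementing a 0.3-sample delay
>>> h.shape
(81,)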

pyroomacoustics.utilities.fractional_delay_filter_bank(delays)

Creates a fractional delay filter bank of windowed sinc filters

Parameters

delays (1d narray) – The delays corresponding to each filter in fractional samples

Returns

An ndarray where the ith row contains the fractional delay filter corresponding to the ith delay. The number of columns of the matrix is proportional to the maximum delay.

Return type

numpy array

pyroomacoustics.utilities.goertzel(x, k)

Goertzel algorithm to compute DFT coefficients

pyroomacoustics.utilities.highpass(signal, Fs, fc=None, plot=False)

Filter out very low frequencies; by default, those below 50 Hz.

pyroomacoustics.utilities.levinson(r, b)

Solve a system of the form Rx = b, where R is a Hermitian Toeplitz matrix and b is any vector, using the generalized Levinson recursion as described in M.H. Hayes, Statistical Signal Processing and Modelling, p. 268.

Parameters
  • r – First column of R, toeplitz hermitian matrix.

  • b – The right-hand argument. If b is a matrix, the system is solved for every column vector in b.

Returns

The solution of the linear system Rx = b.

Return type

numpy array

pyroomacoustics.utilities.low_pass_dirac(t0, alpha, Fs, N)

Creates a vector of length N containing a lowpass Dirac sampled at Fs with delay t0 and attenuation alpha.

If t0 and alpha are 2D column vectors of the same size, then the function returns a matrix with each line corresponding to a pair of t0/alpha values.

pyroomacoustics.utilities.lpc(x, p, biased=True)

Compute p LPC coefficients for a speech segment x.

Parameters
  • x (numpy array) – Real signal in time domain.

  • p (int) – Amount of lag. When solving for LPC coefficient, this is typically the LPC order.

  • biased (bool) – Whether to use biased autocorrelation (default) or unbiased. As there are fewer samples for larger lags, the biased estimate tends to have better statistical stability under noise.

Returns

p LPC coefficients.

Return type

numpy array
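
A minimal sketch on synthetic data (a random signal stands in for a speech segment):

>>> import numpy as np
>>> import pyroomacoustics as pra
>>> x = np.random.randn(1024)       # stand-in for a speech segment
>>> a = pra.utilities.lpc(x, p=10)  # 10 LPC coefficients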

pyroomacoustics.utilities.normalize(signal, bits=None)

Normalize a signal to be in a given range. The default is to normalize the maximum amplitude to one. An optional argument allows one to normalize the signal to the range of a signed integer representation with the given number of bits.

pyroomacoustics.utilities.normalize_pwr(sig1, sig2)

Normalize sig1 to have the same power as sig2.

pyroomacoustics.utilities.prony(x, p, q)

Prony’s Method from Monson H. Hayes’ Statistical Signal Processing, p. 154

Parameters
  • x – signal to model

  • p – order of denominator

  • q – order of numerator

Returns

  • a – numerator coefficients

  • b – denominator coefficients

  • err – the squared error of the approximation

pyroomacoustics.utilities.real_spectrum(signal, axis=-1, **kwargs)
pyroomacoustics.utilities.requires_matplotlib(func)
pyroomacoustics.utilities.rms(data)

Compute root mean square of input.

Parameters

data (numpy array) – Real signal in time domain.

Returns

Root mean square.

Return type

float

pyroomacoustics.utilities.shanks(x, p, q)

Shanks’ Method from Monson H. Hayes’ Statistical Signal Processing, p. 154

Parameters
  • x – signal to model

  • p – order of denominator

  • q – order of numerator

Returns

  • a – numerator coefficients

  • b – denominator coefficients

  • err – the squared error of approximation

pyroomacoustics.utilities.spectrum(signal, Fs, N)
pyroomacoustics.utilities.time_dB(signal, Fs, bits=16)

Compute the signed dB amplitude of the oscillating signal, normalized with respect to the number of bits used for the signal.

pyroomacoustics.utilities.to_16b(signal)

Converts a 32-bit float signal (in the range -1 to 1) to a signed 16-bit representation. No clipping is performed; you are responsible for ensuring the signal is within the correct interval.

pyroomacoustics.utilities.to_float32(data)

Cast data (typically from WAV) to float32.

Parameters

data (numpy array) – Real signal in time domain, typically obtained from WAV file.

Returns

data as float32.

Return type

numpy array

pyroomacoustics.windows module

Window Functions

This is a collection of many popular window functions used in signal processing.

A few options are provided to correctly construct the required window function. The flag keyword argument can take the following values.

asymmetric

This way, many of the functions will sum to one when their left part is added to their right part. This is useful for overlapped transforms such as the STFT.

symmetric

With this flag, the window is perfectly symmetric. This might be more suitable for analysis tasks.

mdct

Available for only some of the windows. The window is modified to satisfy the perfect reconstruction condition of the MDCT transform.

Often, we would like to get the full window function, but on some occasions, it is useful to get only the left (or right) part. This can be indicated via the keyword argument length that can take values full (default), left, or right.
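
For instance, a short sketch of these options using the Hann window defined below (the window length is illustrative):

>>> import pyroomacoustics as pra
>>> w = pra.windows.hann(512)                        # full, asymmetric (default)
>>> w_sym = pra.windows.hann(512, flag='symmetric')  # symmetric variant
>>> w_left = pra.windows.hann(512, length='left')    # left half only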

pyroomacoustics.windows.bart(N, flag='asymmetric', length='full')

The Bartlett window function

\[w[n] = \frac{2}{M-1}\left(\frac{M-1}{2} - \left|n - \frac{M-1}{2}\right|\right), \quad n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.bart_hann(N, flag='asymmetric', length='full')

The modified Bartlett–Hann window function

\[w[n] = 0.62 - 0.48|(n/M-0.5)| + 0.38 \cos(2\pi(n/M-0.5)), n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.blackman(N, flag='asymmetric', length='full')

The Blackman window function

\[w[n] = 0.42 - 0.5\cos(2\pi n/(M-1)) + 0.08\cos(4\pi n/(M-1)), n = 0, \ldots, M-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.blackman_harris(N, flag='asymmetric', length='full')

The Blackman-Harris window function

\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M), n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.bohman(N, flag='asymmetric', length='full')

The Bohman window function

\[w[n] = (1-|x|) \cos(\pi |x|) + \frac{1}{\pi} \sin(\pi |x|), \quad -1\leq x\leq 1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.cosine(N, flag='asymmetric', length='full')

The cosine window function

\[w[n] = \cos(\pi (n/M - 0.5))^2\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.flattop(N, flag='asymmetric', length='full')

The flat top weighted window function

\[w[n] = a_0 - a_1 \cos(2\pi n/M) + a_2 \cos(4\pi n/M) + a_3 \cos(6\pi n/M) + a_4 \cos(8\pi n/M), n=0,\ldots,N-1\]

where

\[a_0 = 0.21557895,\quad a_1 = 0.41663158,\quad a_2 = 0.277263158,\quad a_3 = 0.083578947,\quad a_4 = 0.006947368\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.gaussian(N, std, flag='asymmetric', length='full')

The Gaussian window function

\[w[n] = e^{ -\frac{1}{2}\left(\frac{n}{\sigma}\right)^2 }\]
Parameters
  • N (int) – the window length

  • std (float) – the standard deviation

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.hamming(N, flag='asymmetric', length='full')

The Hamming window function

\[w[n] = 0.54 - 0.46 \cos(2 \pi n / M), n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.hann(N, flag='asymmetric', length='full')

The Hann window function

\[w[n] = 0.5 (1 - \cos(2 \pi n / M)), n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.kaiser(N, beta, flag='asymmetric', length='full')

The Kaiser window function

\[w[n] = I_0\left( \beta \sqrt{1-\frac{4n^2}{(M-1)^2}} \right)/I_0(\beta)\]

with

\[\quad -\frac{M-1}{2} \leq n \leq \frac{M-1}{2},\]

where \(I_0\) is the modified zeroth-order Bessel function.

Parameters
  • N (int) – the window length

  • beta (float) – Shape parameter, determines trade-off between main-lobe width and side lobe level. As beta gets large, the window narrows.

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

pyroomacoustics.windows.rect(N)

The rectangular window

\[w[n] = 1, n=0,\ldots,N-1\]
Parameters

N (int) – the window length

pyroomacoustics.windows.triang(N, flag='asymmetric', length='full')

The triangular window function

\[w[n] = 1 - | 2 n / M - 1 |, n=0,\ldots,N-1\]
Parameters
  • N (int) – the window length

  • flag (string, optional) –

    Possible values

    • asymmetric: asymmetric windows are used for overlapping transforms (\(M=N\))

    • symmetric: the window is symmetric (\(M=N-1\))

    • mdct: impose MDCT condition on the window (\(M=N-1\) and \(w[n]^2 + w[n+N/2]^2=1\))

  • length (string, optional) –

    Possible values

    • full: the full length window is computed

    • right: the right half of the window is computed

    • left: the left half of the window is computed

Module contents

pyroomacoustics

Provides
  1. Room impulse response simulation via the image source model

  2. Simulation of sound propagation using an STFT engine

  3. Reference implementations of popular algorithms for

  • beamforming

  • direction of arrival

  • adaptive filtering

  • source separation

  • single channel denoising

  • etc

How to use the documentation

Documentation is available in two forms: docstrings provided with the code, and a loose standing reference guide, available from the pyroomacoustics readthedocs page.

We recommend exploring the docstrings using IPython, an advanced Python shell with TAB-completion and introspection capabilities. See below for further instructions.

The docstring examples assume that pyroomacoustics has been imported as pra:

>>> import pyroomacoustics as pra

Code snippets are indicated by three greater-than signs:

>>> x = 42
>>> x = x + 1

Use the built-in help function to view a function’s docstring:

>>> help(pra.stft.STFT)
... 
Available submodules
pyroomacoustics.acoustics

Acoustics and psychoacoustics routines, mel-scale, critical bands, etc.

pyroomacoustics.beamforming

Microphone arrays and beamforming routines.

pyroomacoustics.directivities

Directivity pattern objects and routines.

pyroomacoustics.geometry

Core geometry routine for the image source model.

pyroomacoustics.metrics

Performance metrics like mean-squared error, median, Itakura-Saito, etc.

pyroomacoustics.multirate

Rate conversion routines.

pyroomacoustics.parameters

Global parameters, e.g., physical constants.

pyroomacoustics.recognition

Hidden Markov Model and TIMIT database structure.

pyroomacoustics.room

Abstraction of room and image source model.

pyroomacoustics.soundsource

Abstraction for a sound source.

pyroomacoustics.stft

Deprecated. Replaced by the methods in pyroomacoustics.transform.

pyroomacoustics.sync

A few routines to help synchronize signals.

pyroomacoustics.utilities

Miscellaneous utility routines.

pyroomacoustics.wall

Abstraction for walls of a room.

pyroomacoustics.windows

Tapering windows for spectral analysis.

Available subpackages
pyroomacoustics.adaptive

Adaptive filter algorithms.

pyroomacoustics.bss

Blind source separation.

pyroomacoustics.datasets

Wrappers around a few popular speech datasets.

pyroomacoustics.denoise

Single channel noise reduction methods.

pyroomacoustics.doa

Direction of arrival finding algorithms.

pyroomacoustics.phase

Phase-related processing.

pyroomacoustics.transform

Block frequency domain processing tools.

Utilities
__version__

pyroomacoustics version string
