The CMU ARCTIC Dataset

The CMU_ARCTIC databases were constructed at the Language Technologies Institute at Carnegie Mellon University as phonetically balanced, US English, single-speaker databases designed for unit selection speech synthesis research. A detailed report on the structure and content of the databases and on the recording environment is available as Carnegie Mellon University, Language Technologies Institute Tech Report CMU-LTI-03-177.

The databases consist of around 1150 utterances carefully selected from out-of-copyright texts from Project Gutenberg. The databases include US English male (bdl) and female (slt) speakers (both experienced voice talent) as well as other accented speakers.

The 1132 sentence prompt list is available from

The distributions include 16 kHz waveforms and simultaneous EGG signals. Full phonetic labelling was performed by CMU Sphinx using the FestVox-based labelling scripts. Complete runnable Festival voices are included with the database distributions as examples, though better voices can be made by improving the labelling.

License: Permissive, attribution required

Price: Free


class pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus(basedir=None, download=False, build=True, **kwargs)

Bases: Dataset

This class loads the CMU ARCTIC corpus into a structure that is convenient to process.


Attributes:

  • basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.

  • info (dict) – A dictionary whose keys are the labels of metadata fields attached to the samples. The values are lists of all distinct values the field takes.

  • sentences (list of CMUArcticSentence) – The list of all utterances in the corpus

Parameters:

  • basedir (str, optional) – The directory where the CMU ARCTIC corpus is located/downloaded. By default, this is the current directory.

  • download (bool, optional) – If the corpus does not exist, download it.

  • speaker (str or list of str, optional) – A list of the CMU ARCTIC speakers labels. If provided, only those speakers are loaded. By default, all speakers are loaded.

  • sex (str or list of str, optional) – Can be ‘female’ or ‘male’

  • lang (str or list of str, optional) – The language, only ‘English’ is available here

  • accent (str or list of str, optional) – The accent of the speaker
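As a rough usage sketch, the constructor arguments above might be combined as follows. The helper name `load_arctic_subset` and the `./CMU_ARCTIC` directory are illustrative assumptions, not part of the library. Note that with `download=True` the call fetches the corpus over the network if it is not found locally, which can take a while.

```python
def load_arctic_subset(basedir="./CMU_ARCTIC", speakers=("bdl", "slt")):
    """Hypothetical helper: load only the requested CMU ARCTIC speakers."""
    # Imported lazily so that this sketch can be read and imported
    # even where pyroomacoustics is not installed.
    from pyroomacoustics.datasets.cmu_arctic import CMUArcticCorpus

    return CMUArcticCorpus(
        basedir=basedir,         # where the corpus is located/downloaded
        download=True,           # download the corpus if it does not exist
        speaker=list(speakers),  # only these speaker labels are loaded
    )
```

Calling `load_arctic_subset()` would then return a corpus restricted to the `bdl` and `slt` speakers mentioned in the dataset description.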


build_corpus(**kwargs)

Build the corpus with some filters (sex, lang, accent, sentence_tag, sentence)


filter(**kwargs)

Filter the corpus and select the samples that match the criteria provided. The argument to each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True
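The three matching rules above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the names `matches` and `select`, the use of dictionaries as samples, and the choice to require every keyword to match (a logical AND across keywords) are assumptions made for the example.

```python
def matches(value, attribute):
    """Return True if `attribute` satisfies the filter `value`."""
    # Rule 1: plain equality
    if value == attribute:
        return True
    # Rule 2: value is a list and the attribute is one of its items
    if isinstance(value, list) and attribute in value:
        return True
    # Rule 3: value is a callable (a predicate) that accepts the attribute
    if callable(value) and value(attribute):
        return True
    return False


def select(samples, **criteria):
    """Keep the samples whose metadata satisfies every keyword (assumed AND)."""
    return [
        s for s in samples
        if all(key in s and matches(value, s[key])
               for key, value in criteria.items())
    ]


# Toy metadata standing in for corpus samples (illustrative only)
samples = [
    {"speaker": "bdl", "sex": "male"},
    {"speaker": "slt", "sex": "female"},
    {"speaker": "awb", "sex": "male"},
]

males = select(samples, sex="male")                               # string
subset = select(samples, speaker=["slt", "awb"])                  # list
starts_b = select(samples, speaker=lambda s: s.startswith("b"))   # function
```

Here `males` holds the `bdl` and `awb` entries, `subset` holds `slt` and `awb`, and `starts_b` holds only `bdl`.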

class pyroomacoustics.datasets.cmu_arctic.CMUArcticSentence(path, **kwargs)

Bases: AudioSample

Create the sentence object

  • path (str) – the path to the audio file

  • **kwargs – metadata as a list of keyword arguments


Attributes:

  • data – The actual audio signal

  • fs – The sampling frequency

plot(**kwargs)

Plot the spectrogram