TIMIT Corpus
The TIMIT Dataset
The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic and word transcriptions as well as a 16-bit, 16kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI) and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST).
The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
Unfortunately, this is a proprietary dataset. A license can be obtained for $125 to $250 depending on your status (academic or otherwise).
Deprecation Warning: The interface of TimitCorpus will change in the near
future to match that of pyroomacoustics.datasets.cmu_arctic.CMUArcticCorpus
URL: https://catalog.ldc.upenn.edu/ldc93s1
class pyroomacoustics.datasets.timit.Sentence(path)
Bases: object
Create the sentence object
Parameters: path (string) – the path to the particular sample

speaker
Speaker initials
Type: str

id
A digit to disambiguate identical initials
Type: str

sex
Speaker gender (M or F)
Type: str

dialect
Speaker dialect region number:
1. New England
2. Northern
3. North Midland
4. South Midland
5. Southern
6. New York City
7. Western
8. Army Brat (moved around)
Type: str
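
The dialect region numbers can be mapped back to their names with a small lookup table. The sketch below is not part of pyroomacoustics: `DIALECT_REGIONS` and `dialect_name` are hypothetical helpers, and since the exact form of the `dialect` string is not specified here, the helper accepts either a bare number or a directory-style label such as "DR3" (an assumption).

```python
# Region names in the order listed for the dialect attribute (regions 1 to 8).
DIALECT_REGIONS = {
    1: "New England",
    2: "Northern",
    3: "North Midland",
    4: "South Midland",
    5: "Southern",
    6: "New York City",
    7: "Western",
    8: "Army Brat (moved around)",
}


def dialect_name(dialect):
    """Return the region name for a value such as 3, "3" or "DR3".

    Hypothetical helper: it keeps only the digits of the input, so any
    prefix such as "DR" is ignored.
    """
    digits = "".join(ch for ch in str(dialect) if ch.isdigit())
    if not digits:
        raise ValueError(f"no region number found in {dialect!r}")
    region = int(digits)
    if region not in DIALECT_REGIONS:
        raise ValueError(f"unknown dialect region {region}")
    return DIALECT_REGIONS[region]


print(dialect_name("DR3"))
print(dialect_name(8))
```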

fs
Sampling frequency
Type: int

samples
The audio track
Type: array_like (n_samples,)

text
The text of the sentence
Type: str

words
List of Word objects forming the sentence
Type: list

phonems
List of phonemes contained in the sentence. Each element is a dictionary containing ‘bnd’, the boundaries of the phoneme, and ‘name’, the phoneme transcription.
Type: list
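
As an illustration of how the phoneme list might be used, the sketch below converts each entry's boundaries into a duration in seconds. It assumes that ‘bnd’ holds a (start, end) pair of sample indices; that layout is an assumption, not something stated above. The function `phoneme_durations` and the example entries are made up for illustration.

```python
def phoneme_durations(phonems, fs):
    """Return a list of (name, duration in seconds) tuples.

    Assumes each entry is a dict with a 'name' string and a 'bnd' pair of
    (start, end) sample indices.
    """
    durations = []
    for p in phonems:
        start, end = p["bnd"]
        durations.append((p["name"], (end - start) / fs))
    return durations


# Made-up entries standing in for Sentence.phonems
example = [
    {"name": "h#", "bnd": (0, 3200)},
    {"name": "sh", "bnd": (3200, 4800)},
    {"name": "iy", "bnd": (4800, 6400)},
]

for name, seconds in phoneme_durations(example, fs=16000):
    print(f"{name}: {seconds:.3f} s")
```

With a real Sentence object, the same function would be called with its phonems and fs attributes, provided the assumption about ‘bnd’ holds.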

play()
Play the sound sample

plot(L=512, hop=128, zpb=0, phonems=False, **kwargs)

class pyroomacoustics.datasets.timit.TimitCorpus(basedir)
Bases: object
TimitCorpus class
Parameters:
- basedir (string) – The location of the TIMIT database
- directories (list of strings) – The subdirectories containing the data (['TEST', 'TRAIN'])
- sentence_corpus (dict) – A dictionary that contains a list of Sentence objects for each sub-directory
- word_corpus (dict) – A dictionary that contains a list of Word objects for each sub-directory and word available in the corpus

build_corpus(sentences=None, dialect_region=None, speakers=None, sex=None)
Build the corpus
The TIMIT database structure is encoded in the directory structure:
- basedir
  - TEST/TRAIN
    - Regional accent index (1 to 8)
      - Speakers (one directory per speaker)
        - Sentences (one file per sentence)
Parameters:
- sentences (list) – A list of the sentences to which we want to restrict the corpus. Example: sentences=['SA1', 'SA2']
- dialect_region (list of int) – A list of the dialect regions to which we want to restrict the corpus. Example: dialect_region=[1, 4, 5]
- speakers (list) – A list of speaker acronyms to which we want to restrict the corpus. Example: speakers=['AKS0']
- sex (string) – Restrict to a single sex: 'F' for female, 'M' for male
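
To make the directory layout and the filters concrete, here is a minimal standalone sketch that walks such a tree with the standard library only. It does not reproduce how build_corpus is implemented. It assumes upper-case names, region directories called DR1 to DR8, speaker directories whose first letter is the sex, and one .WAV file per sentence; `find_sentences` and the speaker names in the fake tree are hypothetical.

```python
import tempfile
from pathlib import Path


def find_sentences(basedir, group="TRAIN", dialect_region=None, sex=None):
    """List (region, speaker directory, sentence name) for matching files.

    Assumed layout: basedir/<group>/DR<n>/<speaker>/<sentence>.WAV, with the
    first letter of the speaker directory giving the sex ('F' or 'M').
    """
    found = []
    for wav in sorted(Path(basedir, group).glob("DR*/*/*.WAV")):
        region = int(wav.parent.parent.name[2:])  # "DR4" -> 4
        speaker_dir = wav.parent.name
        if dialect_region is not None and region not in dialect_region:
            continue
        if sex is not None and speaker_dir[0] != sex:
            continue
        found.append((region, speaker_dir, wav.stem))
    return found


# Build a tiny fake tree of empty files so the sketch runs without the corpus.
with tempfile.TemporaryDirectory() as tmp:
    for rel in [
        "TRAIN/DR1/FAKS0/SA1.WAV",
        "TRAIN/DR1/MABC0/SA1.WAV",
        "TRAIN/DR4/FXYZ0/SX12.WAV",
    ]:
        path = Path(tmp, rel)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    result = find_sentences(tmp, dialect_region=[1], sex="F")

print(result)
```

Only the female speaker of region 1 passes both filters in this fake tree, which mirrors the way the dialect_region and sex arguments narrow the corpus.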

get_word(d, w, index=0)
Return instance number index of word w from group d (test or train)
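
A possible way to combine the calls documented on this page is sketched below. It uses only the names listed here (TimitCorpus, build_corpus, sentence_corpus, get_word), but several points are assumptions: that get_word returns a Word object, that the group key is 'TRAIN', that the word "she" is present in the restricted corpus, and that a licensed copy of TIMIT is available. The `TIMIT_DIR` environment variable and the function `first_word_instance` are hypothetical.

```python
import os


def first_word_instance(basedir, word, group="TRAIN"):
    """Build a restricted corpus and return the first instance of `word`."""
    # Imported lazily so this file can be loaded without pyroomacoustics.
    from pyroomacoustics.datasets.timit import TimitCorpus

    corpus = TimitCorpus(basedir)
    # Restrict to female speakers of dialect regions 1 and 4.
    corpus.build_corpus(dialect_region=[1, 4], sex="F")
    print(len(corpus.sentence_corpus[group]), "sentences in", group)
    # Assumed to return a Word object.
    return corpus.get_word(group, word, index=0)


# Hypothetical location of a licensed copy of the corpus.
TIMIT_DIR = os.environ.get("TIMIT_DIR", "")
if os.path.isdir(TIMIT_DIR):
    w = first_word_instance(TIMIT_DIR, "she")
    print(w.word, w.boundaries, w.fs)
```

Nothing runs unless the directory exists, so the sketch is safe to execute on a machine without the dataset.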

class pyroomacoustics.datasets.timit.Word(word, boundaries, data, fs, phonems=None)
Bases: object
A class used for words of the TIMIT corpus

word
The spelling of the word
Type: str

boundaries
The limits of the word within the sentence
Type: list

samples
A view on the sentence samples containing the word
Type: array_like

fs
The sampling frequency
Type: int

phonems
A list of phones contained in the word
Type: list

features
A feature array (e.g. MFCC coefficients)
Type: array_like

Parameters:
- word (str) – The spelling of the word
- boundaries (list) – The limits of the word within the sentence
- data (array_like) – The nd-array that contains all the samples of the sentence
- fs (int) – The sampling frequency
- phonems (list, optional) – A list of phones contained in the word

mfcc(frame_length=1024, hop=512)
Compute the mel-frequency cepstral coefficients (MFCC) of the word samples
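
To get a feel for the default frame_length and hop values, the sketch below converts them to milliseconds at the corpus sampling rate of 16 kHz and counts how many full frames fit in a signal. The frame count assumes no padding, which may differ from what mfcc actually does; `frame_layout` is a hypothetical helper, not part of the library.

```python
def frame_layout(n_samples, fs=16000, frame_length=1024, hop=512):
    """Return (frame duration in ms, hop duration in ms, number of full frames).

    The frame count assumes frames are taken every `hop` samples with no
    padding at the edges.
    """
    frame_ms = 1000.0 * frame_length / fs
    hop_ms = 1000.0 * hop / fs
    if n_samples < frame_length:
        n_frames = 0
    else:
        n_frames = 1 + (n_samples - frame_length) // hop
    return frame_ms, hop_ms, n_frames


# A made-up word lasting half a second at 16 kHz
print(frame_layout(8000))
```

With the defaults, each frame spans 64 ms and frames are spaced 32 ms apart, so under this assumption a word shorter than 64 ms would not fill a single frame.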

play()
Play the sound sample

plot()