Google Speech Commands
Google’s Speech Commands Dataset
The Speech Commands Dataset contains 65,000 one-second-long utterances of 30 short words, recorded by thousands of different people and contributed by members of the public through the AIY website. It is released under a Creative Commons BY 4.0 license.
More info about the dataset can be found at the link below:
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
AIY website for contributing recordings:
https://aiyprojects.withgoogle.com/open_speech_recording
Tutorial on creating a word classifier:
https://www.tensorflow.org/versions/master/tutorials/audio_recognition
- class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)
Bases: AudioSample
Create the sound object.
- Parameters
path (str) – the path to the audio file
**kwargs – metadata as a list of keyword arguments
- data
the actual audio signal
- Type
array_like
- fs
sampling frequency
- Type
int
- plot(**kwargs)
Plot the spectrogram.
- class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)
Bases: Dataset
This class loads the Google Speech Commands Dataset into a structure that is convenient to process.
- basedir
The directory where the Speech Commands Dataset is located/downloaded.
- Type
str
- size_by_samples
A dictionary whose keys are the words in the dataset. Each value is the number of occurrences of that word.
- Type
dict
- subdirs
The list of subdirectories in basedir, where each sound type is the name of a subdirectory.
- Type
list
- classes
The list of all sounds, same as the keys of size_by_samples.
- Type
list
- Parameters
basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
download (bool, optional) – If the corpus does not exist, download it.
build (bool, optional) – Whether or not to build the dataset. By default, it is built.
subset (int, optional) – Build a dataset that contains all noise samples and subset samples for each word. By default (subset=None), the dataset is built with all samples.
seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
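The effect of subset and seed can be illustrated with a small standalone sketch. It does not use pyroomacoustics and is not the library's implementation; the function pick_subset, its keep_all parameter, and the file names are hypothetical, and it assumes the noise recordings sit under a class named _background_noise_.

```python
import random


def pick_subset(files_by_word, subset=None, seed=0,
                keep_all=("_background_noise_",)):
    """Return up to `subset` files per word, chosen reproducibly.

    Hypothetical helper: classes listed in `keep_all` are never reduced,
    mirroring the documented "all noise samples" behaviour.
    """
    rng = random.Random(seed)  # local generator, global state is untouched
    chosen = {}
    for word in sorted(files_by_word):  # sorted for a stable iteration order
        files = sorted(files_by_word[word])
        if subset is None or word in keep_all or subset >= len(files):
            chosen[word] = files  # keep everything for this class
        else:
            chosen[word] = sorted(rng.sample(files, subset))
    return chosen


# Made-up file listing standing in for the dataset directory tree.
files_by_word = {
    "yes": ["yes_0.wav", "yes_1.wav", "yes_2.wav", "yes_3.wav"],
    "no": ["no_0.wav", "no_1.wav", "no_2.wav"],
    "_background_noise_": ["fan.wav", "hum.wav", "tap.wav"],
}

first = pick_subset(files_by_word, subset=2, seed=0)
second = pick_subset(files_by_word, subset=2, seed=0)
```

Because both calls use the same seed, first and second select the same files, which is the point of exposing seed alongside subset.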
- build_corpus(subset=None, **kwargs)
Build the corpus with some filters (speech or not speech, sound type).
- filter(**kwargs)
Filter the dataset and select samples that match the criteria provided. The argument to each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
value == attribute
value is a list and attribute in value == True
value is a callable (a function) and value(attribute) == True
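The three matching rules can be sketched in plain Python as follows. This is an illustration of the rules as stated above, not the library's code: the helpers matches and filter_samples, the dictionary-based samples, and the metadata keys word and speech are all assumptions, as is the choice to require every keyword to match.

```python
def matches(attribute, value):
    """Apply the three documented rules to one attribute/value pair."""
    if isinstance(value, list):
        return attribute in value        # rule 2: membership in a list
    if callable(value):
        return bool(value(attribute))    # rule 3: predicate function
    return value == attribute            # rule 1: plain equality


def filter_samples(samples, **criteria):
    """Keep samples whose metadata satisfies every keyword (assumed AND)."""
    return [
        s for s in samples
        if all(key in s and matches(s[key], value)
               for key, value in criteria.items())
    ]


# Hypothetical metadata records standing in for dataset samples.
samples = [
    {"word": "yes", "speech": 1},
    {"word": "no", "speech": 1},
    {"word": "_background_noise_", "speech": 0},
]

by_string = filter_samples(samples, word="yes")
by_list = filter_samples(samples, word=["yes", "no"])
by_function = filter_samples(samples, word=lambda w: w.startswith("_"))
```

With the class itself, the corresponding call would presumably look like dataset.filter(word="yes"), assuming the samples carry a word metadata entry.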