Google Speech Commands

Google’s Speech Commands Dataset

The Speech Commands Dataset has 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people and contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.

More info about the dataset can be found at the link below:

https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

AIY website for contributing recordings:

https://aiyprojects.withgoogle.com/open_speech_recording

Tutorial on creating a word classifier:

https://www.tensorflow.org/versions/master/tutorials/audio_recognition

class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)

Bases: AudioSample

Create the sound object.

Parameters:

path (str) – the path to the audio file
**kwargs – metadata as a list of keyword arguments
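GoogleSample receives its metadata as keyword arguments. The sketch below shows one way such keyword arguments could be derived from a file path. It assumes the dataset's usual layout <basedir>/<word>/<speaker>_nohash_<index>.wav; the helper metadata_from_path and the keys word, speaker_id and utterance are hypothetical illustrations, not the names pyroomacoustics itself uses.

```python
from pathlib import Path


def metadata_from_path(path):
    """Hypothetical helper: derive sample metadata from a Speech Commands path.

    Assumes the layout <basedir>/<word>/<speaker>_nohash_<index>.wav.
    """
    p = Path(path)
    word = p.parent.name  # the enclosing subdirectory is the word label
    speaker, sep, index = p.stem.partition("_nohash_")
    if not sep:  # files that do not follow the naming pattern (e.g. noise recordings)
        return {"word": word, "speaker_id": None, "utterance": None}
    return {"word": word, "speaker_id": speaker, "utterance": int(index)}


meta = metadata_from_path("speech_commands/yes/0a7c2a8d_nohash_0.wav")
print(meta)  # {'word': 'yes', 'speaker_id': '0a7c2a8d', 'utterance': 0}
```

A dictionary like this could then be unpacked into the documented constructor as GoogleSample(path, **meta).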

data

the actual audio signal

Type: array_like

fs

sampling frequency

Type: int
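The two attributes data and fs together determine a clip's duration (len(data) / fs seconds). Because some clips may be shorter than one second, a common preprocessing step is to bring every clip to exactly one second. The sketch below does this with NumPy; the 16 kHz rate and the helper pad_to_one_second are assumptions for illustration, not part of the documented API.

```python
import numpy as np


def pad_to_one_second(data, fs):
    """Hypothetical helper: zero-pad (or truncate) a clip to exactly one second."""
    data = np.asarray(data)
    n = int(fs)  # number of samples in one second
    if len(data) >= n:
        return data[:n]
    return np.pad(data, (0, n - len(data)))  # append zeros at the end


fs = 16000                     # assumed sampling frequency in Hz
clip = np.ones(12000)          # a synthetic 0.75 s clip
print(len(clip) / fs)          # 0.75 (duration in seconds)
padded = pad_to_one_second(clip, fs)
print(padded.shape)            # (16000,)
```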

plot(**kwargs): Plot the spectrogram.
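To show what a spectrogram of a sample contains, the sketch below computes a log-magnitude spectrogram from data and fs with NumPy only, using a Hann-windowed short-time Fourier transform. It is a generic illustration, not the code behind plot(); the function name log_spectrogram and the frame_len and hop values are arbitrary choices.

```python
import numpy as np


def log_spectrogram(data, fs, frame_len=512, hop=256):
    """Magnitude spectrogram in dB via a Hann-windowed short-time Fourier transform.

    Returns (freqs, times, s_db) where s_db has shape (frame_len // 2 + 1, n_frames).
    """
    data = np.asarray(data, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(data) - frame_len) // hop
    frames = np.stack([data[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames, axis=1)).T          # (frequency bins, frames)
    s_db = 20 * np.log10(spectrum + 1e-10)                    # small offset avoids log(0)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)            # bin centre frequencies in Hz
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres in seconds
    return freqs, times, s_db


fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # one-second 1 kHz test tone
freqs, times, s_db = log_spectrogram(tone, fs)
print(s_db.shape)                                   # (257, 61)
print(round(freqs[s_db.mean(axis=1).argmax()], 3))  # 1000.0
```

Larger frames give finer frequency resolution at the cost of time resolution; with these values each bin spans fs / frame_len = 31.25 Hz.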

class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)

Bases: Dataset

This class loads the Google Speech Commands Dataset into a structure that is convenient to process.

basedir

The directory where the Speech Commands Dataset is located/downloaded.

Type: str

size_by_samples

A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each word.

Type: dict

subdirs

The list of subdirectories in basedir; each subdirectory is named after a sound type.

Type: list

classes

The list of all sounds, same as the keys of size_by_samples.

Type: list
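The relationship between subdirs, size_by_samples and classes can be illustrated by scanning a directory tree. The sketch below is a hypothetical helper, not the library's implementation: it assumes one subdirectory per sound type containing .wav files, ignores files at the top level, and builds a tiny throwaway tree so that it runs without the real dataset.

```python
import os
import tempfile


def scan_corpus(basedir):
    """Hypothetical helper mirroring the documented attributes.

    Returns (subdirs, size_by_samples, classes) for a directory laid out as
    <basedir>/<sound type>/<file>.wav.
    """
    with os.scandir(basedir) as entries:
        subdirs = sorted(e.name for e in entries if e.is_dir())  # top-level files are ignored
    size_by_samples = {}
    for name in subdirs:
        wavs = [f for f in os.listdir(os.path.join(basedir, name)) if f.endswith(".wav")]
        size_by_samples[name] = len(wavs)  # number of occurrences of this word
    classes = list(size_by_samples.keys())  # same as the keys of size_by_samples
    return subdirs, size_by_samples, classes


# Build a toy corpus with empty placeholder files and scan it.
with tempfile.TemporaryDirectory() as root:
    for word, n in [("yes", 3), ("no", 2)]:
        os.makedirs(os.path.join(root, word))
        for i in range(n):
            open(os.path.join(root, word, f"spk{i}_nohash_0.wav"), "wb").close()
    subdirs, size_by_samples, classes = scan_corpus(root)
    print(size_by_samples)  # {'no': 2, 'yes': 3}
```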

Parameters:

basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
download (bool, optional) – If the corpus does not exist, download it.
build (bool, optional) – Whether or not to build the dataset. By default, it is.
subset (int, optional) – If given, build a dataset that contains all noise samples plus subset randomly selected samples per word. By default, the dataset is built with all samples.
seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
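The subset and seed parameters together describe a reproducible random selection. The sketch below shows that idea with Python's random module; the helper select_subset, the noise directory name _background_noise_ and the choice to keep every file when a word has fewer than subset samples are assumptions, and the library may use a different random generator internally.

```python
import random


def select_subset(files_by_word, subset=None, seed=0, noise_key="_background_noise_"):
    """Hypothetical helper: keep every noise file and up to `subset` random files per word.

    files_by_word maps a word (subdirectory name) to its list of file names.
    With subset=None every file is kept.
    """
    rng = random.Random(seed)  # dedicated generator: same seed -> same selection
    selected = {}
    for word, files in files_by_word.items():
        if subset is None or word == noise_key:
            selected[word] = list(files)  # noise samples are always kept in full
        else:
            selected[word] = rng.sample(files, min(subset, len(files)))
    return selected


corpus = {
    "yes": [f"yes_{i}.wav" for i in range(10)],
    "no": [f"no_{i}.wav" for i in range(10)],
    "_background_noise_": ["white_noise.wav", "pink_noise.wav"],
}
picked = select_subset(corpus, subset=3, seed=0)
print({word: len(files) for word, files in picked.items()})
# {'yes': 3, 'no': 3, '_background_noise_': 2}
```

Using a dedicated generator seeded once per call, rather than the global random state, makes the selection repeatable regardless of what other code consumes random numbers.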

build_corpus(subset=None, **kwargs): Build the corpus with some filters (speech or not speech, sound type).

filter(**kwargs)

Filter the dataset and select the samples that match the provided criteria. The value passed for each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:

value == attribute
value is a list and attribute in value == True
value is a callable (a function) and value(attribute) == True
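The three matching rules above can be expressed directly in plain Python. In the sketch below, attribute_matches and filter_samples are hypothetical helpers operating on metadata dictionaries, not the library's code; it also assumes that several keywords are combined with AND and that a sample lacking an attribute does not match, neither of which is stated above.

```python
def attribute_matches(attribute, value):
    """Return True if one of the three documented rules holds."""
    if value == attribute:                          # rule 1: plain equality
        return True
    if isinstance(value, list) and attribute in value:
        return True                                 # rule 2: membership in a list
    if callable(value) and value(attribute):
        return True                                 # rule 3: a function returning True
    return False


def filter_samples(samples, **kwargs):
    """Keep samples whose metadata satisfies every keyword criterion (assumed AND)."""
    return [
        meta
        for meta in samples
        if all(key in meta and attribute_matches(meta[key], value)
               for key, value in kwargs.items())
    ]


samples = [
    {"word": "yes", "speaker_id": "a1"},
    {"word": "no", "speaker_id": "b2"},
    {"word": "up", "speaker_id": "a1"},
]
print(len(filter_samples(samples, word="yes")))                              # 1
print(len(filter_samples(samples, word=["yes", "no"])))                      # 2
print(len(filter_samples(samples, speaker_id=lambda s: s.startswith("a"))))  # 2
```

With the actual class, the same kinds of values are passed as keywords to filter(**kwargs); which attribute names are available depends on the metadata stored for each sample.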
