Google Speech Commands

Google’s Speech Commands Dataset

The Speech Commands Dataset contains 65,000 one-second-long utterances of 30 short words, recorded by thousands of different people and contributed by members of the public through the AIY website. It is released under a Creative Commons BY 4.0 license.

More info about the dataset can be found at the link below:

https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

AIY website for contributing recordings:

https://aiyprojects.withgoogle.com/open_speech_recording

Tutorial on creating a word classifier:

https://www.tensorflow.org/versions/master/tutorials/audio_recognition

class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)

Bases: AudioSample

Create the sound object.

Parameters
  • path (str) – the path to the audio file

  • **kwargs – metadata as a list of keyword arguments

data

the actual audio signal

Type

array_like

fs

sampling frequency

Type

int

plot(**kwargs)

Plot the spectrogram.

class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)

Bases: Dataset

This class loads the Google Speech Commands Dataset into a structure that is convenient to process.

basedir

The directory where the Speech Commands Dataset is located/downloaded.

Type

str

size_by_samples

A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each word.

Type

dict

subdirs

The list of subdirectories in basedir, where each sound type is the name of a subdirectory.

Type

list

classes

The list of all sounds, same as the keys of size_by_samples.

Type

list
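The relationship between basedir, subdirs, size_by_samples and classes can be illustrated with a minimal sketch. It is not the library's implementation: the helper scan_basedir is hypothetical, and the assumption that every sample is a .wav file stored in a subdirectory named after its word is made here only for illustration.

```python
import os
import tempfile


def scan_basedir(basedir):
    """Hypothetical helper: derive the three attributes from a directory tree."""
    # One subdirectory per sound type (word).
    subdirs = sorted(
        d for d in os.listdir(basedir)
        if os.path.isdir(os.path.join(basedir, d))
    )
    # Number of .wav files found for each word.
    size_by_samples = {
        d: sum(1 for f in os.listdir(os.path.join(basedir, d)) if f.endswith(".wav"))
        for d in subdirs
    }
    # The classes are simply the keys of size_by_samples.
    classes = list(size_by_samples.keys())
    return subdirs, size_by_samples, classes


# Build a tiny mock dataset in a temporary directory and scan it.
with tempfile.TemporaryDirectory() as basedir:
    for word, count in {"yes": 3, "no": 2}.items():
        os.makedirs(os.path.join(basedir, word))
        for i in range(count):
            open(os.path.join(basedir, word, f"sample_{i}.wav"), "w").close()
    subdirs, size_by_samples, classes = scan_basedir(basedir)

print(subdirs)          # ['no', 'yes']
print(size_by_samples)  # {'no': 2, 'yes': 3}
print(classes)          # ['no', 'yes']
```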

Parameters
  • basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.

  • download (bool, optional) – If the corpus does not exist, download it.

  • build (bool, optional) – Whether to build the dataset. By default, it is built.

  • subset (int, optional) – The number of samples to keep per word. The built dataset contains all noise samples and subset samples per word. By default, the dataset is built with all samples.

  • seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
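How subset and seed interact can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helper select_subset and the file names are hypothetical, and the sketch uses Python's random module, which may differ from the generator the library uses. The folder name _background_noise_ is the one used by the Speech Commands archive for its noise recordings.

```python
import random


def select_subset(files_by_word, subset=None, seed=0, noise_key="_background_noise_"):
    """Hypothetical helper: keep all noise files and `subset` files per word."""
    rng = random.Random(seed)  # fixed seed, so the selection is reproducible
    selected = {}
    for word in sorted(files_by_word):
        files = list(files_by_word[word])
        if subset is None or word == noise_key or len(files) <= subset:
            # Keep everything: no subset requested, noise, or too few files.
            selected[word] = files
        else:
            selected[word] = rng.sample(files, subset)
    return selected


# Made-up file names, for illustration only.
files = {
    "yes": [f"yes_{i}.wav" for i in range(10)],
    "no": [f"no_{i}.wav" for i in range(10)],
    "_background_noise_": ["noise_a.wav", "noise_b.wav"],
}

first = select_subset(files, subset=3, seed=0)
second = select_subset(files, subset=3, seed=0)
print(first == second)                   # True: same seed, same selection
print({w: len(f) for w, f in first.items()})
```

Fixing the seed makes experiments on a reduced dataset repeatable, while changing it draws a different subset of the same size.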

build_corpus(subset=None, **kwargs)

Build the corpus with some filters (speech or not speech, sound type).

filter(**kwargs)

Filter the dataset and select the samples that match the criteria provided. The value of each keyword argument can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True.

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True
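The three matching rules above can be sketched in plain Python. The helpers matches and filter_samples and the sample metadata are hypothetical and only mirror the rules as listed. Requiring every keyword to match (a logical AND) is an assumption of this sketch.

```python
def matches(value, attribute):
    """Hypothetical helper: apply the three rules in the order listed."""
    if not callable(value) and value == attribute:
        return True                       # rule 1: plain equality
    if isinstance(value, list):
        return attribute in value         # rule 2: membership in a list
    if callable(value):
        return bool(value(attribute))     # rule 3: predicate function
    return False


def filter_samples(samples, **kwargs):
    """Keep samples whose metadata satisfies every keyword (assumed AND)."""
    return [
        s for s in samples
        if all(key in s and matches(val, s[key]) for key, val in kwargs.items())
    ]


# Made-up metadata, for illustration only.
samples = [
    {"word": "yes", "speech": True},
    {"word": "no", "speech": True},
    {"word": "running_tap", "speech": False},
]

print(filter_samples(samples, word="yes"))
print(filter_samples(samples, word=["yes", "no"]))
print(filter_samples(samples, word=lambda w: w.startswith("r")))
```

Passing a function is the most flexible form, since it can express conditions such as prefixes or ranges that a string or list cannot.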