Google Speech Commands

Google’s Speech Commands Dataset

The Speech Commands Dataset has 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people and contributed by members of the public through the AIY website. It’s released under a Creative Commons BY 4.0 license.

More info about the dataset can be found at the link below:

https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html

AIY website for contributing recordings:

https://aiyprojects.withgoogle.com/open_speech_recording

Tutorial on creating a word classifier:

https://www.tensorflow.org/versions/master/tutorials/audio_recognition

class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)

Bases: AudioSample

Create the sound object.

Parameters:
  • path (str) – the path to the audio file

  • **kwargs – metadata as a list of keyword arguments

data

the actual audio signal

Type:

array_like

fs

sampling frequency

Type:

int

plot(**kwargs)

Plot the spectrogram.

class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)

Bases: Dataset

This class loads the Google Speech Commands Dataset into a structure that is convenient to process.

basedir

The directory where the Speech Commands Dataset is located/downloaded.

Type:

str

size_by_samples

A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each word.

Type:

dict

subdirs

The list of subdirectories in basedir, where each sound type is the name of a subdirectory.

Type:

list

classes

The list of all sounds, same as the keys of size_by_samples.

Type:

list

Parameters:
  • basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.

  • download (bool, optional) – If the corpus does not exist, download it.

  • build (bool, optional) – Whether or not to build the dataset. By default, the dataset is built.

  • subset (int, optional) – The number of samples to keep per word; all noise samples are always included. By default, the dataset is built with all samples.

  • seed (int, optional) – Which seed to use for the random generator when selecting a subset of samples. By default, seed=0.
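The interaction between subset and seed can be illustrated with a minimal sketch. It is not the library's implementation: the function name select_subset, the files_by_word mapping and the noise_files list are hypothetical, and the use of Python's random module is an assumption. The sketch only shows the idea of keeping every noise sample while drawing a reproducible, seeded selection of at most subset samples per word.

```python
import random


def select_subset(files_by_word, noise_files, subset=None, seed=0):
    """Hypothetical helper: keep all noise files, pick `subset` files per word."""
    rng = random.Random(seed)  # seeded generator, so the selection is reproducible
    selected = {}
    for word in sorted(files_by_word):  # fixed iteration order for determinism
        files = sorted(files_by_word[word])
        if subset is not None and subset < len(files):
            files = rng.sample(files, subset)  # draw without replacement
        selected[word] = files
    # Noise samples are never subsampled.
    return selected, list(noise_files)
```

Calling the helper twice with the same seed returns the same selection, which is the property the seed parameter is meant to provide.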

build_corpus(subset=None, **kwargs)

Build the corpus with some filters (speech or not speech, sound type).

filter(**kwargs)

Filter the dataset and select the samples that match the given criteria. The value of each keyword argument can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:

  1. value == attribute

  2. value is a list and attribute in value == True

  3. value is a callable (a function) and value(attribute) == True
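The three matching rules above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: samples are represented here as plain dictionaries of metadata, the names matches and filter_samples are hypothetical, and the sketch assumes that a sample is kept only when every keyword criterion matches.

```python
def matches(attribute, value):
    """Apply the three rules: list membership, callable predicate, or equality."""
    if isinstance(value, list):
        return attribute in value          # rule 2
    if callable(value):
        return bool(value(attribute))      # rule 3
    return value == attribute              # rule 1


def filter_samples(samples, **criteria):
    """Hypothetical helper: keep samples whose metadata satisfies all criteria."""
    return [
        s for s in samples
        if all(key in s and matches(s[key], value) for key, value in criteria.items())
    ]


# Stand-in metadata; the keys "word" and "speech" are chosen for illustration.
samples = [
    {"word": "yes", "speech": True},
    {"word": "no", "speech": True},
    {"word": "_background_noise_", "speech": False},
]

only_yes = filter_samples(samples, word="yes")
yes_or_no = filter_samples(samples, word=["yes", "no"])
noise_like = filter_samples(samples, word=lambda w: w.startswith("_"))
```

A string selects a single value, a list selects any of several values, and a function allows arbitrary predicates on the attribute.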