Google Speech Commands
Google’s Speech Commands Dataset
The Speech Commands Dataset contains 65,000 one-second-long utterances of 30 short words, spoken by thousands of different people and contributed by members of the public through the AIY website. It is released under a Creative Commons BY 4.0 license.
More info about the dataset can be found at the link below:
https://research.googleblog.com/2017/08/launching-speech-commands-dataset.html
AIY website for contributing recordings:
https://aiyprojects.withgoogle.com/open_speech_recording
Tutorial on creating a word classifier:
https://www.tensorflow.org/versions/master/tutorials/audio_recognition
- class pyroomacoustics.datasets.google_speech_commands.GoogleSample(path, **kwargs)
Bases: pyroomacoustics.datasets.base.AudioSample
Create the sound object.
Parameters:
- path (str) – the path to the audio file
- **kwargs – metadata as a list of keyword arguments
- data
  The actual audio signal.
  Type: array_like
- fs
  Sampling frequency.
  Type: int
- plot(**kwargs)
  Plot the spectrogram.
- class pyroomacoustics.datasets.google_speech_commands.GoogleSpeechCommands(basedir=None, download=False, build=True, subset=None, seed=0, **kwargs)
Bases: pyroomacoustics.datasets.base.Dataset
This class loads the Google Speech Commands Dataset into a structure that is convenient to process.
- basedir
  The directory where the Speech Commands Dataset is located/downloaded.
  Type: str
- size_by_samples
  A dictionary whose keys are the words in the dataset. The values are the number of occurrences of each word.
  Type: dict
- subdirs
  The list of subdirectories in basedir, where each sound type is the name of a subdirectory.
  Type: list
- classes
  The list of all sounds, same as the keys of size_by_samples.
  Type: list
Parameters:
- basedir (str, optional) – The directory where the Google Speech Commands dataset is located/downloaded. By default, this is the current directory.
- download (bool, optional) – If the corpus does not exist, download it.
- build (bool, optional) – Whether or not to build the dataset. By default, the dataset is built.
- subset (int, optional) – Build a dataset that contains all noise samples and subset samples per word, i.e. the given number of samples for each word. By default, the dataset is built with all samples.
- seed (int, optional) – The seed for the random generator used when selecting a subset of samples. By default, seed=0.
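The constructor parameters above can be combined as in the following minimal sketch. The helper name load_small_corpus and its samples_per_word argument are hypothetical and introduced only for illustration; the class, its keyword arguments, and the classes and size_by_samples attributes are the ones documented on this page.

```python
def load_small_corpus(basedir=None, samples_per_word=10, seed=0):
    """Build a reduced Speech Commands corpus and print its word counts.

    Hypothetical helper: only the constructor arguments and attributes
    documented on this page are used.
    """
    # Imported lazily so that defining this helper does not require the package.
    from pyroomacoustics.datasets.google_speech_commands import (
        GoogleSpeechCommands,
    )

    dataset = GoogleSpeechCommands(
        basedir=basedir,          # None falls back to the documented default
        download=True,            # fetch the corpus if it is not found
        subset=samples_per_word,  # all noise samples plus this many per word
        seed=seed,                # makes the random subset reproducible
    )

    # `classes` lists the sounds; `size_by_samples` maps each one to its count.
    for word in dataset.classes:
        print(word, dataset.size_by_samples[word])

    return dataset
```

Calling load_small_corpus() downloads the corpus on first use if it is not already present in basedir, so it may take a while and needs network access. Keeping the same seed should select the same subset on later runs.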
- build_corpus(subset=None, **kwargs)
  Build the corpus with some filters (speech or not speech, sound type).
- filter(**kwargs)
  Filter the dataset and select the samples that match the criteria provided. The value passed for each keyword can be 1) a string, 2) a list of strings, or 3) a function. There is a match if one of the following is True:
  - value == attribute
  - value is a list and attribute in value == True
  - value is a callable (a function) and value(attribute) == True
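The three matching rules can be illustrated with a small standalone sketch. This is not the library's implementation: the functions matches and filter_samples, the sample dictionaries, and the metadata keys word and speech are all hypothetical, and the sketch assumes that a sample is kept only when every keyword matches.

```python
def matches(value, attribute):
    """Apply the three documented rules to one keyword/attribute pair."""
    return (
        value == attribute
        or (isinstance(value, list) and attribute in value)
        or (callable(value) and bool(value(attribute)))
    )


def filter_samples(samples, **kwargs):
    """Keep the samples whose metadata satisfies every keyword (assumption)."""
    return [
        sample
        for sample in samples
        if all(
            key in sample and matches(value, sample[key])
            for key, value in kwargs.items()
        )
    ]


# Hypothetical metadata records standing in for dataset samples.
samples = [
    {"word": "yes", "speech": True},
    {"word": "no", "speech": True},
    {"word": "noise", "speech": False},
]

by_string = filter_samples(samples, word="yes")             # rule 1
by_list = filter_samples(samples, word=["yes", "no"])       # rule 2
by_function = filter_samples(
    samples, word=lambda w: w.startswith("n"), speech=True  # rule 3 + rule 1
)
```

With the real class, the same three kinds of value would be passed as keyword arguments to filter, using whichever metadata keys the samples actually carry.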
-