@objc public class TFLiteWakewordRecognizer : NSObject
extension TFLiteWakewordRecognizer : SpeechProcessor
This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect keyword phrases for wakeword recognition. Once a wakeword phrase is detected, the speech pipeline is activated.
Upon activating the speech pipeline, the recognizer completes processing and awaits another coordination call. Once speech pipeline coordination via stopStreaming is received, the recognizer stops processing and awaits another coordination event.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.
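The front end described above (normalize, frame over a hopped window, take the STFT magnitude) can be sketched roughly as follows. This is a minimal illustration, not the recognizer's implementation: the function names, the peak-normalization scheme, the Hann window, and the naive DFT are all assumptions chosen to keep the example self-contained.

```swift
import Foundation

/// Scales samples so the largest absolute value is 1.
/// Peak normalization is an assumed scheme, used here only for illustration.
func peakNormalize(_ samples: [Float]) -> [Float] {
    let peak = samples.map { abs($0) }.max() ?? 0
    return peak > 0 ? samples.map { $0 / peak } : samples
}

/// Splits a signal into overlapping frames of `width` samples,
/// advancing `hop` samples between consecutive frames.
func hoppedFrames(_ samples: [Float], width: Int, hop: Int) -> [[Float]] {
    guard width > 0, hop > 0, samples.count >= width else { return [] }
    return stride(from: 0, through: samples.count - width, by: hop).map { start in
        Array(samples[start..<(start + width)])
    }
}

/// Magnitude spectrum of one Hann-windowed frame using a naive O(n^2) DFT.
/// Returns n/2 + 1 bins for a frame of n samples.
func magnitudeSpectrum(_ frame: [Float]) -> [Float] {
    let n = frame.count
    guard n > 1 else { return frame.map { abs($0) } }
    let windowed: [Double] = frame.enumerated().map { i, x in
        Double(x) * (0.5 - 0.5 * cos(2.0 * Double.pi * Double(i) / Double(n - 1)))
    }
    return (0...(n / 2)).map { (k: Int) -> Float in
        var re = 0.0
        var im = 0.0
        for (i, x) in windowed.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(i) / Double(n)
            re += x * cos(angle)
            im += x * sin(angle)
        }
        return Float((re * re + im * im).squareRoot())
    }
}
```

A linear spectrogram is then `hoppedFrames(peakNormalize(samples), width: w, hop: h).map(magnitudeSpectrum)`, where `w` and `h` are placeholders for whatever window and hop sizes the configuration supplies. A production implementation would typically use an FFT rather than this naive DFT.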
The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. For each mel frame, this encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is fed back into the encoder on the next step to perform the autoregressive transduction over the mel frames.
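The state feedback and the sliding window of encoded vectors might look like the sketch below. The `EncodeStep` closure is a hypothetical stand-in for the “encode” model, and `SlidingWindow` and `encodeAll` are illustrative names rather than part of the recognizer's API.

```swift
/// Fixed-capacity window that drops its oldest element once full.
struct SlidingWindow<Element> {
    private(set) var elements: [Element] = []
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    var isFull: Bool { elements.count == capacity }

    mutating func push(_ element: Element) {
        if elements.count == capacity { elements.removeFirst() }
        elements.append(element)
    }
}

/// Hypothetical stand-in for the "encode" model:
/// (mel frame, previous state) -> (encoded vector, next state).
typealias EncodeStep = (_ melFrame: [Float], _ state: [Float]) -> (encoded: [Float], state: [Float])

/// Runs the encoder over the mel frames in order, feeding each step's
/// state vector into the next step and collecting the encoded vectors.
func encodeAll(melFrames: [[Float]],
               initialState: [Float],
               windowSize: Int,
               step: EncodeStep) -> SlidingWindow<[Float]> {
    var state = initialState
    var window = SlidingWindow<[Float]>(capacity: windowSize)
    for frame in melFrames {
        let output = step(frame, state)
        window.push(output.encoded)
        state = output.state
    }
    return window
}
```

Passing the state explicitly keeps the loop independent of how the model is invoked, so the closure could wrap a model call or a test double.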
The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely a keyword phrase is detected. This classifier is commonly implemented as an attention mechanism over the encoder window.
The detector’s outputs are then compared against a configured threshold in order to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.
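The activation rule reduces to a strict comparison, sketched here with a hypothetical helper (`shouldActivate` is not part of the recognizer's API):

```swift
/// Returns true when the detector's posterior strictly exceeds the threshold.
/// A posterior equal to the threshold does not activate the pipeline.
func shouldActivate(posterior: Float, threshold: Float) -> Bool {
    return posterior > threshold
}
```

Raising the threshold trades missed detections for fewer false activations, and lowering it does the opposite.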
Initializes a TFLiteWakewordRecognizer instance.
A recognizer is initialized by, and receives stopStreaming events from, an instance of
The TFLiteWakewordRecognizer receives audio data frames to
Configuration for the recognizer.
Global state for the speech pipeline.
Triggered by the speech pipeline, instructing the recognizer to begin streaming and processing audio.
@objc public func startStreaming()
Triggered by the speech pipeline, instructing the recognizer to stop streaming audio and complete processing.
@objc public func stopStreaming()
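The start/stop coordination can be pictured as a simple gate on frame processing. The class below is a hypothetical stand-in written for illustration; it is not the TFLiteWakewordRecognizer implementation, and `StreamingGate`, `process`, and `processedFrames` are assumed names.

```swift
/// Hypothetical illustration of start/stop coordination:
/// frames are processed only between startStreaming() and stopStreaming().
final class StreamingGate {
    private(set) var isStreaming = false
    private(set) var processedFrames = 0

    func startStreaming() { isStreaming = true }
    func stopStreaming() { isStreaming = false }

    /// Counts a frame only while streaming; returns whether it was processed.
    @discardableResult
    func process(_ frame: [Float]) -> Bool {
        guard isStreaming else { return false }
        processedFrames += 1
        return true
    }
}
```

Frames that arrive before `startStreaming()` or after `stopStreaming()` are ignored in this sketch.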