TFLiteWakewordRecognizer
@objc
public class TFLiteWakewordRecognizer : NSObject
extension TFLiteWakewordRecognizer : SpeechProcessor
This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect keyword phrases for wakeword recognition. Once a wakeword phrase is detected, the speech pipeline is activated.
Upon activating the speech pipeline, the recognizer completes processing and awaits another coordination call. Once speech pipeline coordination via stopStreaming is received, the recognizer stops processing and awaits another coordination event.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram by a “filter” TensorFlow Lite model. These mel frames are batched together into a sliding window.
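As a rough illustration of this batching step, the sketch below maintains a fixed-length sliding window of mel frames. The MelWindow type, its capacity, and the append/flatten policy are assumptions made for illustration, not the component's actual buffer implementation.
Swift
// A minimal sketch of the sliding-window batching step (shapes are assumptions).
struct MelWindow {
    private(set) var frames: [[Float]] = []
    let capacity: Int  // number of mel frames retained per window (assumption)

    init(capacity: Int) {
        self.capacity = capacity
    }

    // Append one mel frame; once full, drop the oldest so the window
    // slides forward one frame (one hop) at a time.
    mutating func append(_ melFrame: [Float]) {
        frames.append(melFrame)
        if frames.count > capacity {
            frames.removeFirst()
        }
    }

    // Flatten into the contiguous layout a model input would expect.
    var flattened: [Float] { frames.flatMap { $0 } }
}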
The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow Lite model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the autoregressive transduction over the mel frames.
The “detect” TensorFlow Lite model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]; the higher the value, the more likely a keyword phrase is present. This classifier is commonly implemented as an attention mechanism over the encoder window.
The detector’s output is then compared against a configured threshold to determine whether to activate the pipeline: if the posterior is greater than the threshold, the pipeline is activated.
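The encode → detect → threshold flow can be summarized in the hedged sketch below. The model invocations are abstracted as closures, and the state width, window handling, and threshold parameter are illustrative assumptions rather than this class's internal implementation.
Swift
// Hypothetical closures standing in for the “encode” and “detect” models.
typealias EncodeModel = (_ melWindow: [Float], _ state: [Float]) -> (encoded: [Float], state: [Float])
typealias DetectModel = (_ encodedWindow: [Float]) -> Float  // posterior in [0, 1]

func detectWakeword(
    melWindows: [[Float]],   // successive mel sliding windows
    encode: EncodeModel,
    detect: DetectModel,
    stateSize: Int,          // assumption: width of the encoder's state vector
    threshold: Float         // assumption: the configured posterior threshold
) -> Bool {
    var state = [Float](repeating: 0, count: stateSize)
    var encodedWindow: [[Float]] = []

    for mel in melWindows {
        // Autoregressive step: the previous state vector is fed back in.
        let (encoded, nextState) = encode(mel, state)
        state = nextState
        // A real implementation would bound this window's length as well.
        encodedWindow.append(encoded)

        // Classify the encoded window; a posterior above the threshold
        // is what triggers pipeline activation.
        let posterior = detect(encodedWindow.flatMap { $0 })
        if posterior > threshold {
            return true
        }
    }
    return false
}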
-
Configuration for the recognizer.
Declaration
Swift
@objc public var configuration: SpeechConfiguration
-
Global state for the speech pipeline.
Declaration
Swift
@objc public var context: SpeechContext
-
Initializes a TFLiteWakewordRecognizer instance.
A recognizer is initialized by, and receives startStreaming and stopStreaming events from, an instance of SpeechPipeline. The TFLiteWakewordRecognizer receives audio data frames to process from AudioController.
Declaration
Swift
@objc public init(_ configuration: SpeechConfiguration, context: SpeechContext)
Parameters
configuration
Configuration for the recognizer.
context
Global state for the speech pipeline.
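For example, a recognizer might be constructed as follows. The SpeechConfiguration and SpeechContext initializers shown are assumptions; only the recognizer's own initializer is taken from the declaration above.
Swift
// The SpeechConfiguration and SpeechContext initializers below are
// assumptions; consult their documentation for the actual signatures.
let configuration = SpeechConfiguration()
let context = SpeechContext(configuration)
let recognizer = TFLiteWakewordRecognizer(configuration, context: context)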
-
Triggered by the speech pipeline, instructing the recognizer to begin streaming and processing audio.
Declaration
Swift
@objc public func startStreaming()
-
Triggered by the speech pipeline, instructing the recognizer to stop streaming audio and complete processing.
Declaration
Swift
@objc public func stopStreaming()
-
Receives a frame of audio samples for processing. Interface between the SpeechProcessor and AudioController components. Processes audio in an async thread.
Declaration
Swift
@objc public func process(_ frame: Data)
Parameters
frame
Frame of audio samples.
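Putting the coordination events together, a caller (normally SpeechPipeline and AudioController rather than client code) would drive the recognizer roughly as follows; the fabricated silent frame is for illustration only.
Swift
import Foundation

recognizer.startStreaming()

// AudioController delivers frames of audio samples as Data.
// A silent 16-bit frame is fabricated here purely for illustration.
let samples = [Int16](repeating: 0, count: 320)
let frame = samples.withUnsafeBufferPointer { Data(buffer: $0) }
recognizer.process(frame)

recognizer.stopStreaming()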