TFLiteWakewordRecognizer

@objc
public class TFLiteWakewordRecognizer : NSObject

extension TFLiteWakewordRecognizer : SpeechProcessor

This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect wakeword phrases. Once a wakeword phrase is detected, the speech pipeline is activated.

After activating the speech pipeline, the recognizer completes processing and awaits another coordination call. When it receives stopStreaming from the speech pipeline, the recognizer stops processing and awaits another coordination event.

When it receives startStreaming from the speech pipeline, the recognizer begins streaming buffered frames, which are first normalized and then converted to a magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.
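The normalize-then-window step can be illustrated with a minimal sketch. Peak normalization, the function names, and the window and hop sizes below are assumptions chosen for illustration; this page does not specify which normalization or window parameters the recognizer uses, and the STFT magnitude itself would be computed on each resulting window.

```swift
import Foundation

/// Scales samples so the loudest one has magnitude 1 (peak normalization).
/// Assumption: the recognizer's actual normalization may differ.
func peakNormalize(_ samples: [Float]) -> [Float] {
    let peak = samples.map { abs($0) }.max() ?? 0
    guard peak > 0 else { return samples }
    return samples.map { $0 / peak }
}

/// Splits samples into overlapping windows of `width` samples,
/// advancing the window start by `hop` samples each time.
func hoppedWindows(_ samples: [Float], width: Int, hop: Int) -> [[Float]] {
    precondition(width > 0 && hop > 0, "width and hop must be positive")
    guard samples.count >= width else { return [] }
    return stride(from: 0, through: samples.count - width, by: hop).map {
        Array(samples[$0..<($0 + width)])
    }
}

// Hypothetical input: 8 samples, window width 4, hop 2.
let samples: [Float] = [0, 1, -2, 4, 2, -1, 0, 1]
let windows = hoppedWindows(peakNormalize(samples), width: 4, hop: 2)
```

With a hop smaller than the width, consecutive windows overlap, which is what makes the window "hopped" rather than tiled.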

The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the autoregressive transduction over the mel frames.
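The interplay between the carried state and the encoded sliding window can be sketched as follows. `SlidingWindow`, `EncodeStep`, and `toyEncode` are hypothetical names, and the toy encoder (a running sum) merely stands in for the “encode” model; none of this is the recognizer's actual implementation.

```swift
/// Fixed-capacity sliding window that drops its oldest element when full.
struct SlidingWindow<Element> {
    private(set) var elements: [Element] = []
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    mutating func append(_ element: Element) {
        if elements.count == capacity { elements.removeFirst() }
        elements.append(element)
    }
}

/// Hypothetical stand-in for one step of the "encode" model:
/// takes a mel frame and the previous state, returns (encoded, new state).
typealias EncodeStep = (_ melFrame: [Float], _ previous: [Float]) -> (encoded: [Float], state: [Float])

// Toy encoder: the state is a running sum and the encoded vector equals it.
let toyEncode: EncodeStep = { melFrame, previous in
    let next = zip(melFrame, previous).map { $0 + $1 }
    return (encoded: next, state: next)
}

var state: [Float] = [0, 0]
var encodedWindow = SlidingWindow<[Float]>(capacity: 2)
for melFrame in [[1, 1], [2, 0], [0, 3]] as [[Float]] {
    let output = toyEncode(melFrame, state)
    state = output.state                 // carried into the next step
    encodedWindow.append(output.encoded) // oldest encoding falls out when full
}
```

The state is what makes the encoder autoregressive: each step depends on the previous one, while the window only ever holds the most recent encodings for classification.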

The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely it is that a keyword phrase was detected. This classifier is commonly implemented as an attention mechanism over the encoder window.

The detector’s output is then compared against a configured threshold to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.
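The activation rule is a strict greater-than comparison, which can be sketched as below. The function name `shouldActivate` and the threshold and posterior values are hypothetical and used only for illustration.

```swift
/// Returns true when the posterior is strictly greater than the threshold.
func shouldActivate(posterior: Float, threshold: Float) -> Bool {
    return posterior > threshold
}

// Hypothetical values for illustration only.
let threshold: Float = 0.5
let posteriors: [Float] = [0.1, 0.5, 0.9, 0.3]

// Index of the first posterior that would activate the pipeline, if any.
let firstActivation = posteriors.firstIndex {
    shouldActivate(posterior: $0, threshold: threshold)
}
```

Note that a posterior exactly equal to the threshold does not activate the pipeline under this rule.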


Public (properties)

configuration
Configuration for the recognizer.
Declaration
Swift

@objc public var configuration: SpeechConfiguration
context
Global state for the speech pipeline.
Declaration
Swift

@objc public var context: SpeechContext

NSObject methods

init(_:context:)

Initializes a TFLiteWakewordRecognizer instance.

A recognizer is initialized by, and receives startStreaming and stopStreaming events from, an instance of SpeechPipeline.

The TFLiteWakewordRecognizer receives audio data frames to process from AudioController.

Declaration

Swift

@objc
public init(_ configuration: SpeechConfiguration, context: SpeechContext)

Parameters

`configuration`	Configuration for the recognizer.
`context`	Global state for the speech pipeline.


SpeechProcessor implementation

startStreaming()
Triggered by the speech pipeline, instructing the recognizer to begin streaming and processing audio.
Declaration
Swift

@objc public func startStreaming()
stopStreaming()
Triggered by the speech pipeline, instructing the recognizer to stop streaming audio and complete processing.
Declaration
Swift

@objc public func stopStreaming()

process(_:)

Receives a frame of audio samples for processing. Serves as the interface between the SpeechProcessor and AudioController components.

Processes the audio asynchronously on a separate thread.

Declaration

Swift

@objc
public func process(_ frame: Data)

Parameters

`frame`	Frame of audio samples.
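Because the frame arrives as raw `Data`, a processor typically has to decode it into numeric samples first. The sketch below assumes 16-bit signed little-endian PCM, which this page does not state; `decodeSamples` is a hypothetical helper and not part of the recognizer's API.

```swift
import Foundation

/// Decodes a frame of raw bytes into 16-bit signed samples.
/// Assumption: little-endian PCM. A trailing odd byte is ignored.
func decodeSamples(_ frame: Data) -> [Int16] {
    let bytes = [UInt8](frame)
    return stride(from: 0, to: bytes.count - 1, by: 2).map { i in
        // Combine low and high bytes, then reinterpret as a signed value.
        Int16(bitPattern: UInt16(bytes[i]) | (UInt16(bytes[i + 1]) << 8))
    }
}

// Hypothetical 3-sample frame.
let frame = Data([0x01, 0x00, 0xFF, 0xFF, 0x00, 0x80])
let decoded = decodeSamples(frame)
```

Assembling each sample byte by byte avoids any alignment assumptions about the underlying `Data` buffer.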
