
public class TFLiteWakewordRecognizer : NSObject
extension TFLiteWakewordRecognizer : SpeechProcessor

This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect wakeword phrases. When a wakeword phrase is detected, the speech pipeline is activated.

After activating the speech pipeline, the recognizer completes processing and awaits another coordination call. When the speech pipeline calls stopStreaming, the recognizer stops processing and awaits another coordination event.

Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.
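The framing and magnitude steps described above can be sketched as follows. This is a minimal illustration, not the recognizer's actual implementation: the function names `hoppedFrames` and `magnitudeSpectrum` are hypothetical, the transform is a naive DFT rather than an optimized FFT, and normalization and any window function are omitted because the text does not specify them.

```swift
import Foundation

/// Splits `samples` into overlapping frames of `windowLength` samples,
/// advancing the window start by `hopLength` samples each time.
/// (Hypothetical helper; trailing samples that do not fill a window are dropped.)
func hoppedFrames(_ samples: [Float], windowLength: Int, hopLength: Int) -> [[Float]] {
    guard windowLength > 0, hopLength > 0, samples.count >= windowLength else { return [] }
    return stride(from: 0, through: samples.count - windowLength, by: hopLength).map { start in
        Array(samples[start..<(start + windowLength)])
    }
}

/// Returns the magnitude of the DFT of `frame` for bins 0...N/2.
/// A naive O(N^2) DFT is used here for clarity only.
func magnitudeSpectrum(_ frame: [Float]) -> [Float] {
    let n = frame.count
    guard n > 0 else { return [] }
    return (0...(n / 2)).map { k -> Float in
        var re = 0.0
        var im = 0.0
        for (t, x) in frame.enumerated() {
            let angle = -2.0 * Double.pi * Double(k) * Double(t) / Double(n)
            re += Double(x) * cos(angle)
            im += Double(x) * sin(angle)
        }
        return Float((re * re + im * im).squareRoot())
    }
}

// Example: one linear spectrogram row per hopped frame.
let spectrogram = hoppedFrames([1, 2, 3, 4, 5], windowLength: 3, hopLength: 2)
    .map(magnitudeSpectrum)
```

Each row of `spectrogram` corresponds to one position of the sliding window; in the recognizer, rows like these would be the input to the “filter” model.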

The mel spectrogram provides the features passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is fed back into the encoder to perform the autoregressive transduction over the mel frames.
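The state feedback and the sliding window of encoded vectors can be sketched as below. This assumes a stand-in closure in place of the “encode” model; `SlidingWindow`, `EncodeStep`, and `runEncoder` are hypothetical names used only for illustration.

```swift
/// Fixed-capacity window that drops its oldest element when full.
struct SlidingWindow<Element> {
    private(set) var elements: [Element] = []
    let capacity: Int

    init(capacity: Int) {
        precondition(capacity > 0, "capacity must be positive")
        self.capacity = capacity
    }

    var isFull: Bool { return elements.count == capacity }

    mutating func append(_ element: Element) {
        if elements.count == capacity { elements.removeFirst() }
        elements.append(element)
    }
}

/// Stand-in for one call to the encoder model: takes a mel frame and the
/// previous state, returns an encoded vector and the next state.
typealias EncodeStep = (_ mel: [Float], _ state: [Float]) -> (encoded: [Float], state: [Float])

/// Runs the encoder over `melFrames`, feeding each output state back in as the
/// next input state and collecting encoded vectors in a sliding window.
func runEncoder(melFrames: [[Float]],
                initialState: [Float],
                windowCapacity: Int,
                step: EncodeStep) -> (window: SlidingWindow<[Float]>, state: [Float]) {
    var state = initialState
    var window = SlidingWindow<[Float]>(capacity: windowCapacity)
    for mel in melFrames {
        let output = step(mel, state)
        window.append(output.encoded)  // kept for the classifier
        state = output.state           // carried into the next step
    }
    return (window, state)
}
```

Keeping the state outside the step function mirrors the description above: the model itself is stateless between calls, and the caller is responsible for threading the state vector through successive frames.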

The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely it is that a keyword phrase was detected. This classifier is commonly implemented as an attention mechanism over the encoder window.

The detector’s outputs are then compared against a configured threshold in order to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.
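The activation rule amounts to a strict comparison, sketched here with hypothetical function names. The threshold values in any real deployment come from the recognizer's configuration; none are assumed here.

```swift
/// Returns true when the detector posterior is strictly greater than the threshold.
func shouldActivate(posterior: Float, threshold: Float) -> Bool {
    return posterior > threshold
}

/// Returns the index of the first posterior in a stream that would activate
/// the pipeline, or nil if none exceeds the threshold.
func firstActivationIndex(posteriors: [Float], threshold: Float) -> Int? {
    return posteriors.firstIndex { shouldActivate(posterior: $0, threshold: threshold) }
}
```

Note that a posterior exactly equal to the threshold does not activate the pipeline under this rule, since the text specifies "greater than".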

Public (properties)

NSObject methods

  • Initializes a TFLiteWakewordRecognizer instance.

    A recognizer is initialized by, and receives startStreaming and stopStreaming events from, an instance of SpeechPipeline.

    The TFLiteWakewordRecognizer receives audio data frames to process from AudioController.



    public init(_ configuration: SpeechConfiguration, context: SpeechContext)



    configuration: Configuration for the recognizer.


    context: Global state for the speech pipeline.

SpeechProcessor implementation

  • Triggered by the speech pipeline, instructing the recognizer to begin streaming and processing audio.



    public func startStreaming()
  • Triggered by the speech pipeline, instructing the recognizer to stop streaming audio and complete processing.



    public func stopStreaming()
  • Receives a frame of audio samples for processing. This method is the interface between the SpeechProcessor and AudioController components.

    The audio is processed asynchronously on a separate thread.



    public func process(_ frame: Data)



    frame: Frame of audio samples.
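One way a process(_:) call can hand a frame off for asynchronous handling is a serial dispatch queue, sketched below. This is an assumption-laden illustration rather than the recognizer's code: `FrameWorker` and `int16Samples` are hypothetical names, and the frame is assumed to hold little-endian 16-bit PCM samples, which the text above does not state.

```swift
import Foundation

/// Decodes a frame as little-endian 16-bit PCM (an assumed sample format).
/// A trailing odd byte, if any, is ignored.
func int16Samples(from frame: Data) -> [Int16] {
    let bytes = [UInt8](frame)
    return stride(from: 0, to: bytes.count - 1, by: 2).map { i -> Int16 in
        let low = UInt16(bytes[i])
        let high = UInt16(bytes[i + 1])
        return Int16(bitPattern: low | (high << 8))
    }
}

/// Hypothetical worker that processes frames off the caller's thread.
final class FrameWorker {
    // A serial queue preserves frame order and protects `sampleCount`.
    private let queue = DispatchQueue(label: "example.frame-worker")
    private var sampleCount = 0

    /// Returns immediately; the frame is decoded on the worker queue.
    func process(_ frame: Data) {
        queue.async {
            self.sampleCount += int16Samples(from: frame).count
        }
    }

    /// Waits for previously submitted frames, then reports the running total.
    func processedSampleCount() -> Int {
        return queue.sync { self.sampleCount }
    }
}
```

A serial queue is used so that frames are handled in arrival order, which matters for any sliding-window processing downstream.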