Classes
The following classes are available globally.
-
This pipeline component uses the Apple SFSpeech API to stream audio samples for speech recognition.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Once speech pipeline coordination via stopStreaming is received, or when the Apple ASR API indicates a completed speech event, the recognizer completes the API request and sends either a timeout or a didRecognize event with the updated global speech context (including the speech transcript and confidence).
Declaration
Swift
@objc public class AppleSpeechRecognizer : NSObject
extension AppleSpeechRecognizer: SpeechProcessor
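For example, a pipeline delegate can read the transcript and its confidence from the speech context when the didRecognize event arrives. The following is a minimal sketch only: the SpokestackDelegate callback signatures and the SpeechContext property names used here are assumptions and should be checked against the protocol documentation.
import Spokestack

// Minimal sketch of a delegate that handles the didRecognize event.
// The callback signatures below are assumptions; other SpokestackDelegate
// requirements are omitted for brevity.
class TranscriptHandler: NSObject, SpokestackDelegate {
    func didRecognize(_ result: SpeechContext) {
        // The recognizer updates the global speech context with the
        // transcript and its confidence before this event is dispatched.
        print("transcript: \(result.transcript) (confidence: \(result.confidence))")
    }
    func failure(error: Error) {
        print("pipeline error: \(error)")
    }
}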
-
This pipeline component uses the Apple SFSpeech API to stream audio samples for wakeword recognition.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Upon wakeword or wakephrase recognition, the pipeline activation event is triggered and the recognizer completes the API request and awaits another coordination event. Once speech pipeline coordination via stopStreaming is received, the recognizer completes the API request and awaits another coordination event.
Declaration
Swift
@objc public class AppleWakewordRecognizer : NSObject
extension AppleWakewordRecognizer: SpeechProcessor
-
A simple data class that represents the result of an utterance classification.
Declaration
Swift
@objc public class NLUResult : NSObject
-
A slot extracted during intent classification.
Remark
Depending on the NLU service used, slots may be typed; if present, the type of each slot can be accessed with the type property.
Declaration
Swift
@objc public class Slot : NSObject
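As an illustrative sketch, the slots attached to an NLUResult could be inspected like this. It assumes the result exposes its slots as an optional dictionary keyed by slot name; only the type property is documented above, so treat the other names as assumptions.
import Spokestack

// Minimal sketch: log the type of each slot in a classification result.
// Assumes NLUResult.slots is an optional dictionary of slot names to Slot
// instances; only Slot.type is documented above.
func logSlots(of result: NLUResult) {
    guard let slots = result.slots else { return }
    for (name, slot) in slots {
        print("slot \(name): type=\(slot.type ?? "untyped")")
    }
}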
-
This is the client entry point for the Spokestack BERT NLU implementation. This class provides a classification interface for deriving intents and slots from a natural language utterance. When initialized, the NLU system communicates with the client via either a delegate that receives events or the publisher-subscriber pattern.
// assume that self implements the `SpokestackDelegate` protocol
let nlu = try! NLUTensorflow(self, configuration: configuration)
nlu.classify(utterance: "I can't turn that light in the room on for you, Dave", context: [:])
Using the NLUTensorflow class requires providing a number of `SpeechConfiguration` variables, all prefixed with `nlu`. The most important are `nluVocabularyPath`, `nluModelPath`, and `nluModelMetadataPath`.
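For instance, the configuration for the snippet above might be assembled along these lines. Only the nlu-prefixed property names come from the description; the bundled file names are placeholders.
// Minimal sketch: point a SpeechConfiguration at bundled NLU resources
// before constructing NLUTensorflow. The file names are placeholders.
let configuration = SpeechConfiguration()
configuration.nluVocabularyPath = Bundle.main.path(forResource: "vocab", ofType: "txt") ?? ""
configuration.nluModelPath = Bundle.main.path(forResource: "nlu", ofType: "tflite") ?? ""
configuration.nluModelMetadataPath = Bundle.main.path(forResource: "nlu", ofType: "json") ?? ""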
Declaration
Swift
@objc public class NLUTensorflow : NSObject, NLUService
-
Configuration properties for Spokestack modules.
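A typical setup creates one configuration, adjusts the properties of interest, and hands it to the module builders. A brief sketch: apiId and apiSecret appear in the TextToSpeech documentation below, while tracing as a direct property is an assumption (it can also be set via setProperty on the builders), and the values are placeholders.
// Minimal sketch: customize a shared configuration (values are placeholders).
let config = SpeechConfiguration()
config.apiId = "YOUR-SPOKESTACK-API-ID"
config.apiSecret = "YOUR-SPOKESTACK-API-SECRET"
config.tracing = Trace.Level.DEBUG // assumed property; also settable via setProperty("tracing", ...)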
Declaration
Swift
@objc public class SpeechConfiguration : NSObject
-
This class maintains global state for the speech pipeline, allowing pipeline components to communicate information among themselves and event handlers.
Declaration
Swift
@objc public class SpeechContext : NSObject
-
This is the primary client entry point to the Spokestack voice input system. It dynamically binds to configured components that implement the pipeline interfaces for reading audio frames and performing speech recognition tasks.
The pipeline may be stopped/restarted any number of times during its lifecycle. While stopped, the pipeline consumes as few resources as possible. The pipeline runs asynchronously on a dedicated thread, so it does not block the caller when performing I/O and speech processing.
When running, the pipeline communicates with the client via delegates that receive events.
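A minimal sketch of that lifecycle, assuming pipeline was built as shown in the SpeechPipelineBuilder example below; start() appears in that example, while stop() is assumed to be the matching teardown call:
pipeline.start() // begin reading audio and processing speech
// ... later, when voice input is no longer needed ...
pipeline.stop()  // assumed teardown call; the pipeline can be restarted afterward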
Declaration
Swift
@objc public final class SpeechPipeline : NSObject
-
Convenience initializer for building a SpeechPipeline instance using a pre-configured profile. A pipeline profile encapsulates a series of configuration values tuned for a specific task.
Profiles are not authoritative; they act just like calling a series of methods on a SpeechPipelineBuilder, and any configuration properties they set can be overridden by subsequent calls.
- Example:
// assume that self implements the SpeechEventListener protocol
let pipeline = SpeechPipelineBuilder()
    .addListener(self)
    .setDelegateDispatchQueue(DispatchQueue.main)
    .useProfile(.tfLiteWakewordAppleSpeech)
    .setProperty("tracing", Trace.Level.PERF)
    .setProperty("detectModelPath", detectPath)
    .setProperty("encodeModelPath", encodePath)
    .setProperty("filterModelPath", filterPath)
    .build()
pipeline.start()
Declaration
Swift
@objc public class SpeechPipelineBuilder : NSObject
-
This class combines all Spokestack modules into a single component to provide a unified interface to the library’s ASR, NLU, and TTS features. Like the individual modules, it is configurable using a fluent builder pattern, but it provides a default configuration; only a few parameters are required from the calling application, and those only for specific features noted in the documentation for the builder’s methods.
The default configuration of this class assumes that the client application wants to use all of Spokestack’s features, regardless of their implied dependencies or required configuration. If a prerequisite is missing at build time, the individual module may throw an error when called.
This class will run in the context of the caller. The subsystems themselves may use the configured dispatch queues where appropriate to perform intensive tasks.
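As a sketch of that unified interface, assuming the built instance exposes its modules as pipeline, nlu, and tts members (these member names are not documented above; see SpokestackBuilder below for the full builder example):
// Minimal sketch: build with defaults and start listening.
// `self` is assumed to implement the SpokestackDelegate protocol.
let spokestack = try! SpokestackBuilder()
    .addDelegate(self)
    .build()
spokestack.pipeline.start() // assumed member exposing the configured SpeechPipeline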
Declaration
-
Fluent builder interface for configuring Spokestack.
Example: using all the builder functions
// assume that self implements the SpokestackDelegate protocol
let spokestack = try! SpokestackBuilder()
    .addDelegate(self)
    .usePipelineProfile(.vadTriggerAppleSpeech)
    .setConfiguration(SpeechConfiguration())
    .setProperty("tracing", Trace.Level.DEBUG)
    .setDelegateDispatchQueue(DispatchQueue.main)
    .build()
Declaration
Swift
@objc public class SpokestackBuilder : NSObject
-
This pipeline component streams audio frames to Spokestack’s cloud-based ASR for speech recognition.
Upon the pipeline being activated, the recognizer sends all audio frames to the Spokestack ASR via a websocket connection. Once the pipeline is deactivated or the activation max is reached, a final empty audio frame is sent, which triggers the final recognition transcript. That is passed to the SpeechEventListener delegates via the didRecognize event with the updated global speech context (including the final transcript and confidence).
Declaration
Swift
@available(iOS 13.0, *) @objc public class SpokestackSpeechRecognizer : NSObject
extension SpokestackSpeechRecognizer: SpeechProcessor
-
Undocumented
Declaration
Swift
@objc public class TFLiteKeywordRecognizer : NSObject
extension TFLiteKeywordRecognizer : SpeechProcessor
-
This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect keyword phrases to process for wakeword recognition. Once a wakeword phrase is detected, the speech pipeline is activated.
Upon activating the speech pipeline, the recognizer completes processing and awaits another coordination call. Once speech pipeline coordination via stopStreaming is received, the recognizer stops processing and awaits another coordination event.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.
The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the autoregressive transduction over the mel frames.
The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely a keyword phrase is detected. This classifier is commonly implemented as an attention mechanism over the encoder window.
The detector’s outputs are then compared against a configured threshold in order to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.
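The activation step reduces to a threshold comparison. The following is only an illustrative sketch of that logic, not the component's actual code; the names are placeholders.
// Illustrative sketch of the detect/threshold step described above.
// `posterior` is the detect model's output; `wakeThreshold` stands in for
// the configured activation threshold. Both names are placeholders.
func shouldActivate(posterior: Float, wakeThreshold: Float) -> Bool {
    // Activate only when the detector is sufficiently confident.
    return posterior > wakeThreshold
}

if shouldActivate(posterior: 0.92, wakeThreshold: 0.9) {
    print("wakeword detected: activating the speech pipeline")
}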
Declaration
Swift
@objc public class TFLiteWakewordRecognizer : NSObject
extension TFLiteWakewordRecognizer : SpeechProcessor
-
This is the client entry point for the Spokestack Text to Speech (TTS) system. It provides the capability to synthesize text input and speak the synthesized result back through the audio system output. Synthesis and playback happen on asynchronous blocks so that the client is not blocked during network and audio system activity.
When initialized, the TTS system communicates with the client either via a delegate that receives events or via a publisher-subscriber pattern.
// assume that self implements the TextToSpeechDelegate protocol.
let configuration = SpeechConfiguration()
let tts = TextToSpeech(self, configuration: configuration)
let input = TextToSpeechInput()
input.text = "Hello world!"
tts.synthesize(input) // synthesize the provided default text input using the default synthetic voice and api key.
tts.speak(input) // synthesize the same input as above, and play back the result using the default audio system.
Using the TTS system requires setting an API client identifier (SpeechConfiguration.apiId) and API client secret (SpeechConfiguration.apiSecret), which are used to cryptographically sign and meter TTS API usage.
Declaration
Swift
@available(iOS 13.0, *) @objc public class TextToSpeech : NSObject
-
Input parameters for speech synthesis. Parameters are considered transient and may change each time synthesize is called.
See also
TextToSpeech.synthesize
Declaration
Swift
@objc public class TextToSpeechInput : NSObject
-
Result of the TextToSpeech.synthesize request.
Declaration
Swift
@objc public class TextToSpeechResult : NSObject
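A sketch of consuming such a result on the delegate side; MyViewController is a hypothetical client class, and the success(result:) and failure(error:) callback names and the url property are assumptions to verify against TextToSpeechDelegate and this class's members.
// Minimal sketch, assuming the TTS delegate receives the synthesis result
// via success(result:) and that the result exposes a `url` to the audio.
// Other TextToSpeechDelegate requirements are omitted for brevity.
extension MyViewController: TextToSpeechDelegate {
    func success(result: TextToSpeechResult) {
        if let url = result.url {
            print("synthesized audio available at \(url)")
        }
    }
    func failure(error: Error) {
        print("TTS error: \(error)")
    }
}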
-
Undocumented
Declaration
Swift
@objc public class VADTrigger : NSObject, SpeechProcessor
-
Swift wrapper for WebRTC’s voice activity detector.
Declaration
Swift
@objc public class WebRTCVAD : NSObject, SpeechProcessor