Classes
The following classes are available globally.
-
This pipeline component uses the Apple SFSpeech API to stream audio samples for speech recognition.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Once speech pipeline coordination via stopStreaming is received, or when the Apple ASR API indicates a completed speech event, the recognizer completes the API request and sends either a timeout or a didRecognize event with the updated global speech context (including the speech transcript and confidence).
Declaration
Swift
@objc public class AppleSpeechRecognizer : NSObject
extension AppleSpeechRecognizer: SpeechProcessor
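For example, a pipeline delegate can read the transcript and its confidence from the speech context when the didRecognize event arrives. The following is a minimal sketch only: the SpokestackDelegate callback signatures and the SpeechContext property names used here are assumptions and should be checked against the protocol documentation.
import Spokestack

// Minimal sketch of a delegate that handles the didRecognize event.
// The callback signatures below are assumptions; other SpokestackDelegate
// requirements are omitted for brevity.
class TranscriptHandler: NSObject, SpokestackDelegate {
    func didRecognize(_ result: SpeechContext) {
        // The recognizer updates the global speech context with the
        // transcript and its confidence before this event is dispatched.
        print("transcript: \(result.transcript) (confidence: \(result.confidence))")
    }
    func failure(error: Error) {
        print("pipeline error: \(error)")
    }
}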
-
This pipeline component uses the Apple SFSpeech API to stream audio samples for wakeword recognition.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Upon wakeword or wakephrase recognition, the pipeline activation event is triggered and the recognizer completes the API request and awaits another coordination event. Once speech pipeline coordination via stopStreaming is received, the recognizer completes the API request and awaits another coordination event.
Declaration
Swift
@objc public class AppleWakewordRecognizer : NSObject
extension AppleWakewordRecognizer: SpeechProcessor
-
A simple data class that represents the result of an utterance classification.
Declaration
Swift
@objc public class NLUResult : NSObject
-
A slot extracted during intent classification.
Remark
Depending on the NLU service used, slots may be typed; if present, the type of each slot can be accessed with the type property.
Declaration
Swift
@objc public class Slot : NSObject
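As an illustrative sketch, the slots attached to an NLUResult could be inspected like this. It assumes the result exposes its slots as an optional dictionary keyed by slot name; only the type property is documented above, so treat the other names as assumptions.
import Spokestack

// Minimal sketch: log the type of each slot in a classification result.
// Assumes NLUResult.slots is an optional dictionary of slot names to Slot
// instances; only Slot.type is documented above.
func logSlots(of result: NLUResult) {
    guard let slots = result.slots else { return }
    for (name, slot) in slots {
        print("slot \(name): type=\(slot.type ?? "untyped")")
    }
}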
-
This is the client entry point for the Spokestack BERT NLU implementation. This class provides a classification interface for deriving intents and slots from a natural language utterance. When initialized, the NLU system communicates with the client via either a delegate that receives events or the publisher-subscriber pattern.
// assume that self implements the `SpokestackDelegate` protocol
let nlu = try! NLUTensorflow(self, configuration: configuration)
nlu.classify(utterance: "I can't turn that light in the room on for you, Dave", context: [:])
Using the NLUTensorflow class requires providing a number of `SpeechConfiguration` variables, all prefixed with `nlu`. The most important are `nluVocabularyPath`, `nluModelPath`, and `nluModelMetadataPath`.
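For instance, the configuration for the snippet above might be assembled along these lines. Only the nlu-prefixed property names come from the description; the bundled file names are placeholders.
// Minimal sketch: point a SpeechConfiguration at bundled NLU resources
// before constructing NLUTensorflow. The file names are placeholders.
let configuration = SpeechConfiguration()
configuration.nluVocabularyPath = Bundle.main.path(forResource: "vocab", ofType: "txt") ?? ""
configuration.nluModelPath = Bundle.main.path(forResource: "nlu", ofType: "tflite") ?? ""
configuration.nluModelMetadataPath = Bundle.main.path(forResource: "nlu", ofType: "json") ?? ""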
Declaration
Swift
@objc public class NLUTensorflow : NSObject, NLUService
-
Configuration properties for Spokestack modules.
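A typical setup creates one configuration, adjusts the properties of interest, and hands it to the module builders. A brief sketch: apiId and apiSecret appear in the TextToSpeech documentation below, while tracing as a direct property is an assumption (it can also be set via setProperty on the builders), and the values are placeholders.
// Minimal sketch: customize a shared configuration (values are placeholders).
let config = SpeechConfiguration()
config.apiId = "YOUR-SPOKESTACK-API-ID"
config.apiSecret = "YOUR-SPOKESTACK-API-SECRET"
config.tracing = Trace.Level.DEBUG // assumed property; also settable via setProperty("tracing", ...)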
Declaration
Swift
@objc public class SpeechConfiguration : NSObject
-
This class maintains global state for the speech pipeline, allowing pipeline components to communicate information among themselves and event handlers.
Declaration
Swift
@objc public class SpeechContext : NSObject
-
This is the primary client entry point to the Spokestack voice input system. It dynamically binds to configured components that implement the pipeline interfaces for reading audio frames and performing speech recognition tasks.
The pipeline may be stopped/restarted any number of times during its lifecycle. While stopped, the pipeline consumes as few resources as possible. The pipeline runs asynchronously on a dedicated thread, so it does not block the caller when performing I/O and speech processing.
When running, the pipeline communicates with the client via delegates that receive events.
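A minimal sketch of that lifecycle, assuming pipeline was built as shown in the SpeechPipelineBuilder example below; start() appears in that example, while stop() is assumed to be the matching teardown call:
pipeline.start() // begin reading audio and processing speech
// ... later, when voice input is no longer needed ...
pipeline.stop()  // assumed teardown call; the pipeline can be restarted afterward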
Declaration
Swift
@objc public final class SpeechPipeline : NSObject
-
Convenience initializer for building a SpeechPipeline instance using a pre-configured profile. A pipeline profile encapsulates a series of configuration values tuned for a specific task.
Profiles are not authoritative; they act just like calling a series of methods on a SpeechPipelineBuilder, and any configuration properties they set can be overridden by subsequent calls.
- Example:
// assume that self implements the SpeechEventListener protocol
let pipeline = SpeechPipelineBuilder()
    .addListener(self)
    .setDelegateDispatchQueue(DispatchQueue.main)
    .useProfile(.tfLiteWakewordAppleSpeech)
    .setProperty("tracing", Trace.Level.PERF)
    .setProperty("detectModelPath", detectPath)
    .setProperty("encodeModelPath", encodePath)
    .setProperty("filterModelPath", filterPath)
    .build()
pipeline.start()
Declaration
Swift
@objc public class SpeechPipelineBuilder : NSObject
-
This class combines all Spokestack modules into a single component to provide a unified interface to the library’s ASR, NLU, and TTS features. Like the individual modules, it is configurable using a fluent builder pattern, but it provides a default configuration; only a few parameters are required from the calling application, and those only for specific features noted in the documentation for the builder’s methods.
The default configuration of this class assumes that the client application wants to use all of Spokestack’s features, regardless of their implied dependencies or required configuration. If a prerequisite is missing at build time, the individual module may throw an error when called.
This class will run in the context of the caller. The subsystems themselves may use the configured dispatch queues where appropriate to perform intensive tasks.
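As a sketch of that unified interface, assuming the built instance exposes its modules as pipeline, nlu, and tts members (these member names are not documented above; see SpokestackBuilder below for the full builder example):
// Minimal sketch: build with defaults and start listening.
// `self` is assumed to implement the SpokestackDelegate protocol.
let spokestack = try! SpokestackBuilder()
    .addDelegate(self)
    .build()
spokestack.pipeline.start() // assumed member exposing the configured SpeechPipeline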
Declaration
-
Fluent builder interface for configuring Spokestack.
Example: using all the builder functions
// assume that self implements the SpokestackDelegate protocol
let spokestack = try! SpokestackBuilder()
    .addDelegate(self)
    .usePipelineProfile(.vadTriggerAppleSpeech)
    .setConfiguration(SpeechConfiguration())
    .setProperty("tracing", Trace.Level.DEBUG)
    .setDelegateDispatchQueue(DispatchQueue.main)
    .build()
Declaration
Swift
@objc public class SpokestackBuilder : NSObject
-
This pipeline component streams audio frames to Spokestack’s cloud-based ASR for speech recognition.
Upon the pipeline being activated, the recognizer sends all audio frames to the Spokestack ASR via a websocket connection. Once the pipeline is deactivated or the activation max is reached, a final empty audio frame is sent, which triggers the final recognition transcript. That is passed to the SpeechEventListener delegates via the didRecognize event with the updated global speech context (including the final transcript and confidence).
Declaration
Swift
@available(iOS 13.0, *) @objc public class SpokestackSpeechRecognizer : NSObject
extension SpokestackSpeechRecognizer: SpeechProcessor
-
Undocumented
Declaration
Swift
@objc public class TFLiteKeywordRecognizer : NSObject
extension TFLiteKeywordRecognizer : SpeechProcessor
-
This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect keyword phrases to process for wakeword recognition. Once a wakeword phrase is detected, the speech pipeline is activated.
Upon activating the speech pipeline, the recognizer completes processing and awaits another coordination call. Once speech pipeline coordination via stopStreaming is received, the recognizer stops processing and awaits another coordination event.
Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.
The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the autoregressive transduction over the mel frames.
The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely a keyword phrase is detected. This classifier is commonly implemented as an attention mechanism over the encoder window.
The detector’s outputs are then compared against a configured threshold in order to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.
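The activation step reduces to a threshold comparison. The following is only an illustrative sketch of that logic, not the component's actual code; the names are placeholders.
// Illustrative sketch of the detect/threshold step described above.
// `posterior` is the detect model's output; `wakeThreshold` stands in for
// the configured activation threshold. Both names are placeholders.
func shouldActivate(posterior: Float, wakeThreshold: Float) -> Bool {
    // Activate only when the detector is sufficiently confident.
    return posterior > wakeThreshold
}

if shouldActivate(posterior: 0.92, wakeThreshold: 0.9) {
    print("wakeword detected: activating the speech pipeline")
}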
Declaration
Swift
@objc public class TFLiteWakewordRecognizer : NSObject
extension TFLiteWakewordRecognizer : SpeechProcessor
-
This is the client entry point for the Spokestack Text to Speech (TTS) system. It provides the capability to synthesize text input and speak the synthesized result back through the audio system output. Synthesis and playback happen on asynchronous blocks so that the client is not blocked during network and audio system activity.
When initialized, the TTS system communicates with the client either via a delegate that receives events or via a publisher-subscriber pattern.
// assume that self implements the TextToSpeechDelegate protocol.
let configuration = SpeechConfiguration()
let tts = TextToSpeech(self, configuration: configuration)
let input = TextToSpeechInput()
input.text = "Hello world!"
tts.synthesize(input) // synthesize the provided default text input using the default synthetic voice and api key.
tts.speak(input) // synthesize the same input as above, and play back the result using the default audio system.
Using the TTS system requires setting an API client identifier (SpeechConfiguration.apiId) and API client secret (SpeechConfiguration.apiSecret), which are used to cryptographically sign and meter TTS API usage.
Declaration
Swift
@available(iOS 13.0, *) @objc public class TextToSpeech : NSObject
-
Input parameters for speech synthesis. Parameters are considered transient and may change each time synthesize is called.
See also
TextToSpeech.synthesize
Declaration
Swift
@objc public class TextToSpeechInput : NSObject
-
Result of the TextToSpeech.synthesize request.
Declaration
Swift
@objc public class TextToSpeechResult : NSObject
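A sketch of consuming such a result on the delegate side; MyViewController is a hypothetical client class, and the success(result:) and failure(error:) callback names and the url property are assumptions to verify against TextToSpeechDelegate and this class's members.
// Minimal sketch, assuming the TTS delegate receives the synthesis result
// via success(result:) and that the result exposes a `url` to the audio.
// Other TextToSpeechDelegate requirements are omitted for brevity.
extension MyViewController: TextToSpeechDelegate {
    func success(result: TextToSpeechResult) {
        if let url = result.url {
            print("synthesized audio available at \(url)")
        }
    }
    func failure(error: Error) {
        print("TTS error: \(error)")
    }
}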
-
Undocumented
Declaration
Swift
@objc public class VADTrigger : NSObject, SpeechProcessor
-
Swift wrapper for WebRTC’s voice activity detector.
Declaration
Swift
@objc public class WebRTCVAD : NSObject, SpeechProcessor