Classes

The following classes are available globally.

  • This pipeline component uses the Apple SFSpeech API to stream audio samples for speech recognition.

    Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Once speech pipeline coordination via stopStreaming is received, or when the Apple ASR API indicates a completed speech event, the recognizer completes the API request and sends either a timeout or a didRecognize event with the updated global speech context (including the speech transcript and confidence).
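
    A minimal sketch of selecting this recognizer through a pipeline profile, using only the builder calls shown elsewhere on this page; the assumption (suggested by the profile name) is that the vadTriggerAppleSpeech profile routes detected speech to AppleSpeechRecognizer:

      // assume that self implements the SpeechEventListener protocol
      let pipeline = SpeechPipelineBuilder()
          .addListener(self)
          .setDelegateDispatchQueue(DispatchQueue.main)
          .useProfile(.vadTriggerAppleSpeech) // assumed: VAD-triggered Apple ASR
          .build()
      pipeline.start() // didRecognize fires with the transcript described above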

    See more

    Declaration

    Swift

    @objc
    public class AppleSpeechRecognizer : NSObject
    extension AppleSpeechRecognizer: SpeechProcessor
  • This pipeline component uses the Apple SFSpeech API to stream audio samples for wakeword recognition.

    Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames to the Apple ASR API for recognition. Upon wakeword or wakephrase recognition, the pipeline activation event is triggered and the recognizer completes the API request and awaits another coordination event. Once speech pipeline coordination via stopStreaming is received, the recognizer completes the API request and awaits another coordination event.
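
    A similar sketch for passive wakeword listening; the appleWakewordAppleSpeech profile name is an assumption about which profile wires in this recognizer:

      // assume that self implements the SpeechEventListener protocol
      let pipeline = SpeechPipelineBuilder()
          .addListener(self)
          .setDelegateDispatchQueue(DispatchQueue.main)
          .useProfile(.appleWakewordAppleSpeech) // assumed profile name
          .build()
      pipeline.start() // listen passively; wakeword recognition activates the pipeline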

    See more

    Declaration

    Swift

    @objc
    public class AppleWakewordRecognizer : NSObject
    extension AppleWakewordRecognizer: SpeechProcessor
  • A simple data class that represents the result of an utterance classification.

    See more

    Declaration

    Swift

    @objc
    public class NLUResult : NSObject
  • A slot extracted during intent classification.

    Remark

    Depending on the NLU service used, slots may be typed; if present, the type of each slot can be accessed with the type property.
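
    A brief sketch of consuming a classification result; aside from type, the member names used here (intent, confidence, slots, value) are assumptions about NLUResult and Slot rather than documented properties:

      // hedged sketch: assumes `slots` is an optional dictionary keyed by
      // slot name, and that `value` is an optional Any
      func handle(_ result: NLUResult) {
          print("intent: \(result.intent) (confidence \(result.confidence))")
          for (name, slot) in result.slots ?? [:] {
              print("slot \(name): type \(slot.type), value \(String(describing: slot.value))")
          }
      }
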
    See more

    Declaration

    Swift

    @objc
    public class Slot : NSObject
  • This is the client entry point for the Spokestack BERT NLU implementation. This class provides a classification interface for deriving intents and slots from a natural language utterance. When initialized, the NLU system communicates with the client via either a delegate that receives events or the publisher-subscriber pattern.

     // assume that self implements the `SpokestackDelegate` protocol
     let nlu = try! NLUTensorflow(self, configuration: configuration)
     nlu.classify(utterance: "I can't turn that light in the room on for you, Dave", context: [:])
    
    Using the NLUTensorflow class requires providing a number of `SpeechConfiguration` properties, all prefixed with `nlu`. The most important are `nluVocabularyPath`, `nluModelPath`, and `nluModelMetadataPath`.
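
    A minimal configuration sketch for the properties named above; the bundled resource names are placeholders for the NLU model, vocabulary, and metadata files shipped with the application:

      let configuration = SpeechConfiguration()
      // placeholder resource names; use the files provided with your NLU model
      configuration.nluVocabularyPath = Bundle.main.path(forResource: "vocab", ofType: "txt")!
      configuration.nluModelPath = Bundle.main.path(forResource: "nlu", ofType: "tflite")!
      configuration.nluModelMetadataPath = Bundle.main.path(forResource: "nlu", ofType: "json")!
      // assume that self implements the `SpokestackDelegate` protocol
      let nlu = try! NLUTensorflow(self, configuration: configuration)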
    
    See more

    Declaration

    Swift

    @objc
    public class NLUTensorflow : NSObject, NLUService
  • Configuration properties for Spokestack modules.

    See more

    Declaration

    Swift

    @objc
    public class SpeechConfiguration : NSObject
  • This class maintains global state for the speech pipeline, allowing pipeline components to communicate information among themselves and event handlers.
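
    Since the recognizers above report results by updating this shared context, an event handler typically reads the transcript and confidence from it when a recognition event arrives. A hedged sketch; the transcript and confidence property names are assumptions based on the recognizer descriptions above:

      // hedged sketch: `transcript` and `confidence` are assumed property names,
      // read from the context delivered with a didRecognize-style event
      func handleRecognition(_ context: SpeechContext) {
          print("heard: \(context.transcript) (confidence \(context.confidence))")
      }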

    See more

    Declaration

    Swift

    @objc
    public class SpeechContext : NSObject
  • This is the primary client entry point to the Spokestack voice input system. It dynamically binds to configured components that implement the pipeline interfaces for reading audio frames and performing speech recognition tasks.

    The pipeline may be stopped/restarted any number of times during its lifecycle. While stopped, the pipeline consumes as few resources as possible. The pipeline runs asynchronously on a dedicated thread, so it does not block the caller when performing I/O and speech processing.

    When running, the pipeline communicates with the client via delegates that receive events.
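
    A brief lifecycle sketch, assuming a pipeline built as in the SpeechPipelineBuilder example below and that stop() is the counterpart to start():

      pipeline.start() // begin reading audio frames and processing speech
      // ... later, e.g. when voice input is no longer needed ...
      pipeline.stop()  // assumed counterpart to start(); consumes minimal resources while stopped
      pipeline.start() // the pipeline may be restarted any number of times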

    See more

    Declaration

    Swift

    @objc
    public final class SpeechPipeline : NSObject
  • Convenience initializer for building a SpeechPipeline instance using a pre-configured profile. A pipeline profile encapsulates a series of configuration values tuned for a specific task.

    Profiles are not authoritative; they act just like calling a series of methods on a SpeechPipelineBuilder, and any configuration properties they set can be overridden by subsequent calls.

    • Example:

      // assume that self implements the SpeechEventListener protocol
      let pipeline = SpeechPipelineBuilder()
          .addListener(self)
          .setDelegateDispatchQueue(DispatchQueue.main)
          .useProfile(.tfLiteWakewordAppleSpeech)
          .setProperty("tracing", Trace.Level.PERF)
          .setProperty("detectModelPath", detectPath)
          .setProperty("encodeModelPath", encodePath)
          .setProperty("filterModelPath", filterPath)
          .build()
      pipeline.start()
    See more

    Declaration

    Swift

    @objc
    public class SpeechPipelineBuilder : NSObject
  • This class combines all Spokestack modules into a single component to provide a unified interface to the library’s ASR, NLU, and TTS features. Like the individual modules, it is configurable using a fluent builder pattern, but it provides a default configuration; only a few parameters are required from the calling application, and those only for specific features noted in the documentation for the builder’s methods.

    The default configuration of this class assumes that the client application wants to use all of Spokestack’s features, regardless of their implied dependencies or required configuration. If a prerequisite is missing at build time, the individual module may throw an error when called.

    This class will run in the context of the caller. The subsystems themselves may use the configured dispatch queues where appropriate to perform intensive tasks.

    See more

    Declaration

    Swift

    @objc
    public class Spokestack : NSObject
    extension Spokestack: SpokestackDelegate
  • Fluent builder interface for configuring Spokestack.

    • Example: using all the builder functions

      let spokestack = try! SpokestackBuilder()
      .addDelegate(self)
      .usePipelineProfile(.vadTriggerAppleSpeech)
      .setConfiguration(SpeechConfiguration())
      .setProperty("tracing", Trace.Level.DEBUG)
      .setDelegateDispatchQueue(DispatchQueue.main)
      .build()
      

    See also

    Spokestack

    See more

    Declaration

    Swift

    @objc
    public class SpokestackBuilder : NSObject
  • This pipeline component streams audio frames to Spokestack’s cloud-based ASR for speech recognition.

    When the pipeline is activated, the recognizer sends all audio frames to the Spokestack ASR via a websocket connection. Once the pipeline is deactivated or the activation max is reached, a final empty audio frame is sent, which triggers the final recognition transcript. That transcript is passed to the SpeechEventListener delegates via the didRecognize event with the updated global speech context (including the final transcript and confidence).
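
    A hedged sketch of driving the activation window described above, given a running pipeline (see SpeechPipeline above); activate() and deactivate() are assumed SpeechPipeline methods for manually opening and closing a recognition request:

      pipeline.activate()   // assumed: begin streaming frames to Spokestack ASR
      // ... the user finishes speaking, or the activation max is reached ...
      pipeline.deactivate() // assumed: send the final empty frame; didRecognize delivers the transcript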

    See more

    Declaration

    Swift

    @available(iOS 13.0, *)
    @objc
    public class SpokestackSpeechRecognizer : NSObject
    extension SpokestackSpeechRecognizer: SpeechProcessor
  • Undocumented

    See more

    Declaration

    Swift

    @objc
    public class TFLiteKeywordRecognizer : NSObject
    extension TFLiteKeywordRecognizer : SpeechProcessor
  • This pipeline component streams audio samples and uses a TensorFlow Lite binary classifier to detect keyword phrases for wakeword recognition. Once a wakeword phrase is detected, the speech pipeline is activated.

    Once speech pipeline coordination via startStreaming is received, the recognizer begins streaming buffered frames that are first normalized and then converted to the magnitude Short-Time Fourier Transform (STFT) representation over a hopped sliding window. This linear spectrogram is then converted to a mel spectrogram via a “filter” TensorFlow model. These mel frames are batched together into a sliding window.

    The mel spectrogram represents the features to be passed to the autoregressive encoder (usually an RNN or CRNN), which is implemented in an “encode” TensorFlow model. This encoder outputs an encoded vector and a state vector. The encoded vectors are batched together into a sliding window for classification, and the state vector is used to perform the autoregressive transduction over the mel frames.

    The “detect” TensorFlow model takes the encoded sliding window and outputs a single posterior value in the range [0, 1]. The higher the value, the more likely a keyword phrase is detected. This classifier is commonly implemented as an attention mechanism over the encoder window.

    The detector’s outputs are then compared against a configured threshold to determine whether to activate the pipeline. If the posterior is greater than the threshold, the pipeline is activated.

    Upon activating the speech pipeline, the recognizer completes processing and awaits another coordination call. Once speech pipeline coordination via stopStreaming is received, the recognizer stops processing and awaits another coordination event.
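
    A configuration sketch for the model chain described above; the filterModelPath, encodeModelPath, and detectModelPath property names mirror the setProperty keys used in the SpeechPipelineBuilder example, while wakeThreshold is an assumed name for the posterior threshold:

      let configuration = SpeechConfiguration()
      configuration.filterModelPath = filterPath // STFT-to-mel “filter” model
      configuration.encodeModelPath = encodePath // autoregressive “encode” model
      configuration.detectModelPath = detectPath // posterior-producing “detect” model
      configuration.wakeThreshold = 0.9          // assumed property: activate when the posterior exceeds it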

    See more

    Declaration

    Swift

    @objc
    public class TFLiteWakewordRecognizer : NSObject
    extension TFLiteWakewordRecognizer : SpeechProcessor
  • This is the client entry point for the Spokestack Text to Speech (TTS) system. It provides the capability to synthesize textual input and speak the synthesis back as audio system output. Synthesis and playback occur on asynchronous blocks so as not to block the client while it performs network and audio system activities.

    When initialized, the TTS system communicates with the client either via a delegate that receives events or via a publisher-subscriber pattern.

    // assume that self implements the TextToSpeechDelegate protocol.
    let configuration = SpeechConfiguration()
    let tts = TextToSpeech(self, configuration: configuration)
    let input = TextToSpeechInput()
    input.text = "Hello world!"
    tts.synthesize(input) // synthesize the provided default text input using the default synthetic voice and api key.
    tts.speak(input) // synthesize the same input as above, and play back the result using the default audio system.
    

    Using the TTS system requires setting an API client identifier (SpeechConfiguration.apiId) and API client secret (SpeechConfiguration.apiSecret), which are used to cryptographically sign and meter TTS API usage.
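
    A brief sketch of providing those credentials before constructing the TTS client; the literal values are placeholders:

      let configuration = SpeechConfiguration()
      configuration.apiId = "your-spokestack-client-id"         // placeholder
      configuration.apiSecret = "your-spokestack-client-secret" // placeholder
      // assume that self implements the TextToSpeechDelegate protocol
      let tts = TextToSpeech(self, configuration: configuration)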

    See more

    Declaration

    Swift

    @available(iOS 13.0, *)
    @objc
    public class TextToSpeech : NSObject
  • Input parameters for speech synthesis. Parameters are considered transient and may change each time synthesize is called.

    See also

    TextToSpeech.synthesize
    See more

    Declaration

    Swift

    @objc
    public class TextToSpeechInput : NSObject
  • Result of the TextToSpeech.synthesize request.

    See more

    Declaration

    Swift

    @objc
    public class TextToSpeechResult : NSObject
  • Undocumented

    See more

    Declaration

    Swift

    @objc
    public class VADTrigger : NSObject, SpeechProcessor
  • Swift wrapper for WebRTC’s voice activity detector.

    See more

    Declaration

    Swift

    @objc
    public class WebRTCVAD : NSObject, SpeechProcessor