Please make sure you read the glossary to have a better understanding of this section.
# High-Level Architecture Schema
This scenario describes the steps of the above schema. Please note that most interactions are done through WebSockets.
1. The client (web app, etc.) makes an HTTP request to GET some information about Leon.
2. The HTTP API responds to the client with this information.
3. The user talks into their microphone.
4. a. If the hotword server is launched, Leon listens (offline) for the user calling him by saying the hotword.
   b. If Leon understands that the user is calling him, Leon emits a message to the main server via a WebSocket. Leon is now listening (offline) to the user.
   c. The user says `Hello!` to Leon, and the client transforms the audio input into an audio blob.
5. The ASR transforms the audio blob into a wave file.
6. The STT parser transforms the wave file into a string.
7. a. The user receives the string, and the string is forwarded to the NLU.
   b. Or the user types `Hello!` with their keyboard (and ignores steps 1. to 7.a.). The `Hello!` string is forwarded to the NLU.
8. The NLU classifies the string and picks up the classification.
9. If the collaborative logger is enabled, the classification is sent to the collaborative logger.
10. The brain creates a child process and executes the chosen module.
11. If the synchronizer is enabled and the module has this option, it synchronizes content.
12. The text answer is sent to the user as text, and the TTS synthesizer transforms it into an audio buffer, which is played by the client.
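
The second half of the scenario (string to NLU, NLU to brain, brain to child process, answer to TTS) can be sketched roughly as follows. This is an illustrative toy in Python, not Leon's actual code: every name here (`stt_parse`, `nlu_classify`, `brain_execute`, `tts_synthesize`, `handle_query`, the keyword table and the inline module source) is hypothetical. The classifier is a plain keyword lookup standing in for a real NLU. The collaborative logger, the synchronizer and the WebSocket transport are left out.

```python
import json
import subprocess
import sys

# Hypothetical keyword table standing in for a trained NLU classifier.
KEYWORDS = {"hello": ("greeting", "run")}

# Source of a toy module that the brain runs in a child process (assumption:
# the module receives the classification as a JSON argument and prints JSON).
MODULE_SOURCE = """
import json, sys
classification = json.loads(sys.argv[1])
answers = {"greeting": "Hello there!"}
text = answers.get(classification["module"], "Sorry, I do not understand.")
print(json.dumps({"text": text}))
"""


def stt_parse(transcript: str) -> str:
    """Stand-in for the STT parser: a real one would decode a wave file."""
    return transcript.strip()


def nlu_classify(query: str) -> dict:
    """Pick a (module, action) pair for the query, or fall back to 'unknown'."""
    for word in query.lower().replace("!", " ").split():
        if word in KEYWORDS:
            module, action = KEYWORDS[word]
            return {"module": module, "action": action, "query": query}
    return {"module": "unknown", "action": "none", "query": query}


def brain_execute(classification: dict) -> str:
    """Spawn a child process for the chosen module and return its text answer."""
    result = subprocess.run(
        [sys.executable, "-c", MODULE_SOURCE, json.dumps(classification)],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)["text"]


def tts_synthesize(text: str) -> bytes:
    """Stand-in for the TTS synthesizer: returns bytes in place of real audio."""
    return text.encode("utf-8")


def handle_query(query: str, from_voice: bool = False) -> dict:
    """Run a query through NLU, brain and TTS; voice input goes through STT first."""
    text = stt_parse(query) if from_voice else query
    classification = nlu_classify(text)
    answer = brain_execute(classification)
    return {
        "classification": classification,
        "text": answer,
        "audio": tts_synthesize(answer),
    }
```

Calling `handle_query("Hello!")` follows the keyboard path (step 7.b), while `handle_query("Hello!", from_voice=True)` adds the STT stand-in first. Running the module in a separate process, as the brain step describes, keeps a crashing or slow module from taking down the main server in this sketch.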