Core components

At the moment, the EnglishScript software architecture includes the following components:

  • Human Application Protocol (HAP) library – natural language processing & generation, human memory & behavior simulation
  • Alvin A.I. – contents of a simulated human memory
  • TM client-server protocol library – communication between Alvin A.I. and the user’s devices
  • Open-source PocketSphinx library – speech recognition
  • Open-source Pico TTS library – Text-To-Speech synthesis

The following illustration shows the data streams and how they are routed between the components:

[Figure: ES SW Architecture]



The Human Application Protocol (HAP) is the core component of the ES Runtime. It processes all received input data and tries to understand its meaning in relation to previously received and understood data. It constructs data structures that try to simulate human memory, i.e., how a person models information in their mind.

Alvin A.I. is an artificial intelligence being that contains the memory (both data and logic) of a simulated human being. When HAP successfully understands the meaning of some received input (text or speech), it stores the learned information in Alvin’s memory, which resides inside HAP. Alvin’s memory can then be saved to a file (ESO file format) so that it can later be restored into HAP. Alvin is essentially the contents of his memory. HAP runs Alvin’s logic and evaluates the data in Alvin’s memory to simulate human behavior.
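
The save-and-restore cycle described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Memory` class, its `facts` and `rules` fields, and the use of JSON are all hypothetical stand-ins. The real ESO file format and HAP's internal data structures are not described in this document.

```python
import json
import os
import tempfile
from dataclasses import dataclass, field, asdict


@dataclass
class Memory:
    """Hypothetical stand-in for Alvin's memory: data (facts) plus logic (rules)."""
    facts: dict = field(default_factory=dict)
    rules: list = field(default_factory=list)

    def learn(self, subject: str, value: str) -> None:
        # Store one piece of understood information.
        self.facts[subject] = value

    def save(self, path: str) -> None:
        # JSON is used purely for illustration; it is NOT the ESO format.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(asdict(self), f)

    @classmethod
    def load(cls, path: str) -> "Memory":
        # Rebuild the memory object from the saved file.
        with open(path, "r", encoding="utf-8") as f:
            return cls(**json.load(f))


def roundtrip_demo() -> Memory:
    """Learn something, save the memory, and restore it from disk."""
    memory = Memory()
    memory.learn("sky", "blue")
    memory.rules.append("if asked about the sky, answer with its color")
    path = os.path.join(tempfile.mkdtemp(), "alvin_memory.json")
    memory.save(path)
    return Memory.load(path)
```

The point of the sketch is that the restored object carries both the data and the logic, matching the statement that Alvin is essentially the contents of his memory.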


The HAP supports both a stream of text and a stream of speech audio as input and output.

Textual input can be provided, for example, by a user typing on a keyboard, or by the ES Compiler, which reads the input from text files (ES file format). All text must be written in the EnglishScript programming language. Typing mistakes are not tolerated, so if you make one you need to retype the input.

For speech audio input, HAP uses any audio channel available on the system (e.g. ALSA on Linux) and forwards the audio data stream to the PocketSphinx library, which recognizes the individual words (with probabilities) in the audio stream. HAP then processes this stream of words and probabilities to work out the actual sentence, based on what has previously been learned and discussed. This part is still a work in progress, so the current implementation requires very clear English pronunciation.
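
The idea of combining recognizer output with earlier context can be illustrated with a small sketch. It is not HAP's actual algorithm, and it does not call PocketSphinx. The input shape (one dictionary of candidate words and probabilities per word position), the `context_boost` factor, and the function name `pick_sentence` are assumptions made for this example.

```python
from typing import Dict, List, Set


def pick_sentence(
    hypotheses: List[Dict[str, float]],
    known_words: Set[str],
    context_boost: float = 2.0,
) -> List[str]:
    """Pick one word per position, favoring words seen in earlier context.

    Each candidate's recognizer probability is multiplied by `context_boost`
    when the word is already known from previous input (an assumed heuristic).
    """
    sentence = []
    for candidates in hypotheses:
        best_word = max(
            candidates,
            key=lambda w: candidates[w] * (context_boost if w in known_words else 1.0),
        )
        sentence.append(best_word)
    return sentence


# Hypothetical recognizer output for a three-word utterance.
hypotheses = [
    {"the": 0.9, "a": 0.1},
    {"cat": 0.45, "cap": 0.55},
    {"sleeps": 0.6, "sweeps": 0.4},
]

with_context = pick_sentence(hypotheses, known_words={"cat", "sleeps"})
without_context = pick_sentence(hypotheses, known_words=set())
```

With the context set, "cat" wins over the acoustically more likely "cap" (0.45 × 2.0 = 0.9 against 0.55). Without context, the raw probabilities decide and "cap" is chosen.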

When Alvin A.I. wants to say something, HAP outputs a textual stream of sentences (formatted in the ES language) and prints them, for example, in a terminal window. HAP also converts them to speech audio using the Pico TTS Text-To-Speech library. The audio is routed to the speakers using any standard audio channel provided by the system (e.g. ALSA on Linux).

3rd party libraries

Since EnglishScript, as a programming language, is not well suited to implementing low-level functionality, the ES Runtime provides a way to call native C/C++ code in an external dynamically linked library (a .so file on Linux). The HAP library provides a public C API that the external library uses to communicate with HAP. In turn, HAP expects the external library to expose a specific function API. Any number of libraries can be used at the same time. For more information, see the chapter “3rd party libraries” in the ES Tutorial document.
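
The general mechanism of calling native code in a dynamically linked library can be shown with a short sketch. It does not use the HAP C API or the function API that HAP expects, both of which are covered in the ES Tutorial. Instead it uses Python's standard `ctypes` module and the system C library as stand-ins, and it assumes a POSIX system such as Linux. The helper name `load_native_library` is hypothetical.

```python
import ctypes
import ctypes.util


def load_native_library(name: str) -> ctypes.CDLL:
    """Load a shared library by short name (e.g. "c" for the C library)."""
    path = ctypes.util.find_library(name)
    # On POSIX, CDLL(None) exposes the symbols already loaded into the
    # process, which serves as a fallback if the library file is not found.
    return ctypes.CDLL(path) if path else ctypes.CDLL(None)


libc = load_native_library("c")

# Declare the C signature `int abs(int)` before calling it.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)
```

Declaring the argument and return types up front mirrors the situation described above: the caller and the native library must agree on a fixed function signature before a call can be made safely.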

TM client-server protocol

The TM client-server protocol is used for transmitting both data streams and text streams (ES-formatted sentences) between Alvin A.I. and the user’s devices. It supports both point-to-point connections and broadcasting of messages and data.

The TM Server application runs on server hardware on the Internet with a public IP address, to which any TM Client (or other TM Server) can connect. The TM Server authenticates the connected TM Clients and routes data streams and messages between them.
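
The authenticate-then-route role of the server can be sketched as an in-memory model. This is not the TM protocol itself: there is no networking, and the class name `TMServerSketch`, its methods, and the shared-secret check are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple


class TMServerSketch:
    """Toy model of a server that authenticates clients and routes messages."""

    def __init__(self, credentials: Dict[str, str]) -> None:
        # Maps client id -> shared secret (a placeholder for real authentication).
        self._credentials = credentials
        # Maps connected client id -> list of (sender, message) pairs.
        self._inboxes: Dict[str, List[Tuple[str, str]]] = {}

    def connect(self, client_id: str, secret: str) -> bool:
        # Only clients with a matching secret are registered as connected.
        if self._credentials.get(client_id) != secret:
            return False
        self._inboxes.setdefault(client_id, [])
        return True

    def send(self, sender: str, recipient: str, message: str) -> bool:
        # Point-to-point: both ends must be authenticated and connected.
        if sender not in self._inboxes or recipient not in self._inboxes:
            return False
        self._inboxes[recipient].append((sender, message))
        return True

    def broadcast(self, sender: str, message: str) -> int:
        # Broadcast: deliver to every connected client except the sender.
        if sender not in self._inboxes:
            return 0
        delivered = 0
        for client_id, inbox in self._inboxes.items():
            if client_id != sender:
                inbox.append((sender, message))
                delivered += 1
        return delivered

    def inbox(self, client_id: str) -> List[Tuple[str, str]]:
        return list(self._inboxes.get(client_id, []))


server = TMServerSketch({"alvin": "s1", "laptop": "s2", "phone": "s3"})
server.connect("alvin", "s1")
server.connect("laptop", "s2")
rejected = server.connect("phone", "wrong-secret")  # fails authentication

server.send("alvin", "laptop", "open the editor")
delivered = server.broadcast("alvin", "good morning")
```

In this run the phone is never registered, so the broadcast reaches only the laptop, and any attempt to send to the phone is refused.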

A TM Client application is installed on every device the user has. There can be many different kinds of TM Client applications, depending on the device (OS, hardware) and the use case. For example, the ES Client is a TM Client that allows Alvin to communicate with the user’s devices.

Alvin as the central intelligence of an IoT network

Together, the ES Runtime and Alvin A.I. are used to implement the central intelligence of a complete Internet of Things (IoT) network.

The network is implemented using the TM client-server protocol. Any device that can be connected to the Internet can be controlled by Alvin A.I. via the TM protocol.

Alvin A.I. typically resides on a mobile device that the user carries, such as a mobile phone.

The ES Client application is installed on each PC, laptop, and mobile device the user has. This allows Alvin to control them via the TM protocol. Alvin can be moved between ES Clients when needed.

ES APIs for IoT devices

Any IoT device with an Internet connection can be reprogrammed with appropriate ES API software to let Alvin control it via the TM protocol.