eSpeak NG

Aditi Awasthi
4 min read · Sep 17, 2021

eSpeak NG is a compact, open-source software speech synthesizer for Linux, Windows, and other platforms. It uses formant synthesis, which lets it support many languages in a small footprint. Much of the programming for eSpeak NG's language support is done using rule files, with feedback from native speakers.

Because of its small size and broad language coverage, it is included as the default speech synthesizer in the NVDA open-source screen reader for Windows, as well as in Android, Ubuntu, and other Linux distributions. Its predecessor, eSpeak, was recommended by Microsoft in 2016 and was used by Google Translate for 27 languages in 2010; 17 of these were subsequently replaced by commercial voices.

The quality of the language voices varies greatly. Some languages have had more work or feedback from native speakers than others. Most of the people who have helped to improve the various languages are blind users of text-to-speech.

The espeak-ng command is a software speech synthesizer for English and some other languages.

OPTIONS

  • -h, --help: Show summary of options.
  • --version: Prints the espeak library version and the location of the espeak voice data.
  • -f <text file>: Text file to speak.
  • --stdin: Read all text input from stdin at once, up to the end of the stream.

If neither -f nor --stdin is provided, then the <words> given as a parameter are spoken, or text is read from stdin and spoken one line at a time.
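The input rules above can be sketched in a few lines of Python. This is only an illustration that builds an argument list and does not run anything; the function name `espeak_input_args` and its parameters are hypothetical, not part of eSpeak NG.

```python
from typing import List, Optional


def espeak_input_args(words: Optional[str] = None,
                      text_file: Optional[str] = None,
                      whole_stdin: bool = False) -> List[str]:
    """Build an espeak-ng argument list for at most one input source.

    Hypothetical helper: it only mirrors the -f / --stdin / <words>
    choices described above and never launches espeak-ng itself.
    """
    chosen = [words is not None, text_file is not None, whole_stdin]
    if sum(chosen) > 1:
        raise ValueError("choose only one input source")

    cmd = ["espeak-ng"]
    if text_file is not None:
        cmd += ["-f", text_file]   # speak the contents of a text file
    elif whole_stdin:
        cmd += ["--stdin"]         # read stdin to the end, then speak
    elif words is not None:
        cmd += [words]             # speak the words passed as a parameter
    # With no source selected, espeak-ng reads stdin one line at a time.
    return cmd
```

For example, `espeak_input_args(text_file="hello.txt")` returns `["espeak-ng", "-f", "hello.txt"]`, which could be passed to `subprocess.run` on a machine where espeak-ng is installed.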

  • -d <device>: Use the specified device to speak the audio on. If not specified, the default audio device is used.
  • -q: Quiet, don't produce any speech (may be useful with -x).
  • -a <integer>: Amplitude, 0 to 200, default is 100.
  • -g <integer>: Word gap. Pause between words, units of 10ms at the default speed.
  • -k <integer>: Indicate capital letters with: 1=sound, 2=the word "capitals", higher values = a pitch increase (try -k20).
  • -l <integer>: Line length. If not zero (which is the default), consider lines less than this length as end-of-clause.
  • -p <integer>: Pitch adjustment, 0 to 99, default is 50.
  • -s <integer>: Speed in words per minute, default is 175.
  • -v <voice name>: Use voice file of this name from espeak-ng-data/voices. A variant can be specified using +, such as af+m3.
  • -w <wave file name>: Write output to this WAV file, rather than speaking it directly.
  • --split=<minutes>: Used with -w to split the audio output into <minutes> recorded chunks.
  • -b <integer>: Input text encoding, 1=UTF8, 2=8 bit, 4=16 bit.
  • -m: Indicates that the text contains SSML (Speech Synthesis Markup Language) tags or other XML tags. Those SSML tags which are supported are interpreted. Other tags, including HTML, are ignored, except that some HTML tags such as <hr> <h2> and <li> ensure a break in the speech.
  • -x: Write phoneme mnemonics to stdout.
  • -X: Write phoneme mnemonics and a translation trace to stdout. If the rules files have been built with --compile-debug, line numbers will also be displayed.
  • -z: No final sentence pause at the end of the text.
  • --stdout: Write speech output to stdout.
  • --compile=<voicename>: Compile the pronunciation rules and dictionary in the current directory. =<voicename> is optional and specifies which language is compiled.
  • --compile-debug=<voicename>: Compile the pronunciation rules and dictionary in the current directory as above, but include line numbers, which are shown when -X is used.
  • --ipa: Write phonemes to stdout using the International Phonetic Alphabet. --ipa=1 uses ties, --ipa=2 uses ZWJ (zero-width joiner), and --ipa=3 separates phonemes with _.
  • --tie=<character>: The character to use to join multi-letter phonemes in -x and --ipa output.
  • --path=<path>: Specifies the directory containing the espeak-ng-data directory.
  • --pho: Write MBROLA phoneme data (.pho) to stdout or to the file given in --phonout.
  • --phonout=<filename>: Write output from the -x and -X options and MBROLA phoneme data to this file.
  • --punct="<characters>": Speak the names of the given punctuation characters. If =<characters> is omitted, all punctuation is spoken.
  • --sep=<character>: The character to separate phonemes from the -x and --ipa output.
  • --voices[=<language code>]: Lists the available voices. If =<language code> is present, only voices suitable for that language are listed. If a language code of the form xx-yy is passed, voices for the yy variant of xx are shown with higher priority than voices for plain xx. If variant is passed, all voice variants are shown. If mb or mbrola is passed, all voices using the MBROLA voice synthesizer are shown. If all is passed, all eSpeak NG voices, voice variants, and MBROLA voices are shown.
  • --voices=<directory>: Lists the voices in the specified subdirectory.
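To show how several of these options combine, here is a minimal Python sketch that assembles a command line from the voice, speed, pitch, amplitude, and WAV output options. The helper name `espeak_speech_args` is hypothetical. The amplitude and pitch ranges come from the option list above; the check that speed is positive is an assumption of this sketch, since no range is documented here.

```python
from typing import List, Optional


def espeak_speech_args(text: str,
                       voice: Optional[str] = None,
                       speed: int = 175,
                       pitch: int = 50,
                       amplitude: int = 100,
                       wav_path: Optional[str] = None) -> List[str]:
    """Build an espeak-ng argument list (hypothetical helper)."""
    if not 0 <= amplitude <= 200:
        raise ValueError("amplitude (-a) must be between 0 and 200")
    if not 0 <= pitch <= 99:
        raise ValueError("pitch (-p) must be between 0 and 99")
    if speed <= 0:
        # Assumption of this sketch: the option list gives no speed range.
        raise ValueError("speed (-s) must be positive")

    cmd = ["espeak-ng", "-a", str(amplitude), "-p", str(pitch), "-s", str(speed)]
    if voice:
        cmd += ["-v", voice]       # e.g. "en-us", or a variant such as "af+m3"
    if wav_path:
        cmd += ["-w", wav_path]    # write a WAV file instead of speaking
    cmd.append(text)
    return cmd
```

On a machine with espeak-ng installed, the returned list could be run with `subprocess.run(cmd, check=True)`. Passing a list rather than a single shell string avoids quoting problems when the text contains spaces or quotes.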

EXAMPLES

  • espeak-ng "This is a test": Speak the sentence "This is a test" using the default English voice.
  • espeak-ng -f hello.txt: Speak the contents of hello.txt using the default English voice.
  • cat hello.txt | espeak-ng: Speak the contents of hello.txt using the default English voice.
  • espeak-ng -x hello: Speak the word "hello" using the default English voice, and print the phonemes that were spoken.
  • espeak-ng -ven-us "[[h@'loU]]": Speak the phonemes "h@'loU" using the American English voice.
  • espeak-ng --voices: List all voices supported by eSpeak NG.
  • espeak-ng --voices=en: List all voices that speak English (en).
  • espeak-ng --voices=mb: List all voices using the MBROLA voice synthesizer.
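The phoneme and voice-listing examples above can also be prepared from a script. The sketch below only builds the argument lists; `phoneme_args` and `voices_args` are hypothetical helper names. It adds -q so that only the phoneme text is produced, following the note in the options that -q may be useful with -x.

```python
from typing import List, Optional


def phoneme_args(text: str, voice: Optional[str] = None,
                 ipa: bool = False) -> List[str]:
    """Arguments that print phonemes for `text` without speaking it."""
    cmd = ["espeak-ng", "-q"]               # -q: produce no speech
    if voice:
        cmd += ["-v", voice]
    cmd.append("--ipa" if ipa else "-x")    # IPA or eSpeak NG mnemonics
    cmd.append(text)
    return cmd


def voices_args(language: Optional[str] = None) -> List[str]:
    """Arguments that list voices, optionally filtered (e.g. "en", "mb")."""
    if language:
        return ["espeak-ng", f"--voices={language}"]
    return ["espeak-ng", "--voices"]
```

With espeak-ng installed, something like `subprocess.run(phoneme_args("hello"), capture_output=True, text=True).stdout` should give the phoneme string as text, though the exact output depends on the installed voice data.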

THANK YOU!
