The Complete Guide On Text-To-Speech (TTS) Software

The Complete Guide to Text-to-Speech Software: why use it, where to get it for free, and how the leading authoring tools support TTS

You are probably familiar with computer-generated voices. Our phones now speak to us with voices that sound almost human, so much so that we can have simple, natural-sounding conversations with them. We now use devices in our homes such as Alexa, Cortana and Google Home that do the same thing, and it doesn't seem to matter much that they are always listening. Even some of our kitchen appliances and cars seem to have found a voice. And it's certainly easier for me to send an error-free text message by dictating it to my phone than by poking at the on-screen keyboard with my big thumbs!

Of course, each of those devices tends to use only one voice, and in most cases has a limited vocabulary, so it's not difficult to optimize each one to sound its best.

We used to consider computer voices, especially text-to-speech (TTS), primarily useful for the visually impaired. In eLearning, when we use narration, we usually record our own voices or hire professional voice-over artists. The latter make eLearning lessons sound professional and worth listening to, just as a documentary narrated by Morgan Freeman or David Attenborough can be fascinating, while the same script read by Gilbert Gottfried can be quite annoying. However, did you know that you can use professional narrators at a much lower cost if you also adopt TTS? Keep reading.

Why use TTS voices even when you intend to use human narrators?

Script changes

Scripts are never set in stone at the beginning of a project. They are usually modified quite frequently before the project is considered complete. One reason is that a script may sound very different in our heads when we read it than it does when we hear it read aloud. Listening leads to script changes when we notice phrases that sound awkward or are otherwise unclear.

There are other reasons, of course, such as realizing that you need to define acronyms the first time you use them, or determining that the narration is unnecessarily long. As native speakers, we read much faster than we speak, so what seems a reasonable length when reading may seem very long when listening.
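
To get a rough feel for how long a script will run when spoken, the sketch below estimates narration time from the word count. The default of 150 words per minute is an assumed ballpark speaking rate, not a figure from this guide; adjust it to match your narrator or TTS voice.

```python
def estimate_narration_minutes(script: str, words_per_minute: float = 150.0) -> float:
    """Estimate the spoken duration of a script, in minutes.

    words_per_minute is an assumed speaking rate, not a measured value.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(script.split())  # whitespace-separated words
    return word_count / words_per_minute


if __name__ == "__main__":
    sample = "Scripts are never set in stone at the beginning of a project."
    print(f"{estimate_narration_minutes(sample):.2f} minutes")
```

Running this on each slide's script before recording can flag narration that reads quickly on the page but will feel long to a listener.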

Professional narrators are generally paid per minute, with a minimum number of minutes per session. Every time there is a script change and you need to go back to the narrator for an update, it will cost you. These costs can add up to much more than originally budgeted. Even if you are recording your own voice, you will find yourself re-recording often. Each time, you will also need to ensure that your environment is the same: quiet, with the same microphone levels, the same microphone distance, and so on.
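
The effect of a session minimum on repeated pickups can be sketched with a little arithmetic. The rate and minimum below are hypothetical numbers chosen only for illustration; substitute your narrator's actual terms.

```python
def session_cost(minutes: float, rate_per_minute: float, minimum_minutes: float) -> float:
    """Cost of one session billed per minute, subject to a session minimum."""
    billed_minutes = max(minutes, minimum_minutes)
    return billed_minutes * rate_per_minute


def total_cost(session_lengths, rate_per_minute: float, minimum_minutes: float) -> float:
    """Total cost across several recording sessions."""
    return sum(session_cost(m, rate_per_minute, minimum_minutes) for m in session_lengths)


if __name__ == "__main__":
    rate, minimum = 30, 10  # hypothetical: $30 per minute, 10-minute minimum
    # One 20-minute session followed by three 2-minute pickups after script changes.
    print(total_cost([20, 2, 2, 2], rate, minimum))
    # The same 26 minutes recorded once, after the script is final.
    print(total_cost([26], rate, minimum))
```

With these assumed terms, the short pickups are each billed at the full minimum, which is why sending the script once is so much cheaper.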

For these and other reasons, I use TTS voices during development, with the client's full understanding of why. Once all the script changes have been made and before the project is complete, I let clients choose the professional narrator's voice or voices, send the final scripts to the narrators just once, and then quickly replace the TTS audio with the narrator audio. The narration ends up consistent and correct, at the lowest cost.

However, some clients …

Some of my clients like the TTS voices when they hear them and prefer to keep them in the final product. Certainly, using TTS leads to lower costs and easier updates in the future. In fact, although most TTS voices will not be mistaken for a real human, they sound better than they did in the past, and much better than those built into our operating systems. Let's look at those first.

TTS voices included in Windows and Macintosh

Modern operating systems include one or more voices that we can use in eLearning. However, they generally don't sound as good. In Windows 10, for example, you can find the voice selection in Figure 1 when you go to Control Panel > Ease of Access > Narrator > Voice.

Figure 1. Windows 10 Narrator Voice Options

You may also find that you can set options for the voice you choose, as seen in Figure 2.

Figure 2. Windows 10 narration options

Similarly, on a Macintosh with OS X 10.6.8, you will find options in System Preferences > System > Universal Access > VoiceOver > VoiceOver Utility. In later versions of OS X, the location may differ slightly. Figure 3 shows the wide variety of system voices on a Mac, though I find only Vicki's voice marginally acceptable. Figure 4 shows how you can assign different voices to different computer functions and set parameters for each.

Figure 3. Mac OS voice options

Figure 4. Mac OS X narration options

The three authoring tools with the largest market share are Adobe Captivate 2017, Articulate Storyline 360 and Trivantis Lectora 17.

The three tools differ in their TTS offerings.

  1. Adobe Captivate has long included TTS features.
  2. Articulate introduced TTS in Storyline 360 in November 2017.
  3. Trivantis Lectora does not yet include TTS. However, keep reading, because there is a solution for those of you who use Lectora.

Adobe Captivate TTS

Voices included

Captivate has included TTS voices since version 4 in January 2009. In Captivate 2017, seven NeoSpeech licensed voices are included, as seen in Figure 5.

Figure 5. Captivate 2017 includes TTS voices

  • British English: Bridget
  • French: Chloe
  • Korean: Yumi
  • US English: James, Julie, Kate and Paul

Note that if you send a text in French to Chloe, she will speak it correctly in French, although it sounds more like Canadian French than Parisian. Similarly, if you make Yumi speak text written in Korean, it will sound like native Korean to a Korean speaker. On the other hand, if you make Chloe or Yumi speak English text, they will sound like a French person who speaks English with a strong French accent or a Korean person who speaks English with a strong Korean accent, respectively.
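
If you keep scripts in several languages, a small lookup table can guard against pairing a script with a voice built for another language. The sketch below is only a bookkeeping aid based on the voice list above; the language codes are labels chosen for this example and it does not call any Captivate API.

```python
# Voices listed above, keyed by illustrative language codes (an assumption of this sketch).
CAPTIVATE_VOICES = {
    "en-GB": ["Bridget"],
    "fr": ["Chloe"],
    "ko": ["Yumi"],
    "en-US": ["James", "Julie", "Kate", "Paul"],
}


def voices_for(language_code: str) -> list:
    """Return the voices suited to a script written in the given language."""
    return CAPTIVATE_VOICES.get(language_code, [])


def voice_matches_language(voice: str, language_code: str) -> bool:
    """True if the voice is intended for scripts in that language."""
    return voice in voices_for(language_code)


if __name__ == "__main__":
    print(voices_for("fr"))                         # voices for a French script
    print(voice_matches_language("Yumi", "en-US"))  # Yumi reading English text
```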

Generating the TTS audio

There are a couple of ways to create TTS-generated narrations in Captivate, and they're both quick and easy.

1. Add TTS to any or all of the slides in one place by selecting Audio > Speech Management. You can paste the scripts from somewhere else or type them in directly. See Figure 6.

Figure 6. Speech management in Captivate

2. For any slide, you can enter a script in the Slide Notes, then check the box to copy it to text-to-speech. (You can also check another box to turn the Slide Notes into closed captions.) See Figure 7.

Figure 7. Slide Notes Option

Notice in the previous two figures that you can format the text using bold, italic, underline, and colors. With Captivate's Export Captions to Word option, the formatting is preserved, so if you want to deliver the Word document later to a professional narrator, the formatting can indicate where to emphasize certain phrases or words. See Figure 8.

Figure 8. Word export

Mix and match voices

An important feature of Captivate's TTS is that you can have as many scripts as you want, in the order you want, on any slide. This means that you can create a conversation between two or more people, like the one you see in Figure 9. The result will be an audio track on Slide 1 that contains the conversation in the order shown. You can then edit the track using Captivate's built-in audio editor if you like. For example, while Captivate will add a natural pause between the different parts of the conversation, you can change the length of individual pauses if you want.

Figure 9. A conversation

You can also use this option to use different voices when more than one language is used within the same text. For example, see Figure 10.

Figure 10. TTS language mix

Using optional VoiceText meta tags

In addition to formatting text, you can also use VoiceText™ Meta Tags (VTML) to control speech in many ways, summarized below. For more details, download this guide.

1. Break

Set a break level.
<vtml_break level="0"/> read continuously
<vtml_break level="1"/> read with a short break
<vtml_break level="2"/> read with a long break
<vtml_break level="3"/> sentence separation

2. Part of speech

Indicate the part of speech for the enclosed word.
<vtml_partofsp part="unknown" | "noun" | "verb" | "modifier" |
"function" | "interjection">
text
</vtml_partofsp>

There are many English words that are both a noun and a verb and, in each case, are pronounced differently. For example, read this paragraph out loud and you will notice the differences in pronunciations.

We are going to record a record. We will refuse the refuse. We will progress until we make a lot of progress. For the party, we will produce lots of produce. Don't contest the contest! If you rebel, you are a rebel. We will not subject you to such a boring subject. I'm going to contrast the contrast function of these different televisions.

Surprisingly, NeoSpeech will pronounce each of the nouns and verbs in the previous paragraph correctly. However, there are many more English nouns and verbs that are spelled the same but pronounced differently. If you ever find that a word is mispronounced because NeoSpeech cannot work out its part of speech from the sentence structure, you can specify the part of speech using this tag.

3. Pause

Pause for the indicated number of milliseconds.
<vtml_pause time="msec"/>

4. Phonetic symbols

(see PDF link above for more details)

<vtml_phoneme
ph="string"
alphabet="ipa" | "x-cmu" | "x-pentax" | "x-sapi" | "x-sampa" |
"x-worldbet" | "x-pinyin">
text
</vtml_phoneme>

5. Pitch

Sets the pitch for the enclosed text.

<vtml_pitch value="N">
text
</vtml_pitch>

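6. Say as

(see PDF for more details)

Sets the format in which the enclosed text is read.

<vtml_sayas interpret-as="string" format="string" detail="string">
text
</vtml_sayas>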

7. Sub

Allows you to define alternative text to read in place of a text passage.

<vtml_sub alias="string">
text
</vtml_sub>

8. Volume

Set the volume from 0 to 500%.

<vtml_volume value="N">
text
</vtml_volume>
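
When a script contains many words that need a part-of-speech hint, typing the tags by hand gets tedious. The following is a minimal sketch of a helper that wraps a word in the `vtml_partofsp` tag described in item 2. The function name is hypothetical, and the closing-tag form `</vtml_partofsp>` is an assumption based on the usual paired-tag convention; check it against the VTML guide before relying on it.

```python
# Allowed values for the "part" attribute, as listed in item 2 above.
PARTS_OF_SPEECH = {"unknown", "noun", "verb", "modifier", "function", "interjection"}


def tag_part_of_speech(word: str, part: str) -> str:
    """Wrap a word in a vtml_partofsp tag (closing-tag syntax is assumed)."""
    if part not in PARTS_OF_SPEECH:
        raise ValueError(f"unsupported part of speech: {part!r}")
    return f'<vtml_partofsp part="{part}">{word}</vtml_partofsp>'


if __name__ == "__main__":
    sentence = (
        "I'm going to "
        + tag_part_of_speech("contrast", "verb")
        + " the "
        + tag_part_of_speech("contrast", "noun")
        + " function of these televisions."
    )
    print(sentence)
```

The tagged string is what you would paste into Captivate's script field; the untagged original is the version to send to a human narrator.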

Important NeoSpeech Utilities

In addition to letting you use VTML tags, the NeoSpeech folders installed as part of Captivate let you customize in advance how acronyms or any industry terms you use should be pronounced. You do this only once, and then the word or phrase will always be pronounced correctly.

In Windows, the voice folders can be found in the Adobe Captivate Voices 2017 x64 / VT folder under Program Files. See Figure 11.

Figure 11. The location of NeoSpeech voice folders

Inside each of the previous folders there is a subfolder called M16, which in turn contains three folders. One of them is the bin folder, which contains two executable files:

UserDicEng.exe: Allows you to define exactly how words should be pronounced. See Figure 12.

Figure 12. Editing the voice dictionary

You can add as many words as you like and, in each case, define the pronunciation using alphabetic characters, as in the case of Figure 13, which indicates that ABSlider must be pronounced as AB slider and not like abslider.

Figure 13. Add or modify a word

You can also use pronunciation symbols when the alpha characters are not enough, as in Figure 14.

Figure 14. Use of pronunciation symbols

VTEditor_ENG.exe: Allows you to play entire text passages in the voice you select. See Figure 15. It is useful for testing whether certain words will be mispronounced, after which you can fix them using the dictionary editor above. Unfortunately, you cannot generate audio files from this application (the Wave option is disabled).

Figure 15. The VoiceText application

In Captivate, the quickest way to swap the TTS audio for your own voice files, or for files sent to you by a professional narrator, is to open the Audio section of the library, click each audio file there, and click Import to replace it with the professional audio. This usually takes a few minutes, even if you have a lot of audio files.

Articulate Storyline 360 TTS

In November 2017, Articulate added TTS capability to Storyline 360. Storyline 3 does not include this feature at this time. However, you can open Storyline 360 files that contain text-to-speech audio in Storyline 3 without losing the audio, although you can't create new text-to-speech narration or make any changes to the existing narration in Storyline 3.

Voices included

Storyline uses the Amazon Polly text-to-speech engine, so you currently have access to the languages you see in Figure 16. You can always see the updated list of languages here.

Figure 16. Storyline 360 Voices

Generating the TTS audio

You can add TTS to a slide using Insert > Audio > Text to Speech. See Figure 17. Once you have inserted the generated audio, do not use this option again if you need to modify the script text or change the voice used, because the dialog will appear empty every time. Instead, right-click the audio track that TTS generated to return to your script.

You can also copy the text from the current slide's Slide Notes into this dialog by clicking a button. There are no options to format the text.

Figure 17. Storyline text-to-speech option

Creating conversations

You cannot create a conversation between voices directly in the TTS window. However, since Storyline allows multiple audio tracks, you could generate three separate audio tracks and concatenate them on the timeline, as you can see in Figure 18. This has the advantage of allowing you to time the two sides of the conversation with more space between each part if desired, to insert other screen events in between.

Alternatively, you can export all three tracks and then combine them into one track in an external editor or in Storyline's own audio editor. This can be very time consuming, especially if you have multiple slides where you need to have conversations between doctor and patient, lawyer and client, or any other situation.

Figure 18. Audio concatenation

Because TTS is a recent addition to Storyline 360, it understandably has other limitations compared with the long-standing TTS in Adobe Captivate:

  1. You can't enter all the narration for each slide in one place. You must create or paste the scripts on each slide, then generate the audio for that slide.
  2. There is no way to preset word pronunciations in a voice dictionary. If a word is pronounced incorrectly, you must modify the script so that it sounds as it should. For example, in the case of my last name (Ganci), I would need to write it phonetically (or have an Italian voice say it) every time I use it, instead of just defining its pronunciation once. After trying GHAN-chee, GAN-chee, GAHN-chee and various other combinations, I gave up trying to make the US English voice pronounce my name correctly, because each time it insisted on pronouncing the A in my name like the A in CAN instead of the A in FATHER. However, when I wrote KHAN-chee, it pronounced the A correctly, although of course the K is wrong.
  3. While Amazon Polly voices allow for Speech Synthesis Markup Language (SSML) tagging, that's not currently possible in Storyline. Therefore, you cannot change the pitch, volume, or other aspects of your text-to-speech audio. You will have to edit the generated audio later to make these kinds of changes.
  4. In the Storyline TTS dialog box, you can preview each voice, in which case you will hear the voice introduce itself by name and say that it will read any text. However, you cannot preview the voice reading your script. You must insert the audio into your timeline before you can hear the voice reading the script.
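
Where a tool has no voice dictionary, one workaround is to keep the respellings in one place and apply them to a copy of the script just before pasting it into the TTS dialog. The sketch below is a hypothetical pre-processing helper, not a Storyline feature, and the respellings in the example are placeholders you would tune by ear.

```python
import re


def apply_respellings(script: str, respellings: dict) -> str:
    """Replace whole words in a script with their phonetic respellings."""
    if not respellings:
        return script
    # Try longer terms first so a short term never pre-empts a longer one.
    terms = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: respellings[match.group(1)], script)


if __name__ == "__main__":
    respellings = {"Ganci": "KHAN-chee", "ABSlider": "AB slider"}  # placeholder spellings
    print(apply_respellings("Joe Ganci demonstrates ABSlider.", respellings))
```

Keeping the original script untouched and generating the respelled copy on demand means the clean version is still available for captions or for a human narrator.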

The above limitations aside, I'm glad to see that Storyline 360 now includes text-to-speech features. One area where it beats Captivate is the number of voices available through Amazon Polly. See below for more.

Other ways to get TTS audio

Most authoring tools, like Trivantis Lectora, do not include TTS voices. Also, you may want options other than what you find in Adobe Captivate and Articulate Storyline. You'll be glad to know that there are other, online options for creating and downloading TTS narration. Note, though, that this process is more laborious, as you will have to create the audio files online individually, download them, and then insert them into your lesson file.

1. Amazon Polly: free and paid

Remember that Articulate Storyline 360 uses Amazon Polly voices. Figure 16 shows the complete list of available voices. However, if you don't use Storyline 360, you can still use these voices.
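
For those comfortable with a little scripting, Polly can also be called directly. The sketch below uses the `synthesize_speech` call of the AWS SDK for Python (boto3, a third-party package) and assumes you have an AWS account with credentials already configured. The voice "Joanna" and the helper names are choices made for this example only.

```python
def build_polly_request(text: str, voice_id: str = "Joanna", output_format: str = "mp3") -> dict:
    """Assemble the keyword arguments for Polly's synthesize_speech call."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return {"Text": text, "VoiceId": voice_id, "OutputFormat": output_format}


def synthesize_to_file(text: str, path: str, voice_id: str = "Joanna") -> None:
    """Send text to Amazon Polly and save the returned audio (needs AWS credentials)."""
    import boto3  # third-party AWS SDK, imported here so the helper above works without it

    client = boto3.client("polly")
    response = client.synthesize_speech(**build_polly_request(text, voice_id))
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())


if __name__ == "__main__":
    print(build_polly_request("Welcome to the lesson."))
    # synthesize_to_file("Welcome to the lesson.", "slide01.mp3")  # uncomment once credentials are set up
```

One file per slide keeps the later import into a tool such as Lectora straightforward.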

You can convert up to 5 million characters per month to speech at no cost. That's 1,640 single-spaced pages of text, or 90 hours of narration! It is unlikely that you need that much. However, if you do, after that it costs $4.00 per 1 million characters.
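
The pricing above can be turned into a quick estimate. This sketch reads the figures in the text as a monthly allowance of 5 million free characters followed by $4.00 for each additional million, billed proportionally; that proportional billing is an assumption, so check Amazon's current pricing before budgeting.

```python
def polly_monthly_cost(characters: int,
                       free_characters: int = 5_000_000,
                       price_per_million: float = 4.00) -> float:
    """Estimated monthly cost in US dollars for a given number of characters."""
    billable = max(0, characters - free_characters)  # nothing is billed inside the allowance
    return billable / 1_000_000 * price_per_million


if __name__ == "__main__":
    for count in (1_000_000, 5_000_000, 7_500_000):
        print(f"{count:>10,} characters: ${polly_monthly_cost(count):.2f}")
```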

2. Text-to-speech – Free

This site includes the following voices:

American English: Alice, Daisy, George, Jenna and John
British English: Emma and Harry
French: Gabriel and Jade
Spanish: Isabella and Mateo
German: Michael and Nadine
Italian: Alessandra and Giovanni
Portuguese: Rodrigo
Russian: Valentin

In each case, you can choose between slow, medium, fast, and very fast speeds.

Type or paste your text into the box provided and click the Create Audio File button. When it has finished, you can download the resulting MP3 file and insert it into your lesson.

3. Natural Readers: free and paid

Free and paid for personal use

The free version of Natural Readers is for personal use only and includes up to 20 minutes per day for Premium Voices and there is no limit to the Free Voices used. You can paste or write your scripts, or upload PDF, DocX, RTF, TXT and ePub documents. In each case, you can choose between speeds ranging from -4 to 9 and you can download the resulting MP3 files.

The free version includes 20 minutes per day of Premium Voices. Voices include American English, British English, French (France and Canada), Spanish (Spain, Mexico, Spanish and American), German, Italian, Portuguese (Portugal and Brazil), Swedish and Dutch.

For $60 a year, there's no time limit, though you are limited to 1 million characters a year, and you get access to 57 premium voices. For $72 a year, you can convert up to 5 million characters a year.

Commercial use

To use TTS audio created in Natural Readers for eLearning, you can try the commercial version for free for a limited time, after which you can license a single user for $588 a year or four users for $948 a year, or pay month to month at $99 a month for a single user.

The commercial version includes access to 47 high-quality voices from 24 different languages: English (US, Australian, British, Indian and Welsh), French (France, Canada), Spanish (Spanish, American), Portuguese (Portugal, Brazil), Welsh, Danish, Dutch, German, Icelandic, Italian, Japanese, Polish, Romanian, Russian, Swedish, Turkish, and Norwegian. These are different voices from those included in the free version. It also includes a pronunciation editor, an audio editor and SSML tags.

Natural Readers software

This site also includes software that you can download for Windows and Macintosh for personal use only. There is a free version, and there are paid versions ranging from $70 to $200 as a one-time fee. See what's included here.

4. IBM Watson services: free and paid

As part of its Watson services, IBM provides text-to-speech. It is based on IBM Cloud, which includes many other services, and you can download the resulting MP3 files. Its voices cover multiple languages. Under the Lite plan, which has no cost, the first 100 minutes per month are free. The Standard plan has tiers, ranging from $0.01 to $0.02 US per minute. The Premium plan offers high-end services, and you will need to contact IBM for pricing.

5. iSpeech – Paid

iSpeech offers multiple languages: English (US, UK, Australia, Canada), Mandarin, Hong Kong Cantonese, Japanese, Korean, Hungarian, Portuguese (Portugal, Brazil), Spanish, Catalan, Czech, Danish, Finnish, French (France, Canada), Norwegian, Dutch, Polish, Italian, Turkish, Greek, German, Russian, Arabic, and Swedish. This is one of the most expensive services, but it can be worth the price if you find that its voices justify it. Prices range from $500 per 10,000 words (5 cents per word) to $2,500 per 100,000 words (2.5 cents per word).

6. NeoSpeech – Paid

NeoSpeech, the engine used by Adobe Captivate (which includes unlimited use of seven voices in US English, UK English, French and Korean), also offers an online service to convert text to speech separately, so you can use the audio wherever you want. The price is also higher than most: $25 for 400 words (about 6 cents per word), $50 for 1,000 words (5 cents per word), and $100 for 2,400 words (about 4 cents per word). More than 40 languages are available, including English (US and UK), Spanish, French (France and Canada), Portuguese, Italian, German, Korean, Japanese, Mandarin, Cantonese, Taiwanese, and Thai.
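
To compare the word-based packages on an equal footing, it helps to compute the exact per-word price rather than rely on rounded figures. The sketch below uses only the package prices quoted in this guide; the function and variable names are illustrative.

```python
# (service, price in US dollars, words included), as quoted in the text above.
PACKAGES = [
    ("iSpeech", 500, 10_000),
    ("iSpeech", 2_500, 100_000),
    ("NeoSpeech", 25, 400),
    ("NeoSpeech", 50, 1_000),
    ("NeoSpeech", 100, 2_400),
]


def cents_per_word(price_dollars: float, words: int) -> float:
    """Per-word price in US cents for a package."""
    if words <= 0:
        raise ValueError("words must be positive")
    return price_dollars * 100 / words


if __name__ == "__main__":
    for service, price, words in PACKAGES:
        print(f"{service}: ${price:,} for {words:,} words = {cents_per_word(price, words):.2f} cents per word")
```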

Convinced?

Are you convinced that TTS can be useful for you? Using the TTS options built into Captivate and Storyline may suffice. If not, you can also create TTS narration online, in many cases at no cost. Unless you plan to use high-quality TTS voices for the final version of your eLearning, you can probably make do with lower-quality voices for your drafts until the scripts are finalized, after which you can swap the TTS audio for the professional narrator's files or your own voice files.
