Code-it Software Solutions


TEXT 2 SPEECH ENGINES AND SUPPORT:

Support: Text 2 Speech engines, voices and tags.

All our text 2 speech software supports both SAPI4 and SAPI5 compliant voices.

SAPI5:

By default, Windows XP, Vista, 7 and 8 have Microsoft SAPI 5 installed with one voice: Sam (XP) or Anna (Vista, 7 or 8), but your "mileage may vary"!

Windows 98, ME, 2000 and NT users...

To use the SAPI 5 voices, download the SAPI 5 runtime files from this link (6.22 MB), then double-click the downloaded file to install it.

SAPI4:

There are a lot of free SAPI 4 voices on the internet, so SAPI 4 (used in Windows 98, ME and 2000) is still very useful.

You'll need the SAPI4 runtime files and any Text 2 Speech engines you want. Once they are installed, our software will automatically detect them and offer them for selection.

But maybe you want to take advantage of a supported language other than English! There are many T2S engines for languages other than English (or other than the default engine that comes with your system), but they are getting harder to find on the web. If you need one, here they are:

FREE TEXT-TO-SPEECH ENGINES

Microsoft Text-to-Speech Engines
    Mary, Mike, Sam and More (7.3MB)

L&H TruVoice TTS Engines
    American English (0.99MB)
    British English (2.54MB)
    Dutch (2.58MB)
    French (2.24MB)
    German (2.18MB)
    Italian (1.97MB)
    Japanese (3.00MB)
    Korean (3.03MB)
    Portuguese (2.39MB)
    Russian (2.85MB)
    Spanish (2.36MB)

Note: There are many advanced text 2 speech engines available on the internet that are much better than the free ones from Microsoft. The best, in my opinion, are the "AT&T Natural American English Voices", which sound much better and convert text to speech more efficiently.

ADVANCED INFO...

When using any of our 'Text 2 Speech' software, you might want to learn how to add "scripting tags" to the text that the text 2 speech engine speaks or converts. While this is somewhat beyond the scope of the software, changing the pitch or putting a pause in the voice output, for example, is easy as pie once you understand the method.


SAPI4 Text 2 Speech Engine Tags: 

These (SAPI4) text 2 speech engines support modifying speech output through special tags inserted in the speech text string. These tags help you change the characteristics of the output expression. 

A simple example is the following sentence with a 3-second pause scripted into it:

"A duck waddled right into the front door \Pau=3000\ it sure surprised the cat!"

Please note that not all voices support all TAGS.

Speech output tags use the following rules of syntax:

  • All tags begin and end with a backslash character (\).
  • The single backslash character is not allowed within a tag. To include a backslash character in a text parameter of a tag, use a double backslash (\\).
  • Tags are case-insensitive. For example, \pit\ is the same as \PIT\.
  • Tags are whitespace-dependent. For example, \Rst\ is not the same as \ Rst \.
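These rules are easy to get wrong when tags are typed by hand. The Python sketch below builds SAPI4 tags that follow them; `sapi4_tag` is a hypothetical helper, not part of SAPI or of our software. It writes numeric values as-is, wraps string values in double quotes (as in the Chr example further down), doubles any backslash inside a string value, and never adds whitespace inside the tag. It does not handle the Vce tag, whose value uses its own Speaker="..." form.

```python
def sapi4_tag(name, value=None):
    """Build a SAPI4 speech output tag such as \\Pau=3000\\ (hypothetical helper)."""
    backslash = "\\"
    if value is None:
        # Tags without a value, such as \Emp\
        return backslash + name + backslash
    if isinstance(value, str):
        # String parameters are quoted; a single backslash is not allowed
        # inside a tag, so each one is doubled.
        escaped = value.replace(backslash, backslash * 2)
        return backslash + name + '="' + escaped + '"' + backslash
    # Numeric parameters (milliseconds, hertz, words per minute, volume)
    return backslash + name + "=" + str(int(value)) + backslash


# Usage: the 3-second pause example from above
sentence = ("A duck waddled right into the front door "
            + sapi4_tag("Pau", 3000)
            + " it sure surprised the cat!")
print(sentence)
```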

Unless otherwise specified or modified by another tag, the speech output keeps the characteristic set by a tag for the rest of the text specified in a single Speak method call. After the Speak method call completes, speech output is automatically reset to the user-defined parameters.

Chr Tag
Description
Sets the character of the voice.
Syntax
\Chr=string\
Part      Description
string    A string specifying the character of the voice.

"Normal" (Default) A normal tone of voice.

"Monotone" A monotone voice.

"Whisper" A whispered voice.

Example:

\chr="monotone"\ How are you today? \chr="whisper"\ I am fine.
\chr="normal"\ Good to hear!



Emp Tag
Description
Emphasizes the next word spoken. This tag must immediately precede the word.
Syntax
\Emp\ 


Pau Tag
Description
Pauses speech for the specified number of milliseconds.
Syntax
\Pau=number\
Part      Description
number    The number of milliseconds to pause.


Pit Tag
Description:
This tag sets the baseline pitch of the voice to the specified value in hertz. The actual pitch fluctuates above and below this baseline following the prosodic rules. The default is about 180-190 Hz for a female voice and 100-110 Hz for a male voice.
Syntax:
\Pit=number\
Part      Description
number    The pitch in hertz.

Remark: This tag will not work with all voices.

Example:  \Pit=100\ are you going home? \Pit=190\ are you going home?


Spd Tag
Description
Sets the baseline average talking speed of the speech output.
Syntax
\Spd=number\
Part      Description
number    Baseline average talking speed, in words per minute.

Example:

\Spd=150\I am speaking at a rate of 150 words per minute. \Spd=75\ Am I talking too fast for you?
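Because Spd is a baseline average in words per minute, you can estimate how long a passage will take with simple arithmetic: seconds = words * 60 / Spd. The sketch below is only an approximation; it assumes the tags have already been removed from the text and that words are separated by whitespace, and the real duration also depends on the engine's prosody and on any \Pau\ tags. The function name is hypothetical.

```python
def estimated_seconds(text, words_per_minute):
    """Rough speaking time of `text` at a \\Spd=...\\ baseline rate.

    Assumes tags were already removed from `text`; Spd is only an
    average, so the engine's real timing will differ somewhat.
    """
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    word_count = len(text.split())
    return word_count * 60.0 / words_per_minute


sentence = "I am speaking at a rate of 150 words per minute."
print(estimated_seconds(sentence, 150))  # about 4.4 seconds
print(estimated_seconds(sentence, 75))   # about 8.8 seconds
```

The example sentence has 11 words, so halving the rate from 150 to 75 doubles its estimated speaking time.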

Vce Tag (voice tag)
Description
Specifies the name of the voice, or NULL if the name is unimportant. The Microsoft Voice engine responds to the names listed below. Why they are named this way I don't have a clue, so you'll just need to experiment until you get what you want just right ;-)
Peter
Sidney
Eager Eddie
Deep Douglas
Biff
Grandpa Amos
Melvin
Alex
Wanda
Julia 

Syntax
\Vce=Speaker="Name of Voice"\

Example:

\Vce=Speaker="Wanda"\
Hello there Peter.
\Vce=Speaker="Peter"\
Hi Wanda. How are you?
\Vce=Speaker="Wanda"\
I'm not too good really. Got a bad head cold but I'll be fine by tomorrow hopefully.

\Vce=Speaker="Alex"\
Hello there Biff.
\Vce=Speaker="Biff"\
Hi Alex. How are you?
\Vce=Speaker="Julia"\
He's not too good really. He's got a bad head cold and besides that he needs a shower bad!

\Vce=Speaker=""\
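Long conversations like the one above are easier to maintain as data. The hypothetical `sapi4_dialog` helper below turns (speaker, sentence) pairs into the same tag layout, one tag or sentence per line, and ends with the empty Speaker name that closes the example. The speaker names must be ones your engine recognizes, such as those listed above; any other name is an assumption you will need to test.

```python
def sapi4_dialog(lines):
    """Join (speaker, sentence) pairs into one SAPI4 string.

    Each sentence is preceded by a \\Vce=Speaker="..."\\ tag, and the
    script ends with an empty speaker name, as in the example above.
    """
    parts = []
    for speaker, sentence in lines:
        parts.append('\\Vce=Speaker="' + speaker + '"\\')
        parts.append(sentence)
    parts.append('\\Vce=Speaker=""\\')
    return "\n".join(parts)


script = sapi4_dialog([
    ("Wanda", "Hello there Peter."),
    ("Peter", "Hi Wanda. How are you?"),
])
print(script)
```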

Vol Tag
Description
Sets the baseline speaking volume of the speech output.
Syntax
\Vol=number\
Part      Description
number    Baseline speaking volume: 0 is silence and 65535 is maximum volume.

Example:

\Vol=3000\

Remarks:

The volume setting affects both left and right channels. You cannot set the volume of each channel separately. These tags are supported only for TTS-generated output.
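The Vol scale runs from 0 to 65535 rather than 0 to 100, so a percentage has to be scaled before it goes into the tag. A minimal Python sketch; the helper name and the clamping of out-of-range percentages are this sketch's assumptions, not SAPI behavior. By the same arithmetic, the \Vol=3000\ example above is under 5% of maximum volume.

```python
SAPI4_MAX_VOLUME = 65535  # maximum \Vol\ value; 0 is silence


def sapi4_volume(percent):
    """Convert a 0-100 percentage into a \\Vol=...\\ number (0-65535)."""
    percent = max(0, min(100, percent))  # clamp: this sketch's own choice
    return round(percent * SAPI4_MAX_VOLUME / 100)


# Half of maximum volume as a SAPI4 tag
print("\\Vol=" + str(sapi4_volume(50)) + "\\")
```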


SAPI5 Text 2 Speech Engine Tags: 

These (SAPI5) text 2 speech engines support modifying speech output through special tags inserted in the speech text string. These tags help you change the characteristics of the output expression. 

"Two seconds of silence <silence msec="2000"/> has just occurred in this sentence!"

Please note that not all voices support all TAGS.

Some of the tags of greatest interest are documented below:

Volume
The Volume tag controls the volume of a voice. The tag can be empty, in which case it applies to all subsequent text, or it can have content, in which case it only applies to that content.

The Volume tag has one required attribute: Level. The value of this attribute should be an integer between zero and one hundred. Values outside of this range will be truncated.

<volume level="50">
This text should be spoken at volume level fifty.

<volume level="100">
This text should be spoken at volume level one hundred.
</volume>

</volume>

<volume level="80"/>
All text which follows should be spoken at volume level eighty.

One hundred represents the default volume of a voice. Lower values represent percentages of this default. That is, 50 corresponds to 50% of full volume.

Values specified using the Volume tag will be combined with values specified programmatically (using ISpVoice::SetVolume). For example, if you combine a SetVolume( 50 ) call with a <volume level="50"> tag, the volume of the voice should be 25% of its full volume.
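The combination described above is a multiplication of percentages. A minimal sketch, assuming the SetVolume value is also on the 0-100 scale that ISpVoice::SetVolume uses, and reading "truncated" as clamping the tag level to 0-100; the function name is hypothetical.

```python
def effective_volume(set_volume, tag_level):
    """Percent of a voice's full volume after SetVolume and a <volume> tag.

    set_volume: value passed to ISpVoice::SetVolume (0-100).
    tag_level:  the Level attribute of the <volume> tag.
    """
    tag_level = max(0, min(100, tag_level))  # out-of-range levels are truncated
    return set_volume * tag_level / 100


print(effective_volume(50, 50))  # 25.0, the example from the text
```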

Rate
The Rate tag controls the rate of a voice. The tag can be empty, in which case it applies to all subsequent text, or it can have content, in which case it only applies to that content.

The Rate tag has two attributes, Speed and AbsSpeed, one of which must be present. The value of both of these attributes should be an integer between negative ten and ten. Values outside of this range may be truncated by the engine (but are not truncated by SAPI). The AbsSpeed attribute controls the absolute rate of the voice, so a value of ten always means rate ten and a value of five always means rate five, regardless of any enclosing Rate tags.

<rate absspeed="5">
This text should be spoken at rate five.
<rate absspeed="-5">
This text should be spoken at rate negative five.
</rate>
</rate>
<rate absspeed="10"/>

All text which follows should be spoken at rate ten.

Speed
The Speed attribute controls the relative rate of the voice. The absolute value is found by adding each Speed to the current absolute value.

<rate speed="5">
This text should be spoken at rate five.
<rate speed="-5">
This text should be spoken at rate zero.
</rate>
</rate>

Zero represents the default rate of a voice, with positive values being faster and negative values being slower. Values specified using the Rate tag will be combined with values specified programmatically (using ISpVoice::SetRate).
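The nesting rules for AbsSpeed and Speed can be modeled with a stack: opening a tag saves the current rate, and the matching </rate> restores it. The sketch below is a hypothetical model, not SAPI code; it covers only Rate tags that have content (not the empty form), does not clamp to -10..10 because SAPI itself does not truncate, and ignores any rate set programmatically with ISpVoice::SetRate.

```python
def rates_in_force(events):
    """Track the rate in force while walking nested <rate> tags with content.

    events: ("abs", n) for <rate absspeed="n">, ("rel", n) for
    <rate speed="n">, and ("close", None) for </rate>.
    Returns the rate in force after each event.
    """
    current = 0   # zero is the voice's default rate
    saved = []    # rate to restore at each matching </rate>
    history = []
    for kind, value in events:
        if kind == "close":
            current = saved.pop()
        else:
            saved.append(current)
            current = value if kind == "abs" else current + value
        history.append(current)
    return history


# The Speed example: <rate speed="5"> ... <rate speed="-5"> ... </rate></rate>
print(rates_in_force([("rel", 5), ("rel", -5), ("close", None), ("close", None)]))
# [5, 0, 5, 0]
```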

Pitch
The Pitch tag controls the pitch of a voice. The tag can be empty, in which case it applies to all subsequent text, or it can have content, in which case it only applies to that content.

The Pitch tag has two attributes, Middle and AbsMiddle, one of which must be present. The value of both of these attributes should be an integer between negative ten and ten. Values outside of this range may be truncated by the engine (but are not truncated by SAPI).

The AbsMiddle attribute controls the absolute pitch of the voice, so a value of ten always means pitch ten and a value of five always means pitch five, regardless of any enclosing Pitch tags.

<pitch absmiddle="5">
This text should be spoken at pitch five.
<pitch absmiddle="-5">
This text should be spoken at pitch negative five.
</pitch>
</pitch>
<pitch absmiddle="10"/>

All text which follows should be spoken at pitch ten.

The Middle attribute controls the relative pitch of the voice. The absolute value is found by adding each Middle to the current absolute value.

<pitch middle="5">
This text should be spoken at pitch five.
<pitch middle="-5">
This text should be spoken at pitch zero.
</pitch>
</pitch>

Zero represents the default middle pitch for a voice, with positive values being higher and negative values being lower.

Please note: we have found that AT&T Natural Voices does NOT support this "pitch" tag. Most other SAPI5 engines do however.
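Because one of the two attributes must be present, a small builder can catch a missing attribute before the text ever reaches the engine. The hypothetical `pitch_markup` below also rejects passing both attributes at once, which is this sketch's own stricter rule rather than something stated above. Leaving out the text produces the empty form of the tag.

```python
def pitch_markup(text=None, middle=None, absmiddle=None):
    """Build a SAPI5 <pitch> tag with exactly one of Middle or AbsMiddle."""
    if (middle is None) == (absmiddle is None):
        raise ValueError("pass exactly one of middle or absmiddle")
    if middle is not None:
        attribute = 'middle="%d"' % middle
    else:
        attribute = 'absmiddle="%d"' % absmiddle
    if text is None:
        # Empty tag: applies to all text that follows
        return "<pitch " + attribute + "/>"
    # Tag with content: applies only to the enclosed text
    return "<pitch " + attribute + ">" + text + "</pitch>"


print(pitch_markup("This text should be spoken at pitch five.", absmiddle=5))
print(pitch_markup(middle=-2))
```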

Spell
The Spell tag forces the voice to spell out all text, rather than using its default word and sentence breaking rules, normalization rules, and so forth. All characters should be expanded to corresponding words (including punctuation, numbers, and so forth). The Spell tag cannot be empty.

<spell>
These words should be spelled out.
</spell>
These words should not be spelled out.

Direct item insertion tags
Three tags give applications the ability to insert items directly at some level: Silence, Pron, and Bookmark.

Silence (pause)
The Silence tag inserts a specified number of milliseconds of silence into the output audio stream. This tag must be empty, and must have one attribute, Msec.

Five hundred milliseconds of silence <silence msec="500"/> just occurred.

Pron
The Pron tag inserts a specified pronunciation. The voice will process the sequence of phonemes exactly as they are specified. This tag can be empty, or it can have content. If it does have content, it will be interpreted as providing the pronunciation for the enclosed text. That is, the enclosed text will not be processed as it normally would be.

The Pron tag has one attribute, Sym, whose value is a string of white space separated phonemes.

<pron sym="h eh 1 l ow & w er 1 l d "/>
<pron sym="h eh 1 l ow & w er 1 l d"> hello world </pron>
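The Silence and Pron examples above can be generated rather than typed. A minimal sketch with hypothetical helper names; the phoneme list is copied from the Pron example, and which phoneme symbols a particular voice accepts is not covered here. Bookmark is left out because its attributes are not documented on this page.

```python
def silence(msec):
    """<silence> must be empty and has one attribute, msec."""
    return '<silence msec="%d"/>' % msec


def pron(phonemes, text=None):
    """<pron> whose sym attribute is a whitespace-separated phoneme string.

    With `text`, the phonemes replace the normal pronunciation of that text.
    """
    sym = " ".join(phonemes)
    if text is None:
        return '<pron sym="%s"/>' % sym
    return '<pron sym="%s">%s</pron>' % (sym, text)


hello_world = "h eh 1 l ow & w er 1 l d".split()
print("Five hundred milliseconds of silence " + silence(500) + " just occurred.")
print(pron(hello_world, " hello world "))
```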

Voice Change

If you are using a SAPI5 engine that supports this tag, the voice can be changed anywhere in the text. Examples:

<voice required="name = Mike">

<voice required="name = Crystal">

<voice required="name = Sam">

Hello, I'm Sam and this is a test of changing voices <voice required="name = Mike"> Hi, I'm Mike! As you can hear, the voice just changed from gruffy ole Sam to me. This tag only applies to certain SAPI5 text 2 speech engines.
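The example above leaves each Voice tag open. If your engine supports Voice tags with content (we assume it behaves like the Volume, Rate and Pitch tags, where content limits the tag's scope), each speaker's line can be wrapped and closed instead. The `sapi5_dialog` helper is hypothetical, it reuses the `name = ...` form from the examples above, and the names must match voices actually installed on your system.

```python
def sapi5_dialog(lines):
    """Give each (voice name, sentence) pair its own <voice> element.

    Assumes a <voice> tag with content changes the voice only for that
    content, so each speaker's line is closed with </voice>.
    """
    parts = []
    for name, sentence in lines:
        parts.append('<voice required="name = %s">%s</voice>' % (name, sentence))
    return " ".join(parts)


print(sapi5_dialog([
    ("Sam", "Hello, I'm Sam and this is a test of changing voices."),
    ("Mike", "Hi, I'm Mike!"),
]))
```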