All our text 2 speech software supports both
SAPI4 and SAPI5 compliant voices.
SAPI5:
By default, XP and Vista have
Microsoft SAPI 5 installed with one voice: Sam (with XP) or Anna
(with Vista), but your "mileage may vary"!
If you would like to add the
Mike and Mary voices - get them HERE
Windows 98, ME, 2000 and NT
users...
To make use of the SAPI 5
voices - download the SAPI 5 runtime files from
this link
(6.22 Mb), then double click the file after download to install. If you
need / would like to have the Mike and Mary voices - get them HERE
SAPI4:
There
are a lot of free SAPI 4 voices on the internet, so SAPI 4 (used
in Windows 98, ME and 2000) is still very useful.
You'll
need the SAPI4
runtime files and any Text 2 Speech engines you might
want - get
them HERE. Once installed, the software will automatically
recognize them and offer them for selection.
But
maybe you want to take advantage of a supported language other
than English! Get any of the supported FREE text 2 speech engines
directly from Microsoft.
There are many advanced text
2 speech engines available on the INTERNET that are much better
than the free ones from Microsoft. The best, in my opinion, is
"AT&T Natural Voices", which sounds much
better (converts t 2 s more efficiently).
Download it HERE
(ATT DTNV 1.4 Mike & ATT DTNV 1.4 Crystal - American English
voices) if you want, but be advised it is a HUGE file (338+ Mb). If
you'd like it on CD - send me a 10 dollar bill (to cover postage
and creation of the CD) along with your request and mailing
address - I'll snail mail it to you.
Code-it Software - PO Box
171 - Midwest, WY 82643
Here are some samples of
text converted to audio (mp3 files) using the "Text 2
Audio" tool included within "Wave MP3 Editor" or
"Text Shifter", which is our most advanced text 2 audio
solution. Please keep in mind that these files were converted at a
lower quality "sample rate" so they can stream off the
net. If you convert the text at a higher quality sample rate (to
burn on CD-R or put on your MP3 Player), the sound quality will be
much better.
Must have Flash 7+
installed to view/ play.
Disclaimer: the text 2
speech engine used for these recordings is "AT&T
Natural Voices - Mike (English)", which sounds much
better (converts t 2 s more efficiently) than what is available
from Microsoft for free. The text above was NOT edited in any way but
merely downloaded from Project
Gutenberg and split into "chapters". Because text 2 speech
engines do not recognize all text 100% -
you may have to edit text for good results. If you take creating
eBooks, or converting text 2 speech, seriously I would advise
using these AT&T engines.
ADVANCED INFO...
When using our "Text
Shifter", or any other text 2 speech applications, you might
want to learn how to add "scripting tags" within the
text (that is spoken or converted) via the text 2 speech engine.
While this is somewhat beyond the scope of the software - changing
the pitch or putting a pause in the voice output, as an example,
is easy as pie - once you understand the method.
The text 2 speech engine
determines the ability and implementation of these scripting tags,
NOT our software.
The first thing you'll
have to know is what t2s engine you are using, as SAPI4 and SAPI5
engines use a different set of scripting tags. Already confused?
Here's how to determine which one you have:
* if you are using the
default voice that came with Windows XP or Vista you are using a
SAPI5 engine.
* if you are using the
AT&T Natural Voices t2s engine (download supplied on this
page) you are also using SAPI5.
* if you are using one of
the voices that we supply, within the setup, of Text Shifter, you
are using SAPI4.
* if you downloaded and
installed any engine from the above Microsoft link (free MS
engines) then you're using SAPI4 type engines.
* of course - the simplest
way to be sure is to add a "test" tag; if it doesn't
perform as expected - more than likely you need to use the
other set of "tags". Testing, debugging, testing...
that's always what's needed when dealing with any software!
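If you have a script on hand and can't remember which tag set it was written for, a quick check like the one below may help. This is only a rough sketch: the function name `guess_tag_dialect` is made up for illustration, and it looks only for the tag names covered on this page, so treat the answer as a hint and still test with your actual engine.

```python
import re

# Hypothetical helper (not part of Text Shifter or SAPI).
# SAPI4 tags covered on this page: \Chr=..\ \Emp\ \Pau=..\ \Pit=..\ \Rst\ \Spd=..\ \Vce=..\ \Vol=..\
SAPI4_TAG = re.compile(r"\\(?:chr|emp|pau|pit|rst|spd|vce|vol)\b[^\\]*\\", re.IGNORECASE)
# SAPI5 tags covered on this page: <silence>, <volume>, <rate>, <pitch>, <spell>, <pron>, <voice>
SAPI5_TAG = re.compile(r"</?(?:silence|volume|rate|pitch|spell|pron|voice)\b[^>]*>", re.IGNORECASE)


def guess_tag_dialect(text: str) -> str:
    """Return 'SAPI4', 'SAPI5', 'mixed' or 'none' based on the tags found in text."""
    has_sapi4 = bool(SAPI4_TAG.search(text))
    has_sapi5 = bool(SAPI5_TAG.search(text))
    if has_sapi4 and has_sapi5:
        return "mixed"
    if has_sapi4:
        return "SAPI4"
    if has_sapi5:
        return "SAPI5"
    return "none"


if __name__ == "__main__":
    print(guess_tag_dialect("front door \\Pau=3000\\ it sure surprised the cat!"))
    print(guess_tag_dialect('silence <silence msec="2000"/> has just occurred'))
```

A result of "mixed" usually means part of the script was copied from somewhere else and needs converting to one tag set.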
SAPI4 Text 2 Speech
Engine Tags:
These (SAPI4) text 2 speech engines
support modifying speech output through special tags inserted in
the speech text string. These tags help you change the
characteristics of the output expression.
When using SAPI 4 engines/ voices -
our "Text Shifter" application WILL NOT record/ convert
to audio with the tags implemented: a "workaround"
would be to use an audio recorder to record the speaker output
(i.e. Stereo Mix, What U Hear, etc...) while playing the text
within the Text Shifter editor.
A simple example would be the following
sentence with a 3 second pause scripted into the sentence:
"A duck waddled right into the front
door \Pau=3000\ it sure surprised the cat!"
Please note
that not all voices support all TAGS.
Speech output tags use the following
rules of syntax:
* All tags begin and end with a
backslash character (\).
* The single backslash character is
not enabled within a tag. To include a backslash
character in a text parameter of a tag, use a double backslash
(\\).
* Tags are case-insensitive. For
example, \pit\ is the same as \PIT\.
* Tags are whitespace-dependent. For
example, \Rst\ is not the same as \ Rst \.
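If you build your scripts with a program rather than by hand, the syntax rules above can be wrapped in a small helper. The sketch below is an illustration only: `sapi4_tag` is a made-up name, and it assumes string values are written in double quotes (as in the Chr example further down) and numbers are written bare. It does not cover the `\Vce=Speaker="..."\` form.

```python
def sapi4_tag(name, value=None):
    """Build one SAPI4 tag, e.g. \\Pau=3000\\ or \\Emp\\ (hypothetical helper)."""
    if value is None:
        # Tags without a parameter: backslash, name, backslash. No spaces inside.
        return f"\\{name}\\"
    if isinstance(value, str):
        # A single backslash is not allowed inside a tag, so double it.
        escaped = value.replace("\\", "\\\\")
        return f'\\{name}="{escaped}"\\'
    # Numeric parameters such as milliseconds, hertz or words per minute.
    return f"\\{name}={value}\\"


if __name__ == "__main__":
    sentence = (
        "A duck waddled right into the front door "
        + sapi4_tag("Pau", 3000)
        + " it sure surprised the cat!"
    )
    print(sentence)
```

Because tags are whitespace-dependent, the helper never puts spaces inside the backslashes. Any spacing you want goes in the surrounding text.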
Unless otherwise specified or modified by
another tag, the speech output retains the characteristic set by
the tag within the text specified in a single Speak method.
Speech output is automatically reset to the user-defined
parameters after a Speak method is completed.
Chr Tag
Description
Sets the character of the voice.
Syntax
\Chr=string\
Part Description
string A string specifying the character of the voice.
"Normal" (Default) A normal tone of voice.
"Monotone" A monotone voice.
"Whisper" A whispered voice.
Example:
\chr="monotone"\
How are you today? \chr="whisper"\ I am fine.
\chr="normal"\ Good to hear!
Emp Tag
Description
Emphasizes the next word spoken. This tag must immediately precede
the word.
Syntax
\Emp\
Pau Tag
Description
Pauses speech for the specified number of milliseconds.
Syntax
\Pau=number\
Part Description
number The number of milliseconds to pause.
Pit Tag
Description:
This tag sets the baseline pitch of the voice to the specified
value in Hertz. The actual pitch fluctuates above and below this
baseline following the prosodic rules. Default is about 180-190 Hz
for a female voice and 100-110 Hz for a male voice.
Syntax:
\Pit=number\
Part Description
number The pitch in hertz.
Remark: This tag will not
work with all voices.
Example: \Pit=100\ are
you going home? \Pit=190\ are you going home?
Spd Tag
Description
Sets the baseline average talking speed of the speech output.
Syntax
\Spd=number\
Part Description
number Baseline average talking speed, in words per minute.
Example:
\Spd=150\I am speaking at a
rate of 150 words per minute. \Spd=75\ Am I talking too fast for
you?
Vce Tag (voice tag)
Description
Specifies the name of the voice, or NULL if the name is
unimportant. The Microsoft Voice engine can respond using the
following names (why they are named this way I don't have a clue -
you'll just need to experiment until you get what you want done
just right;-):
Peter
Sidney
Eager Eddie
Deep Douglas
Biff
Grandpa Amos
Melvin
Alex
Wanda
Julia
Syntax
\Vce=Speaker="Name of Voice"\
Example:
\Vce=Speaker="Wanda"\
Hello there Peter.
\Vce=Speaker="Peter"\
Hi Wanda. How are you?
\Vce=Speaker="Wanda"\
I'm not too good really. Got a bad head cold but I'll be fine by
tomorrow hopefully.
\Vce=Speaker="Alex"\
Hello there Biff.
\Vce=Speaker="Biff"\
Hi Alex. How are you?
\Vce=Speaker="Julia"\
He's not too good really. He's got a bad head cold and besides
that he needs a shower bad!
\Vce=Speaker=""\
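For longer dialogs, typing the Vce tag before every line gets tedious. Here is a minimal sketch that turns a list of (speaker, line) pairs into a tagged script like the example above. The names `KNOWN_VOICES` and `vce_script` are invented for this illustration, and the voice list is simply the one given on this page, so other engines may accept different names.

```python
# Voice names listed on this page for the Microsoft Voice engine.
KNOWN_VOICES = {
    "Peter", "Sidney", "Eager Eddie", "Deep Douglas", "Biff",
    "Grandpa Amos", "Melvin", "Alex", "Wanda", "Julia",
}


def vce_script(lines):
    """Prefix each spoken line with a \\Vce=Speaker="..."\\ tag (hypothetical helper)."""
    parts = []
    for speaker, text in lines:
        if speaker not in KNOWN_VOICES:
            raise ValueError(f"unknown voice name: {speaker!r}")
        parts.append(f'\\Vce=Speaker="{speaker}"\\ {text}')
    return "\n".join(parts)


if __name__ == "__main__":
    print(vce_script([
        ("Wanda", "Hello there Peter."),
        ("Peter", "Hi Wanda. How are you?"),
    ]))
```

Checking names up front catches typos such as "Wendy" before you sit through a whole playback in the wrong voice.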
Vol Tag
Description
Sets the baseline speaking volume of the speech output.
Syntax
\Vol=number\
Part Description
number Baseline speaking volume: 0 is silence and 65535 is maximum
volume.
Example:
\Vol=3000\
Remarks:
The volume setting affects both left and right channels. You
cannot set the volume of each channel separately. These tags are supported
only for TTS-generated output.
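The 0 to 65535 scale is not very friendly, so you may prefer to think in percent. The sketch below converts a percentage into a Vol tag. It assumes the scale is linear, which this page does not confirm, and `vol_tag` is a made-up name.

```python
def vol_tag(percent):
    """Return a SAPI4 \\Vol=n\\ tag for a volume given in percent (hypothetical helper)."""
    # Keep the input inside 0..100 so the result stays inside 0..65535.
    percent = max(0.0, min(100.0, float(percent)))
    level = round(percent / 100 * 65535)
    return f"\\Vol={level}\\"


if __name__ == "__main__":
    print(vol_tag(0))    # silence
    print(vol_tag(100))  # maximum volume
```

For comparison, the `\Vol=3000\` example above works out to roughly 5% on this scale.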
SAPI5 Text 2 Speech
Engine Tags:
These (SAPI5) text 2 speech
engines support modifying speech output through special tags
inserted in the speech text string. These tags help you change the
characteristics of the output expression.
When using SAPI 5
engines/ voices - our "Text Shifter" application WILL
record/ convert to audio with the tags implemented.
A simple example would be the following sentence with a 2 second
pause scripted into the sentence:
"Two seconds of silence
<silence msec="2000"/> has just occurred in this
sentence!"
Please
note that not all voices support all TAGS.
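If you generate SAPI5 scripts from a program, a tiny helper keeps the silence tag consistent. This is a sketch only: `silence` and `with_pause` are made-up names, and escaping characters such as & in the surrounding text is an assumption that your engine reads the text as strict XML. If your engine speaks "amp" out loud, drop the escaping.

```python
from xml.sax.saxutils import escape


def silence(msec):
    """Return an empty SAPI5 silence tag for the given number of milliseconds."""
    if msec < 0:
        raise ValueError("msec must not be negative")
    return f'<silence msec="{int(msec)}"/>'


def with_pause(before, after, msec):
    """Join two pieces of text with a pause between them (hypothetical helper)."""
    # escape() replaces &, < and > so stray characters are not read as tags.
    return f"{escape(before)} {silence(msec)} {escape(after)}"


if __name__ == "__main__":
    print(with_pause("Two seconds of silence",
                     "has just occurred in this sentence!", 2000))
```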
Some of the TAGs of greatest interest are documented below:
Volume
The Volume tag controls the volume of a voice. The tag can be
empty, in which case it applies to all subsequent text, or it can
have content, in which case it only applies to that content.
The Volume tag has one required attribute: Level. The value of
this attribute should be an integer between zero and one hundred.
Values outside of this range will be truncated.
<volume level="50">
This text should be spoken at volume level fifty.
<volume level="100">
This text should be spoken at volume level one hundred.
</volume>
</volume>
<volume level="80"/>
All text which follows should be spoken at volume level eighty.
One hundred represents the default volume of a voice. Lower values
represent percentages of this default. That is, 50 corresponds to
50% of full volume.
Values specified using the Volume tag will be combined with values
specified programmatically (using ISpVoice::SetVolume). For
example, if you combine a SetVolume( 50 ) call with a <volume
level="50"> tag, the volume of the voice should be
25% of its full volume.
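The arithmetic behind that 25% figure can be sketched in a few lines. The function name `effective_volume` is made up for illustration. It treats both values as percentages and multiplies them, which matches the example above, and clamps each to 0..100 first since out-of-range values are truncated.

```python
def effective_volume(set_volume, tag_level):
    """Combine a SetVolume value with a <volume level=".."> value, both 0..100."""
    def clamp(value):
        return max(0, min(100, value))

    # Each value is a percentage of full volume, so the two multiply.
    return clamp(set_volume) * clamp(tag_level) / 100


if __name__ == "__main__":
    print(effective_volume(50, 50))   # 25.0
    print(effective_volume(100, 80))  # 80.0
```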
Rate
The Rate tag controls the rate of a voice. The tag can be empty,
in which case it applies to all subsequent text, or it can have
content, in which case it only applies to that content.
The Rate tag has two attributes, Speed and AbsSpeed, one of which
must be present. The value of both of these attributes should be
an integer between negative ten and ten. Values outside of this
range may be truncated by the engine (but are not truncated by
SAPI). The AbsSpeed attribute controls the absolute rate of the
voice, so a value of ten always corresponds to a value of ten, a
value of five always corresponds to a value of five.
<rate absspeed="5">
This text should be spoken at rate five.
<rate absspeed="-5">
This text should be spoken at rate negative five.
</rate>
</rate>
<rate absspeed="10"/>
All text which follows should be spoken at rate ten.
Speed
The Speed attribute controls the relative rate of the voice. The
absolute value is found by adding each Speed to the current
absolute value.
<rate speed="5">
This text should be spoken at rate five.
<rate speed="-5">
This text should be spoken at rate zero.
</rate>
</rate>
Zero represents the default rate of a voice, with positive values
being faster and negative values being slower. Values specified
using the Rate tag will be combined with values specified
programmatically (using ISpVoice::SetRate).
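To see how nested relative Speed values add up, here is a small sketch. `resolve_speeds` is a made-up name, and using the SetRate value as the starting point is an assumption about how the two are "combined". Note that your engine may still cap the result at -10 or 10.

```python
def resolve_speeds(base, nested_speeds):
    """Return the absolute rate in effect after each nested <rate speed=".."> tag."""
    current = base
    absolute = []
    for speed in nested_speeds:
        # Each relative Speed is added to the rate that is currently in effect.
        current += speed
        absolute.append(current)
    return absolute


if __name__ == "__main__":
    # The nested example above: speed="5", then speed="-5" inside it.
    print(resolve_speeds(0, [5, -5]))  # [5, 0]
```

Once an inner tag closes, the rate simply drops back to whatever the enclosing tag set.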
Pitch
The Pitch tag controls the pitch of a voice. The tag can be empty,
in which case it applies to all subsequent text, or it can have
content, in which case it only applies to that content.
The Pitch tag has two attributes, Middle and AbsMiddle, one of
which must be present. The value of both of these attributes
should be an integer between negative ten and ten. Values outside
of this range may be truncated by the engine (but are not
truncated by SAPI).
The AbsMiddle attribute controls the absolute pitch of the voice,
so a value of ten always corresponds to a value of ten, a value of
five always corresponds to a value of five.
<pitch absmiddle="5">
This text should be spoken at pitch five.
<pitch absmiddle="-5">
This text should be spoken at pitch negative five.
</pitch>
</pitch>
<pitch absmiddle="10"/>
All text which follows should be spoken at pitch ten.
The Middle attribute controls the relative pitch of the voice. The
absolute value is found by adding each Middle to the current
absolute value.
<pitch middle="5">
This text should be spoken at pitch five.
<pitch middle="-5">
This text should be spoken at pitch zero.
</pitch>
</pitch>
Zero represents the default middle pitch for a voice, with
positive values being higher and negative values being lower.
Please
note: we have found that AT&T Natural Voices does NOT support
this "pitch" tag. Most other SAPI5 engines do however.
Spell
The Spell tag forces the voice to spell out all text, rather than
using its default word and sentence breaking rules, normalization
rules, and so forth. All characters should be expanded to
corresponding words (including punctuation, numbers, and so
forth). The Spell tag cannot be empty.
<spell>
These words should be spelled out.
</spell>
These words should not be spelled out.
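Tags with content (like Spell, or the nested Volume, Rate and Pitch examples) need a matching closing tag, and a missing one is easy to overlook in a long script. The sketch below uses Python's standard XML parser to check that everything is balanced. `is_well_formed` is a made-up name, and a failed check is only a hint: some engines may be more forgiving than a strict XML parser.

```python
import xml.etree.ElementTree as ET


def is_well_formed(script):
    """Return True if every tag in a SAPI5 script is properly closed and nested."""
    try:
        # Wrap the script in a dummy root element so plain text between tags is allowed.
        ET.fromstring(f"<root>{script}</root>")
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed("<spell>These words should be spelled out.</spell>"))
    print(is_well_formed('<volume level="50">This tag is never closed.'))
```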
Direct item insertion tags
Three tags are supported that give applications the ability to insert
items directly at some level: Silence, Pron, and Bookmark.
Silence (pause)
The Silence tag inserts a specified number of milliseconds of
silence into the output audio stream. This tag must be empty, and
must have one attribute, Msec.
Five hundred milliseconds of silence <silence msec="500"/>
just occurred.
Pron
The Pron tag inserts a specified pronunciation. The voice will
process the sequence of phonemes exactly as they are specified.
This tag can be empty, or it can have content. If it does have
content, it will be interpreted as providing the pronunciation for
the enclosed text. That is, the enclosed text will not be
processed as it normally would be.
The Pron tag has one attribute, Sym, whose value is a string of
white space separated phonemes.
<pron sym="h eh 1 l ow & w er 1 l d "/>
<pron sym="h eh 1 l ow & w er 1 l d"> hello
world </pron>
Voice Change
If you are using a SAPI5 engine that supports this tag - the voice
can be changed anywhere in the text. Examples below:
<voice required="name = Mike">
<voice required="name = Crystal">
<voice required="name = Sam">
Hello, I'm Sam and this is a
test of changing voices <voice required="name = Mike"> Hi, I’m Mike! As you can hear the voice just
changed from gruffy ole Sam to me. This tag only applies to
certain SAPI5 text 2 speech engines.