Voicepeak API
Regarding the title: more like 'Using VOICEPEAK via command line'.
References:
- https://atarms.hatenablog.com/entry/2023/03/12/164118
- https://takashiski.hatenablog.com/entry/2023/01/13/235249
Apparently, neither AHS nor Dreamtonics provides extensive documentation or a manual for using VOICEPEAK without the GUI. For comparison, VOICEVOX's documentation isn't great either, but at least you can get a grip by scavenging through what they have in the repo.
So, half of this is a translation of the aforementioned Japanese blogs; the other half is what I found out by trying.
Basic usage
./voicepeak.exe [OPTION..]
Starting with the help option:
./voicepeak.exe -h
-s, --say Text                Text to say
-t, --text File               Text file to say
-o, --out File                Path of output file
-n, --narrator Name           Name of narrator, check --list-narrator
-e, --emotion Expr            Emotion expression, for example: happy=50,sad=50. Also check --list-emotion
--list-narrator               Print narrator list
--list-emotion Narrator       Print emotion list for given narrator
-h, --help                    Print help
--speed Value                 Speed (50 - 200)
--pitch Value                 Pitch (-300 - 300)
One thing to notice is that command-line execution is painfully slow, probably because an unseen GUI gets initialized and torn down every time voicepeak.exe is called. I don't know the details.
Some further breakdown on the options and arguments:
-s, --say Text: The Text part is essentially a string. Reference #1 says the maximum length is 140 characters (probably borrowed from Twitter, just a wild guess), so generally nothing comically long.
-t, --text File: Basically the same as -s, but reads the text to speak from a text file.
-o, --out File: Specifies the name (path) of the output file. If not supplied, the default is 'output.wav' in the current (shell) directory.
-n, --narrator Name: Specifies the narrator (character). I think this defaults to 'the first one in the narrator list', but I only have one narrator, Koharuri, so I can't confirm.
-e, --emotion Expr: Emotion ratios. Only works if the narrator (manually selected or default) is compatible with the given emotions. Note that Koharuri has totally different emotion names than the standard 6 nameless voices. I'll list these in another section on this page.
--list-narrator: Returns a list of available, locally installed Narrators. The names can and should be used when a Narrator is required as an additional argument, such as in -n, --narrator.
--list-emotion Narrator: Returns a list of all possible emotion handle / variable names / tags / you name it for the given Narrator. See some of the results below if you don't want to do this all the time.
-h, --help: Displays the help which is also quoted above.
--speed Value, --pitch Value: Speech-related parameters. According to the help output, speed ranges from 50 to 200 and pitch from -300 to 300.
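As a rough sketch of how these options combine, the helper below assembles an argument list for voicepeak.exe and checks the ranges printed by -h for --speed and --pitch before anything is launched. The function name build_voicepeak_args and the default install path are assumptions for illustration; only the flags themselves come from the help output.

```python
from typing import List, Optional

# Assumed install location; adjust to wherever voicepeak.exe lives on your machine.
DEFAULT_EXE = "C:/Program Files/VOICEPEAK/voicepeak.exe"


def build_voicepeak_args(
    text: str,
    out_path: str = "output.wav",
    narrator: Optional[str] = None,
    emotion: Optional[str] = None,
    speed: Optional[int] = None,
    pitch: Optional[int] = None,
    exe_path: str = DEFAULT_EXE,
) -> List[str]:
    """Build an argument list for voicepeak.exe (hypothetical helper)."""
    # Ranges as printed by `voicepeak.exe -h`.
    if speed is not None and not 50 <= speed <= 200:
        raise ValueError("speed must be between 50 and 200")
    if pitch is not None and not -300 <= pitch <= 300:
        raise ValueError("pitch must be between -300 and 300")

    args = [exe_path, "-s", text, "-o", out_path]
    # Optional flags are only added when a value was given,
    # so VOICEPEAK's own defaults apply otherwise.
    if narrator is not None:
        args += ["-n", narrator]
    if emotion is not None:
        args += ["-e", emotion]
    if speed is not None:
        args += ["--speed", str(speed)]
    if pitch is not None:
        args += ["--pitch", str(pitch)]
    return args
```

The returned list can be handed to subprocess.run as is; passing a list rather than one long string avoids shell quoting problems with the text.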
Emotions
As of 2023/09/26 (VOICEPEAK v1.2.6)
Koharu Rikka
hightension livid lamenting despising narration
Generic VOICEPEAK voices
The 'Japanese Male/Female 1/2/3' Voices. Also one called 'Japanese Female Child'.
happy fun angry sad
Other characters
I don't have 'em so I don't know.
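The emotion names above can be kept in a small lookup table so that an -e expression is only built from names the narrator actually supports. This is only a sketch: the table contains just what is listed on this page, the key "Koharu Rikka" is an assumed narrator name (check --list-narrator for the exact string on your install), and format_emotions is a hypothetical helper.

```python
from typing import Dict

# Emotion names as listed on this page (VOICEPEAK v1.2.6).
# The narrator keys are assumptions; confirm them with --list-narrator.
KNOWN_EMOTIONS = {
    "Koharu Rikka": ["hightension", "livid", "lamenting", "despising", "narration"],
    "Japanese Female 1": ["happy", "fun", "angry", "sad"],
}


def format_emotions(narrator: str, emotions: Dict[str, int]) -> str:
    """Return a string for -e, e.g. 'happy=50,sad=50' (hypothetical helper)."""
    allowed = KNOWN_EMOTIONS.get(narrator)
    if allowed is None:
        raise KeyError(f"no emotion list recorded for narrator {narrator!r}")
    # Reject names the narrator does not have instead of passing them through.
    unknown = [name for name in emotions if name not in allowed]
    if unknown:
        raise ValueError(f"{narrator!r} does not support: {', '.join(unknown)}")
    return ",".join(f"{name}={value}" for name, value in emotions.items())
```

For other characters, the table would need to be filled in from the output of --list-emotion.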
How to use, for example, in Python
In practice, when you use VOICEPEAK 'via the command line', you're usually not typing the commands yourself but calling the executable from another program.
VOICEPEAKをPythonから呼び出す ('Calling VOICEPEAK from Python') provides a simple example of a Python wrapper. For archive reasons I'll also steal the code and post it here.
import os
import subprocess
import winsound


def playVoicePeak(script, narrator="Japanese Female 1", happy=50, sad=50, angry=50, fun=50):
    """
    Have a VOICEPEAK narrator read arbitrary text aloud.
    script: text to read (string)
    narrator: narrator name (string)
    happy: degree of happiness
    sad: degree of sadness
    angry: degree of anger
    fun: degree of fun
    """
    # Path to voicepeak.exe
    exepath = "C:/Program Files/VOICEPEAK/voicepeak.exe"
    # WAV output path
    outpath = "output.wav"
    # Build the arguments
    args = [
        exepath,
        "-s", script,
        "-n", narrator,
        "-o", outpath,
        "-e", f"happy={happy},sad={sad},angry={angry},fun={fun}"
    ]
    # Run the process
    process = subprocess.Popen(args)
    # Wait for the process to finish
    process.communicate()
    # Play the audio
    winsound.PlaySound(outpath, winsound.SND_FILENAME)
    # Delete the wav file
    os.remove(outpath)
This is based on the 6 nameless voices with their shared set of emotions. If you want to use, for example, Koharuri, you'll need to change the emotion-related parts of this code.
I might do that later, but for now it's as is and can't adjust Koharu Rikka's emotions correctly.
Also note that this example uses winsound for playback, which is Windows-only, so it's not cross-platform.
This is probably not the most time-efficient approach, but the major limitation comes from VOICEPEAK itself, which doesn't have a streaming output option (understandably).
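One way around the hard-coded emotion names and the winsound playback is to pass the emotions as a dictionary and leave playback to the caller. The following is a sketch under assumptions, not a tested replacement for the quoted code: synthesize and its runner parameter are names made up here, the install path is the one from the example, and the emotion names you pass must match what --list-emotion reports for the narrator.

```python
import subprocess
from typing import Callable, Dict, List, Optional

# Same install path as in the quoted example; change it if yours differs.
EXE_PATH = "C:/Program Files/VOICEPEAK/voicepeak.exe"


def synthesize(
    text: str,
    narrator: str,
    emotions: Optional[Dict[str, int]] = None,
    out_path: str = "output.wav",
    runner: Callable[..., object] = subprocess.run,
) -> str:
    """Write `text` to `out_path` as speech and return the path (sketch)."""
    args: List[str] = [EXE_PATH, "-s", text, "-n", narrator, "-o", out_path]
    if emotions:
        # Build e.g. "hightension=80,narration=20" from the dictionary.
        expr = ",".join(f"{name}={value}" for name, value in emotions.items())
        args += ["-e", expr]
    # check=True makes subprocess.run raise CalledProcessError on a non-zero exit code.
    runner(args, check=True)
    return out_path
```

A call might look like synthesize("こんにちは", "Koharu Rikka", {"hightension": 80}), where the narrator string is again an assumption. The runner argument only exists so the function can be exercised without VOICEPEAK installed; in normal use it stays at its default.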
Thoughts
When the input is limited to 140 characters, and if the upstream content (text) provider can pre-process the text to speak into smaller chunks such as short sentences or phrases, command-line VOICEPEAK is not that slow.
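A minimal sketch of that pre-processing step: split the text at sentence-ending punctuation, then pack the sentences into chunks of at most 140 characters. The limit is the figure quoted from Reference #1 and has not been verified here; split_for_voicepeak and the punctuation set are assumptions. A single sentence longer than the limit is simply cut at the limit.

```python
import re
from typing import List

MAX_CHARS = 140  # limit reported in Reference #1; treat as an assumption


def _push(chunks: List[str], piece: str) -> None:
    """Append a chunk, skipping pieces that are empty after trimming."""
    piece = piece.strip()
    if piece:
        chunks.append(piece)


def split_for_voicepeak(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters (hypothetical helper)."""
    # Each match is a run of text plus its closing punctuation mark, if any.
    sentences = [s for s in re.findall(r"[^。！？!?.\n]+[。！？!?.]?", text) if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-cut any single sentence that is itself too long.
        while len(sentence) > limit:
            _push(chunks, current)
            current = ""
            _push(chunks, sentence[:limit])
            sentence = sentence[limit:]
        if len(current) + len(sentence) > limit:
            # The next sentence does not fit, so close the current chunk.
            _push(chunks, current)
            current = sentence
        else:
            current += sentence
    _push(chunks, current)
    return chunks
```

Each chunk can then be sent to voicepeak.exe in its own call, so the per-call startup cost is paid once per chunk rather than blocking on one long text.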