Voicepeak API

Regarding the title: a more accurate one would be 'Using VOICEPEAK via the command line'.

References:

Apparently, neither AHS nor Dreamtonics provides extensive documentation or a manual for using VOICEPEAK without the GUI. For comparison, VOICEVOX's documentation isn't great either, but at least you can get a grip by scavenging through what they have in the repo.

So, half of this is a translation of the aforementioned Japanese blogs, and the other half is what I found out by trying.

Basic usage

./voicepeak.exe [OPTION..]

Starting with ./voicepeak.exe -h:

 -s, --say Text               Text to say
 -t, --text File              Text file to say
 -o, --out File               Path of output file
 -n, --narrator Name          Name of narrator, check --list-narrator
 -e, --emotion Expr           Emotion expression, for example:
                              happy=50,sad=50. Also check --list-emotion
     --list-narrator          Print narrator list
     --list-emotion Narrator  Print emotion list for given narrator
 -h, --help                   Print help
     --speed Value            Speed (50 - 200)
     --pitch Value            Pitch (-300 - 300)

One thing to note is that command-line execution is very slow, probably because an unseen GUI gets initialized and terminated every time voicepeak.exe is called. I don't know the details, but it is painfully slow.

Some further breakdown on the options and arguments:

-s, --say Text: The Text part is simply a string. Reference #1 says the maximum length is 140 characters (probably borrowed from Twitter's limit, just a wild guess), so nothing comically long will fit.
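If the 140-character limit from reference #1 holds (I haven't verified it), longer text has to be cut into pieces before it goes to -s. Below is a minimal sketch of one way to do that. The function name split_for_say and the choice to prefer sentence boundaries are my own assumptions, not anything VOICEPEAK provides.

```python
import re

MAX_CHARS = 140  # limit reported by reference #1; not verified here


def split_for_say(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split right after Japanese or Latin sentence-ending punctuation,
    # keeping the punctuation attached to its sentence.
    sentences = [s for s in re.split(r"(?<=[。！？.!?])", text) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        # Start a new chunk when the next sentence would not fit.
        if len(current) + len(sentence) > limit:
            chunks.append(current)
            current = ""
        current += sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then need its own voicepeak.exe call, so the slow startup is paid once per chunk.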

-t, --text File: Basically the same as -s, but reads the text to speak from a text file.

-o, --out File: Specifies the name (path) of the output file. If not supplied, the default is 'output.wav' in the current (shell) working directory.

-n, --narrator Name: Specifies the narrator (character). I think this defaults to 'the first one in the narrator list', but the only narrator I have is Koharu Rikka ('Koharuri' for short), so I can't confirm it.

-e, --emotion Expr: Emotion ratios. Only works if the narrator (manually selected or default) supports the given emotions. Note that Koharuri has totally different emotion names from the standard 6 nameless voices. I'll list these in another section on this page.

--list-narrator: Returns a list of available, locally installed Narrators. The names can and should be used when a Narrator is required as an additional argument, such as in -n, --narrator.

--list-emotion Narrator: Returns a list of all possible emotion handle / variable names / tags / you name it for the given Narrator. See some of the results below if you don't want to do this all the time.
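To avoid retyping these names, the two list options can be called from a script. This is only a sketch under a few assumptions: the lists are printed to standard output as one name per line, the output decodes with the system's default encoding, and the executable sits at the install path used in the Python wrapper further down this page. The helper names are hypothetical.

```python
import subprocess

# Assumed install location (same path as in the wrapper example on this page).
VOICEPEAK_EXE = "C:/Program Files/VOICEPEAK/voicepeak.exe"


def parse_name_list(output):
    """Turn printed list output into a list of names, one per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_narrators(exe=VOICEPEAK_EXE):
    """Run --list-narrator and return the narrator names (assumes stdout output)."""
    result = subprocess.run(
        [exe, "--list-narrator"], capture_output=True, text=True, check=True
    )
    return parse_name_list(result.stdout)


def list_emotions(narrator, exe=VOICEPEAK_EXE):
    """Run --list-emotion for one narrator and return its emotion names."""
    result = subprocess.run(
        [exe, "--list-emotion", narrator], capture_output=True, text=True, check=True
    )
    return parse_name_list(result.stdout)
```

Since every call starts voicepeak.exe from scratch, it is probably worth storing the results rather than querying them each time.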

-h, --help: Displays the help which is also quoted above.

--speed Value, --pitch Value: Speech-related parameters. According to the help text, speed ranges from 50 to 200 and pitch from -300 to 300.
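The options above can be assembled into an argument list before anything is executed, which makes it easy to catch out-of-range values early. The following is a rough sketch: build_command is a hypothetical helper, and I'm assuming that the ranges in the help text are inclusive and that integer values are accepted.

```python
def build_command(exe, text, out=None, narrator=None, speed=None, pitch=None):
    """Build an argument list for voicepeak.exe from the documented options."""
    # Ranges taken from the help text; treating them as inclusive is an assumption.
    if speed is not None and not 50 <= speed <= 200:
        raise ValueError(f"speed must be within 50..200, got {speed}")
    if pitch is not None and not -300 <= pitch <= 300:
        raise ValueError(f"pitch must be within -300..300, got {pitch}")

    args = [exe, "-s", text]
    if out is not None:
        args += ["-o", out]
    if narrator is not None:
        args += ["-n", narrator]
    if speed is not None:
        args += ["--speed", str(speed)]
    if pitch is not None:
        args += ["--pitch", str(pitch)]
    return args
```

The returned list can be handed to subprocess.run as is. Passing a list instead of a single string also avoids quoting problems when the text contains spaces.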

Emotions

As of 2023/09/26 (VOICEPEAK v1.2.6)

Koharu Rikka

 hightension
 livid
 lamenting
 despising
 narration

Generic VOICEPEAK voices

The 'Japanese Male/Female 1/2/3' Voices. Also one called 'Japanese Female Child'.

 happy
 fun
 angry
 sad

Other characters

I don't have 'em so I don't know.

How to use, for example, in Python

Apparently, when you want to use VOICEPEAK from the command line, you usually aren't typing commands by hand; you're calling it from a script.

VOICEPEAKをPythonから呼び出す ('Calling VOICEPEAK from Python') provides a simple example of a Python wrapper. For archival purposes I'll also steal the code and post it here.

import os
import subprocess
import winsound

def playVoicePeak(script, narrator="Japanese Female 1", happy=50, sad=50, angry=50, fun=50):
    """
    Have a VOICEPEAK narrator read arbitrary text aloud.
    script: text to read (string)
    narrator: name of the narrator (string)
    happy: degree of happiness
    sad: degree of sadness
    angry: degree of anger
    fun: degree of fun
    """
    # Path to voicepeak.exe
    exepath = "C:/Program Files/VOICEPEAK/voicepeak.exe"
    # Where the wav file is written
    outpath = "output.wav"
    # Build the arguments
    args = [
        exepath,
        "-s", script,
        "-n", narrator,
        "-o", outpath,
        "-e", f"happy={happy},sad={sad},angry={angry},fun={fun}"
    ]
    # Run the process
    process = subprocess.Popen(args)

    # Wait until the process has finished
    process.communicate()

    # Play the audio (winsound is Windows-only)
    winsound.PlaySound(outpath, winsound.SND_FILENAME)

    # Delete the wav file
    os.remove(outpath)

This is based on the 6 nameless voices with their shared set of emotions. If you want to use, for example, Koharuri, you'll need to change the emotion-related parts of this code.

I might do that later, but for now the code is left as is, so it can't correctly adjust Koharu Rikka's emotions.
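One way the emotion part could be generalized is to pass the emotions as a dictionary and check the names against the lists in the Emotions section. This is an untested sketch: emotion_expr and speak_args are hypothetical helpers, and the narrator name that voicepeak.exe expects for Koharu Rikka should be taken from --list-narrator rather than from this example.

```python
# Emotion names copied from the Emotions section of this page (VOICEPEAK v1.2.6).
KOHARU_RIKKA_EMOTIONS = ("hightension", "livid", "lamenting", "despising", "narration")
GENERIC_EMOTIONS = ("happy", "fun", "angry", "sad")


def emotion_expr(levels, allowed):
    """Format {"name": value} as the "name=value,..." string expected by -e."""
    unknown = sorted(set(levels) - set(allowed))
    if unknown:
        raise ValueError(f"emotions not supported by this narrator: {unknown}")
    return ",".join(f"{name}={value}" for name, value in levels.items())


def speak_args(exe, script, narrator, levels, allowed, outpath="output.wav"):
    """Build the same argument list as playVoicePeak, but with arbitrary emotions."""
    return [
        exe,
        "-s", script,
        "-n", narrator,
        "-o", outpath,
        "-e", emotion_expr(levels, allowed),
    ]
```

For example, emotion_expr({"hightension": 80, "narration": 20}, KOHARU_RIKKA_EMOTIONS) produces the string "hightension=80,narration=20", while passing happy=50 with the Koharu Rikka list raises a ValueError instead of silently sending a name the narrator doesn't know.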