Academic works

From Novoyuuparosk Wiki
Revision as of 10:13, 5 October 2023 by Mikkeli

As a current doctoral course student at the Human-Computer Interaction Laboratory, Hokkaido University, I am not very proud to say that I have no publications as of now (whenever you are looking at this page, that is).

However, I feel that it is equally important to introduce what I am currently doing and what I have done, on the off chance that you are interested.

Language-independent speech recognition

Modern speech recognition, or automatic speech recognition (ASR), usually depends on a language-specific Hidden Markov Model for application-level accuracy. The language independence of speech recognition concerns the step before the probabilistic model: converting audio (waveforms) into signal sequences that then go into the probabilistic model.

Several approaches exist. To be very honest, I haven't read a lot of the literature, so I will not pretend to know what is commonplace. Here I will simply introduce how I think about it.


Usually, when an ASR system is built around a target language, the intermediate representation, namely the phonemes, is also determined by the language in question. For example, a Japanese ASR might only classify vowels into 5 possibilities, A/I/U/E/O (not in their finest IPA form, but you get the idea if you know a tad of Japanese).
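To make the idea a bit more concrete, here is a minimal sketch of what such a five-way vowel classification does to finer-grained input. The mapping table is something I wrote by hand purely for illustration (it is an assumption, not taken from any real ASR system), and the names JAPANESE_VOWEL_CLASSES and collapse_vowels are hypothetical.

```python
# Hypothetical, hand-written table for illustration only.
# Each of the five Japanese vowel classes absorbs several IPA vowels.
JAPANESE_VOWEL_CLASSES = {
    "A": ["a", "ɑ", "æ", "ɐ"],
    "I": ["i", "ɪ"],
    "U": ["u", "ɯ", "ʊ"],
    "E": ["e", "ɛ"],
    "O": ["o", "ɔ"],
}

# Invert the table so each IPA symbol points at its class label.
IPA_TO_CLASS = {
    ipa: label
    for label, ipa_symbols in JAPANESE_VOWEL_CLASSES.items()
    for ipa in ipa_symbols
}


def collapse_vowels(ipa_vowels):
    """Map each IPA vowel to one of the five classes; unknown symbols become '?'."""
    return [IPA_TO_CLASS.get(vowel, "?") for vowel in ipa_vowels]


# Four distinct vowels go in, only two distinct labels come out.
print(collapse_vowels(["æ", "ɑ", "ɪ", "i"]))  # ['A', 'A', 'I', 'I']
```

The point of the sketch is only that distinctions present in the audio (here, æ versus ɑ) are gone once the output is forced into a language-specific inventory, which is exactly the information an accent lives in.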

This hasn't proved to be a fundamental shortcoming of ASR when dealing with accented or multilingual situations. However, there exists a niche for accurate accent reproduction, and thus for precise transcription. The niche actually comes from me, and I have always struggled to prove that anyone else really needs this.

To put it more plainly, I want to recreate heavily accented singing with vocal synthesisers such as CeVIO, Synthesizer V, VOCALOID, or you name it. These synthesisers usually have 'voice banks' which are capable of producing sounds from phoneme notation for one or several (in the case of SynthV) languages.