Academic works

From Novoyuuparosk Wiki
I am Wan Ziyu, a doctoral course student at the [https://hci-lab.jp/ Human-Computer Interaction Laboratory], Hokkaido University. I have a relatively formal education in opto-electrical engineering and some computer science, and an informal self-education in linguistics and phonetics.
 
I '''am not very proud to say''' that I have no publications as of now (whenever you are reading this page, that is).


However, I feel it is equally important to introduce what I am currently doing and what I have done, on the off chance that you are interested.


== Research strengths and, well, abilities ==
I don't have any particularly notable certificates or qualifications, so take everything I list here with a pinch of salt.
 
* Digital signal processing, with Python (librosa / numpy / scipy) and some C++ (iPlug2 for creating VST apps)
* Neural networks (rather basic) with TensorFlow / Keras. Also a tad of PyTorch, but I hate migrating between toolkits.
* Some HTML / JavaScript for unpretty utility webpages.
* Some bash / Python for task automation.
* LLM prompt composition, generic LLM utilisation.
* Basic welding and electrician skills.
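To give the signal-processing bullet above a little substance, here is a minimal sketch of the kind of groundwork involved, using only numpy (no librosa): framing a waveform and taking per-frame magnitude spectra, i.e. a bare-bones spectrogram. The frame and hop sizes are arbitrary illustrative choices.

```python
import numpy as np

def frame_spectrogram(wave, frame_len=1024, hop=256):
    """Split a waveform into overlapping windowed frames and return
    the magnitude spectrum of each frame (a bare-bones spectrogram)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

# A 440 Hz sine sampled at 16 kHz: the spectral peak should land near
# bin 440 / (16000 / 1024) ≈ 28.
sr = 16000
t = np.arange(sr) / sr
spec = frame_spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = int(np.argmax(spec.mean(axis=0)))
```

Libraries like librosa wrap exactly this kind of routine (plus mel filtering, MFCCs, and so on) behind one function call.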
 
== Research topics / interests ==
 
=== Language-independent speech recognition ===
To capture phonemes (sounds) rather than text, I propose a language-independent speech recognition system.
 
''See: [[Language-independent speech recognition]]''
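The core idea — mapping audio frames to language-neutral phone labels instead of words — can be sketched as nearest-neighbour classification against a universal phone inventory. The templates below are toy numbers, not real acoustic models; a real system would learn them from multilingual data.

```python
import numpy as np

# Toy 'universal' phone inventory: IPA-ish labels mapped to made-up
# 2-D acoustic templates (think of them as compressed formant pairs).
PHONE_TEMPLATES = {
    "a": np.array([0.8, 0.9]),
    "i": np.array([0.2, 0.95]),
    "u": np.array([0.25, 0.3]),
    "s": np.array([0.9, 0.1]),
}

def frames_to_phones(frames):
    """Assign each feature frame to the nearest phone template, then
    collapse consecutive repeats into a single symbol."""
    labels = []
    for f in frames:
        best = min(PHONE_TEMPLATES,
                   key=lambda p: np.linalg.norm(f - PHONE_TEMPLATES[p]))
        if not labels or labels[-1] != best:
            labels.append(best)
    return labels

frames = np.array([[0.8, 0.9], [0.79, 0.88], [0.21, 0.94], [0.9, 0.12]])
print(frames_to_phones(frames))  # → ['a', 'i', 's']
```

The point of the sketch: nothing in the pipeline refers to any particular language — the phone inventory is shared, so the same front end can transcribe accented or mixed-language speech without committing to one language's phoneme set.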


=== LLM-powered conversational (car) navigation agent ===
Based on the half-faith, half-fact (proportions may vary) that vehicle navigation is a social, conversational task, I think it might be good to let navigation software hold some conversation with the driver as well.


''See: [[LLM navigation agent]]''
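A toy sketch of the conversational loop I have in mind. The `llm_reply` function here is a hypothetical stand-in for an actual LLM call; the names and canned logic are purely illustrative, not a real system.

```python
def llm_reply(history, user_utterance):
    """Stand-in for an LLM call: a real agent would send the dialogue
    history plus navigation context (route, ETA) to a language model."""
    if "hungry" in user_utterance.lower():
        return "There is a service area in 12 km. Want me to route there?"
    return "Continuing on the current route."

def navigation_turn(history, user_utterance):
    """One conversational turn: get a reply and record the exchange."""
    reply = llm_reply(history, user_utterance)
    history.append(("driver", user_utterance))
    history.append(("agent", reply))
    return reply

history = []
print(navigation_turn(history, "I'm getting hungry."))
```

The interesting design question is what navigation context (current route, traffic, points of interest) gets packed into the prompt alongside the history — that is where the "navigation" part actually enters.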


=== (Pro?)active agent guidance ===
A spin-off from the navigator idea. Making the agent capable of actively asking for information might help the human user form more structured, concise, and solid input.


''See: [[Active agent guidance]]''
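One plain way to make the agent (pro)active is slot-filling: track which pieces of information are still missing and ask for them explicitly, instead of waiting for the user to volunteer everything. A minimal sketch (the slot names and questions are illustrative assumptions):

```python
# Slots the agent needs before it can plan a route.
REQUIRED_SLOTS = ["destination", "departure_time", "avoid_tolls"]

QUESTIONS = {
    "destination": "Where are we heading?",
    "departure_time": "When do you want to leave?",
    "avoid_tolls": "Should I avoid toll roads?",
}

def next_question(filled):
    """Return the question for the first unfilled slot, or None once
    the agent has everything it needs."""
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return QUESTIONS[slot]
    return None

state = {"destination": "Sapporo Station"}
print(next_question(state))  # → 'When do you want to leave?'
```

An LLM-backed version would phrase the questions more naturally and parse free-form answers into the slots, but the turn-taking skeleton stays the same.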

Latest revision as of 07:34, 11 October 2023
