Speech-like Emotional Sound Generator by WaveNet

Matsumoto, Kento; Hara, Sunao; Abe, Masanobu

doi:10.1109/APSIPAASC47483.2019.9023346

Permalink : https://ousar.lib.okayama-u.ac.jp/57781

ID	57781
Full-text URL	Matsumoto_apsipa2019_132.pdf (1.1 MB)
Authors	Matsumoto, Kento (Okayama University); Hara, Sunao (Okayama University); Abe, Masanobu (Okayama University)
Abstract	In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional information plays an important role in human communication, and speech is one of the most useful media to express emotions. Although, in general, speech conveys emotional information as well as linguistic information, we have undertaken the challenge to generate sounds that convey emotional information without linguistic information, which results in making conversations in human-machine interactions more natural in some situations by providing non-verbal emotional vocalizations. We call the generated sounds “speech-like”, because the sounds do not contain any linguistic information. For the purpose, we propose to employ WaveNet as a sound generator conditioned by only emotional IDs. The idea is quite different from WaveNet Vocoder that synthesizes speech using spectrum information as auxiliary features. The biggest advantage of the idea is to reduce the amount of emotional speech data for the training. The proposed algorithm consists of two steps. In the first step, WaveNet is trained to obtain phonetic features using a large speech database, and in the second step, WaveNet is re-trained using a small amount of emotional speech. Subjective listening evaluations showed that the SES could convey emotional information and was judged to sound like a human voice.
Publication date	2019-11
Publication title	Proceedings of APSIPA Annual Summit and Conference
Volume	2019
Publisher	IEEE
Start page	143
End page	147
ISSN	2640-009X
Resource type	Conference paper
Language	English
Copyright holder	© Copyright APSIPA
Event	APSIPA Annual Summit and Conference
Event location	Lanzhou, China
Event dates	18-21 Nov. 2019
Paper version	publisher
DOI	10.1109/APSIPAASC47483.2019.9023346
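
Illustrative note on the abstract: the phrase "conditioned by only emotional IDs" can be pictured with the globally conditioned gated activation from the original WaveNet formulation, z = tanh(W_f * x + V_f^T h) ⊙ σ(W_g * x + V_g^T h), where h is a one-hot emotion vector. The sketch below is a minimal single-channel version under stated assumptions, not the authors' implementation: the emotion labels, the weight values, and the helper names (`one_hot`, `gated_unit`, `condition_signal`) are all hypothetical, and whether the paper uses exactly this conditioning form is an assumption. Training is not shown; in the two-step scheme the abstract describes, the second step would presumably start from the parameters learned in the first.

```python
import math

# Hypothetical emotion label set; the record does not list the paper's labels.
EMOTIONS = ["neutral", "joy", "anger", "sadness"]

# Hypothetical fixed weights for a single-channel unit with a 2-tap causal filter.
PARAMS = {
    "w_f": [0.5, -0.3],            # filter weights over (previous, current) sample
    "w_g": [0.2, 0.4],             # gate weights over (previous, current) sample
    "v_f": [0.0, 0.8, -0.8, 0.1],  # filter-side weights on the emotion one-hot
    "v_g": [0.0, 0.5, 0.5, -0.5],  # gate-side weights on the emotion one-hot
}


def one_hot(emotion_id, size):
    """Encode an integer emotion ID as a one-hot list of floats."""
    return [1.0 if i == emotion_id else 0.0 for i in range(size)]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gated_unit(x_prev, x_curr, h, params):
    """tanh(filter) * sigmoid(gate), each shifted by a term that depends on h."""
    filt = params["w_f"][0] * x_prev + params["w_f"][1] * x_curr + dot(params["v_f"], h)
    gate = params["w_g"][0] * x_prev + params["w_g"][1] * x_curr + dot(params["v_g"], h)
    return math.tanh(filt) * sigmoid(gate)


def condition_signal(samples, emotion_id, params=PARAMS):
    """Apply the gated unit along a waveform; the same h is used at every step."""
    h = one_hot(emotion_id, len(EMOTIONS))
    return [
        gated_unit(samples[t - 1], samples[t], h, params)
        for t in range(1, len(samples))
    ]


if __name__ == "__main__":
    wave = [0.0, 0.1, 0.2, 0.1]
    for name in ("joy", "anger"):
        print(name, condition_signal(wave, EMOTIONS.index(name)))
```

The only conditioning input here is the emotion ID, with no spectral or linguistic features, which is the contrast with a WaveNet vocoder that the abstract draws.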