Full-text URL Proc_Interspeech_2018_2464.pdf
Authors Murakami, Hiroki| Hara, Sunao| Abe, Masanobu| Sato, Masaaki| Minagi, Shogo
Abstract In this paper, we propose an algorithm to improve the naturalness of reconstructed speech for glossectomy patients, i.e., speech generated by voice conversion (VC) to enhance the intelligibility of utterances by patients with a wide glossectomy. While existing VC algorithms can improve intelligibility and naturalness, the results are still not satisfactory. To address the remaining problems, we propose directly modifying the speech waveforms using a spectrum differential. The motivation is that glossectomy patients mainly have problems in their vocal tract, not in their vocal cords. The proposed algorithm requires no source-parameter extraction for speech synthesis, so there are no source-parameter extraction errors and the original source characteristics can be fully exploited. For spectrum conversion, we evaluate both GMM- and DNN-based approaches. Subjective evaluations show that our algorithm synthesizes more natural speech than the vocoder-based method. Observations of spectrograms show that the power in the high-frequency bands of fricatives and stops is reconstructed to resemble that of natural speech.
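The core idea above, modifying the waveform directly with a spectrum differential rather than re-synthesizing through a vocoder, can be sketched as frame-wise filtering by the ratio of converted to source spectral envelopes. A minimal illustrative sketch in Python/NumPy (not the authors' implementation; envelope estimation and the GMM/DNN mapping are assumed to be done elsewhere):

```python
import numpy as np

def apply_spectral_differential(wave, src_env, tgt_env, n_fft=512, hop=128):
    """Filter a waveform by the spectral differential tgt_env / src_env,
    frame by frame, via STFT overlap-add. src_env/tgt_env: per-frame
    magnitude envelopes of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    out = np.zeros(len(wave))
    norm = np.zeros(len(wave))
    n_frames = (len(wave) - n_fft) // hop + 1
    for i in range(n_frames):
        start = i * hop
        frame = wave[start:start + n_fft] * window
        spec = np.fft.rfft(frame)
        # Differential filter: scale each bin by the envelope ratio.
        diff = tgt_env[i] / np.maximum(src_env[i], 1e-8)
        out[start:start + n_fft] += np.fft.irfft(spec * diff) * window
        norm[start:start + n_fft] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

Because the original waveform's excitation passes through unchanged, F0 and aperiodicity are never extracted or re-synthesized, which is why source-parameter extraction errors cannot occur in this scheme.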
Keywords voice conversion| speech intelligibility| glossectomy| spectral differential| neural network
Publication date 2018-09-02
Publication title Proceedings of Interspeech 2018
Publisher International Speech Communication Association
Start page 2464
End page 2468
ISSN 1990-9772
Material type Conference paper
Language English
OAI-PMH Set Okayama University
Article version publisher
DOI 10.21437/Interspeech.2018-1239
Related URL isVersionOf https://doi.org/10.21437/Interspeech.2018-1239
Full-text URL IEEE_ICME2016_MMCloudCity_W136.pdf
Authors Hara, Sunao| Kobayashi, Shota| Abe, Masanobu
Abstract This paper presents a sound collection system that uses crowdsourcing to gather information for visualizing area characteristics. First, we developed a sound collection system to simultaneously collect physical sounds, their statistics, and subjective evaluations. We then conducted a sound collection experiment with 14 participants using the developed system. We collected 693,582 samples of equivalent A-weighted loudness levels and their locations, and 5,935 samples of sounds and their locations. The data also include subjective evaluations by the participants. In addition, we analyzed the changes in the sound properties of some areas before and after the opening of a large-scale shopping mall in a city. Next, we implemented visualizations on the server system to attract users' interest. Finally, we published the system, which can receive sounds from any Android smartphone user. The sound data were collected continuously and achieved the intended results.
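The equivalent loudness levels collected above are conventionally computed as an equivalent continuous sound level (L_eq) over each measurement window. A minimal Python/NumPy sketch of the level computation (illustrative only; the A-weighting filter and the smartphone microphone calibration are omitted, and `p_ref` is the standard reference pressure assumed here):

```python
import numpy as np

def equivalent_level(pressure, p_ref=2e-5):
    """Equivalent continuous sound level over the analysis window:
    L_eq = 10 * log10( mean(p^2) / p_ref^2 )  [dB],
    with pressure samples in pascals and p_ref = 20 uPa."""
    return 10.0 * np.log10(np.mean(np.square(pressure)) / p_ref ** 2)
```

In practice each phone uploads one such level per window together with its location, which is what makes the color-map visualization of loudness possible.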
Keywords Environmental sound| Crowdsourcing| Loudness| Crowdedness| Smart City
Note Copyright © 2016 IEEE. Reprinted from IEEE ICME Workshop on Multimedia Mobile Cloud for Smart City Applications (MMCloudCity-2016). This material is posted here with permission of the IEEE. Permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to pubs-permissions@ieee.org. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
Publication date 2016-07
Publication title IEEE ICME Workshop on Multimedia Mobile Cloud for Smart City Applications (MMCloudCity-2016)
Material type Conference paper
Language English
OAI-PMH Set Okayama University
Copyright holder Copyright © 2016 IEEE.
Article version author
Official URL http://icme2016.org/
Full-text URL yamaoka_APSIPA2015_rev9.pdf
Authors Yamaoka, Masaki| Hara, Sunao| Abe, Masanobu
Abstract We propose a spoken dialog strategy for car navigation systems to facilitate safe driving. To drive safely, drivers need to concentrate on their driving; however, their concentration may be disrupted by disagreements with their spoken dialog system. Therefore, we need to resolve misunderstandings both by users and by spoken dialog systems. For this purpose, we introduced the driver's workload level into spoken dialog management in order to prevent user misunderstandings. A key strategy of the dialog management is to make the system's speech redundant when the driver's workload is too high, on the assumption that the user is likely to misunderstand the system utterance under such conditions. An experiment was conducted to compare the performance of the proposed method with that of a conventional method using a user simulator. The simulator was developed under the assumption of two types of drivers: an experienced-driver model and a novice-driver model. Experimental results showed that the proposed strategy outperformed the conventional one in task completion time, task completion rate, and the rate of the user's positive speech. In particular, these performance differences were greater for novice users than for experienced users.
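The key strategy above, switching to redundant speech when the driver's workload is high, can be illustrated with a toy dialog policy. A hypothetical Python sketch (the workload scale, the threshold value, and the repetition format are assumptions for illustration, not the paper's parameters):

```python
from dataclasses import dataclass

@dataclass
class DialogTurn:
    text: str
    redundant: bool  # whether extra repetition was appended

def plan_utterance(base_text: str, workload: float,
                   threshold: float = 0.7) -> DialogTurn:
    """If the driver's estimated workload (here assumed in [0, 1]) exceeds a
    threshold, make the utterance redundant by repeating its key content,
    on the assumption that a busy driver will otherwise misunderstand it."""
    if workload >= threshold:
        return DialogTurn(base_text + " I repeat: " + base_text, redundant=True)
    return DialogTurn(base_text, redundant=False)
```

A real system would estimate the workload from driving context (speed, steering, traffic) rather than take it as a given number.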
Publication date 2015-12
Publication title Proceedings of APSIPA Annual Summit and Conference 2015
Start page 223
End page 226
Material type Conference paper
Language English
OAI-PMH Set Okayama University
Copyright holder ©2015 APSIPA
Article version author
Official URL http://www.apsipa2015.org/
Related URL http://www.apsipa.org/proceedings_2015/pdf/74.pdf
Full-text URL hara_CASPer15_final_rev2.pdf
Authors Hara, Sunao| Abe, Masanobu| Sonehara, Noboru
Abstract This paper presents a sound collection system to visualize environmental sounds that are collected using a crowdsourcing approach. Analysis of physical features is generally used to characterize sound properties; however, human beings not only analyze sounds but also connect to them emotionally. If we want to visualize sounds according to the characteristics of the listener, we need to collect not only the raw sounds but also the subjective feelings associated with them. For this purpose, we developed a sound collection system using a crowdsourcing approach to collect physical sounds, their statistics, and subjective evaluations simultaneously. We then conducted a sound collection experiment with ten participants using the developed system. We collected 6,257 samples of equivalent loudness levels and their locations, and 516 samples of sounds and their locations. Subjective evaluations by the participants are also included in the data. Next, we visualized the sounds on a map: loudness levels are shown as a color map and sounds as icons indicating the sound type. Finally, we conducted a discrimination experiment on the sounds to implement automatic conversion from sounds to appropriate icons. The classifier was trained on the basis of the GMM-UBM (Gaussian Mixture Model and Universal Background Model) method. Experimental results show an F-measure of 0.52 and an AUC of 0.79.
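The GMM-UBM classifier mentioned above trains one universal background model on all sounds, derives a per-class model by MAP-adapting the UBM means toward that class's data, and scores a sample by the log-likelihood ratio of the adapted model against the UBM. A simplified NumPy sketch of that recipe (diagonal covariances, means-only adaptation; the relevance factor `r` is an assumed value, and the paper's feature extraction is not shown):

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Per-frame log-likelihood under a diagonal-covariance GMM.
    X: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = X[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_comp = (-0.5 * np.sum(diff ** 2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
                + np.log(weights))                                # (T, M)
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp
    return m.squeeze(1) + np.log(np.exp(log_comp - m).sum(axis=1))

def ubm_map_adapt_means(X, weights, means, variances, r=16.0):
    """MAP-adapt UBM means toward class data X with relevance factor r."""
    diff = X[:, None, :] - means[None, :, :]
    log_comp = (-0.5 * np.sum(diff ** 2 / variances
                              + np.log(2 * np.pi * variances), axis=2)
                + np.log(weights))
    post = np.exp(log_comp - log_comp.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                       # responsibilities
    n = post.sum(axis=0)                                          # (M,)
    ex = post.T @ X / np.maximum(n, 1e-8)[:, None]                # posterior means
    alpha = (n / (n + r))[:, None]
    return alpha * ex + (1 - alpha) * means

def gmm_ubm_score(X, ubm, adapted_means):
    """Average log-likelihood ratio of the adapted model vs. the UBM."""
    w, mu, var = ubm
    return np.mean(gmm_loglik(X, w, adapted_means, var)
                   - gmm_loglik(X, w, mu, var))
```

Components that see little class data keep their UBM means (alpha near 0), which is what makes the adaptation robust to the small per-class sample counts typical of crowdsourced sounds.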
Note © 2015 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. The article has been accepted for publication. Published in: Pervasive Computing and Communication Workshops (PerCom Workshops), 2015 IEEE International Conference on; Date of Conference: 23-27 March 2015; Page(s): 390-395; Conference Location: St. Louis, MO, USA; Publisher: IEEE; Publisher URL: http://dx.doi.org/10.1109/PERCOMW.2015.7134069
Publication date 2015-03
Publication title Proceedings of 2015 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops)
Publisher IEEE
Start page 390
End page 395
Material type Journal article
Language English
OAI-PMH Set Okayama University
Article version author
DOI 10.1109/PERCOMW.2015.7134069