start-ver=1.4 cd-journal=joma no-vol=126 cd-vols= no-issue= article-no= start-page=35 end-page=43 dt-received= dt-revised= dt-accepted= dt-pub-year=2021 dt-pub=202102 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=Model architectures to extrapolate emotional expressions in DNN-based text-to-speech en-subtitle= kn-subtitle= en-abstract= kn-abstract=This paper proposes architectures that facilitate the extrapolation of emotional expressions in deep neural network (DNN)-based text-to-speech (TTS). In this study, “extrapolating emotional expressions” means borrowing emotional expressions from other speakers, so that collecting emotional speech uttered by the target speakers is unnecessary. Although DNNs are powerful enough to build TTS systems with emotional expressions, and some DNN-based TTS systems have satisfactorily reproduced the diversity of human speech, collecting emotional speech uttered by the target speakers remains necessary and laborious. To solve this issue, we propose architectures that train speaker features and emotion features separately and synthesize speech with any combination of speaker and emotion. The architectures are the parallel model (PM), the serial model (SM), the auxiliary input model (AIM), and two hybrid models (PM&AIM and SM&AIM). These models are trained using emotional speech uttered by a few speakers and neutral speech uttered by many speakers. Objective evaluations demonstrate that performance in the open-emotion test provides insufficient information when compared with that in the closed-emotion test, because each speaker has their own manner of expressing emotion. However, subjective evaluation results indicate that the proposed models could convey emotional information to some extent. Notably, the PM can correctly convey sad and joyful emotions at rates above 60%. 
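The following is a minimal illustrative sketch, not the paper's implementation: it assumes a PyTorch-style feed-forward acoustic model in the spirit of the auxiliary input model (AIM), where one-hot speaker and emotion codes are appended to the linguistic features so that a speaker seen only in neutral speech can be combined with an emotion learned from other speakers. The class name, dimensions, and feature sizes are hypothetical.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AuxInputAcousticModel(nn.Module):
    # Hypothetical dimensions: 300 linguistic features per frame, 10 speakers,
    # 4 emotions, 187 output acoustic features (e.g. spectrum + F0 + aperiodicity).
    def __init__(self, ling_dim=300, n_speakers=10, n_emotions=4, hidden=256, out_dim=187):
        super().__init__()
        self.n_speakers, self.n_emotions = n_speakers, n_emotions
        self.net = nn.Sequential(
            nn.Linear(ling_dim + n_speakers + n_emotions, hidden),
            nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, ling, speaker_id, emotion_id):
        # ling: (frames, ling_dim); speaker_id / emotion_id: 0-dim LongTensors.
        spk = F.one_hot(speaker_id, self.n_speakers).float().expand(ling.size(0), -1)
        emo = F.one_hot(emotion_id, self.n_emotions).float().expand(ling.size(0), -1)
        return self.net(torch.cat([ling, spk, emo], dim=-1))

# Open-emotion synthesis: pair speaker 3 (trained only on neutral speech)
# with emotion 1 (learned from other speakers).
model = AuxInputAcousticModel()
acoustic = model(torch.randn(100, 300), torch.tensor(3), torch.tensor(1))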
en-copyright= kn-copyright= en-aut-name=InoueKatsuki en-aut-sei=Inoue en-aut-mei=Katsuki kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=HaraSunao en-aut-sei=Hara en-aut-mei=Sunao kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= en-aut-name=HojoNobukatsu en-aut-sei=Hojo en-aut-mei=Nobukatsu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=4 ORCID= en-aut-name=IjimaYusuke en-aut-sei=Ijima en-aut-mei=Yusuke kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=5 ORCID= affil-num=1 en-affil=Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University kn-affil= affil-num=2 en-affil=Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University kn-affil= affil-num=3 en-affil=Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University kn-affil= affil-num=4 en-affil=NTT Corporation kn-affil= affil-num=5 en-affil=NTT Corporation kn-affil= en-keyword=Emotional speech synthesis kn-keyword=Emotional speech synthesis en-keyword=Extrapolation kn-keyword=Extrapolation en-keyword=DNN-based TTS kn-keyword=DNN-based TTS en-keyword=Text-to-speech kn-keyword=Text-to-speech en-keyword=Acoustic model kn-keyword=Acoustic model en-keyword=Phoneme duration model kn-keyword=Phoneme duration model END start-ver=1.4 cd-journal=joma no-vol=132 cd-vols= no-issue=2 article-no= start-page=92 end-page=94 dt-received= dt-revised= dt-accepted= dt-pub-year=2020 dt-pub=20200803 dt-online= en-article= kn-article= en-subject= kn-subject= en-title=Cyber-physical engineering informatics research core kn-title=サイバーフィジカル情報応用研究コア(Cypher)設立について en-subtitle= kn-subtitle= en-abstract= kn-abstract= en-copyright= kn-copyright= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name=阿部匡伸 kn-aut-sei=阿部 kn-aut-mei=匡伸 aut-affil-num=1 ORCID= affil-num=1 en-affil=Graduate School of Interdisciplinary Science and Engineering in Health Systems, Okayama University kn-affil=岡山大学大学院ヘルスシステム統合科学研究科 en-keyword=AI kn-keyword=AI en-keyword=Bigdata kn-keyword=Bigdata en-keyword=IoT kn-keyword=IoT END start-ver=1.4 cd-journal=joma no-vol=2019 cd-vols= no-issue= article-no= start-page=143 end-page=147 dt-received= dt-revised= dt-accepted= dt-pub-year=2019 dt-pub=201911 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=Speech-like Emotional Sound Generator by WaveNet en-subtitle= kn-subtitle= en-abstract= kn-abstract=In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional information plays an important role in human communication, and speech is one of the most useful media for expressing emotions. Although speech generally conveys emotional information as well as linguistic information, we have undertaken the challenge of generating sounds that convey emotional information without any linguistic information, which can make conversations in human-machine interaction more natural in some situations by providing non-verbal emotional vocalizations. We call the generated sounds “speech-like” because they do not contain any linguistic information. For this purpose, we propose employing WaveNet as a sound generator conditioned only on emotional IDs. This idea is quite different from the WaveNet vocoder, which synthesizes speech using spectral information as auxiliary features. 
The biggest advantage of this idea is that it reduces the amount of emotional speech data needed for training. The proposed algorithm consists of two steps. In the first step, WaveNet is trained to obtain phonetic features using a large speech database, and in the second step, WaveNet is re-trained using a small amount of emotional speech. Subjective listening evaluations showed that the SES could convey emotional information and was judged to sound like a human voice. en-copyright= kn-copyright= en-aut-name=MatsumotoKento en-aut-sei=Matsumoto en-aut-mei=Kento kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=HaraSunao en-aut-sei=Hara en-aut-mei=Sunao kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= affil-num=1 en-affil=Okayama University kn-affil= affil-num=2 en-affil=Okayama University kn-affil= affil-num=3 en-affil=Okayama University kn-affil= END start-ver=1.4 cd-journal=joma no-vol= cd-vols= no-issue= article-no= start-page= end-page= dt-received= dt-revised= dt-accepted= dt-pub-year=2016 dt-pub=201607 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=Sound collection systems using a crowdsourcing approach to construct sound map based on subjective evaluation en-subtitle= kn-subtitle= en-abstract= kn-abstract=This paper presents a sound collection system that uses crowdsourcing to gather information for visualizing area characteristics. First, we developed a sound collection system to simultaneously collect physical sounds, their statistics, and subjective evaluations. We then conducted a sound collection experiment using the developed system with 14 participants. We collected 693,582 samples of equivalent A-weighted loudness levels and their locations, and 5,935 samples of sounds and their locations. The data also include subjective evaluations by the participants. In addition, we analyzed the changes in the sound properties of some areas before and after the opening of a large-scale shopping mall in a city. Next, we implemented visualizations on the server system to attract users’ interest. Finally, we published the system, which can receive sounds from any Android smartphone user. The sound data were collected continuously, and the expected results were achieved. 
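As a side note on the loudness statistics mentioned above: the abstract does not state how the per-location level samples are aggregated, but equivalent A-weighted levels are conventionally pooled by energy averaging rather than arithmetic averaging. A small sketch of that standard computation, with made-up readings:

import math

def equivalent_level(levels_db):
    # Energy-average a sequence of A-weighted levels (dB) into one L_Aeq value.
    mean_energy = sum(10 ** (l / 10.0) for l in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)

# Hypothetical samples for one map cell: about 67.6 dB, dominated by the loudest reading.
print(round(equivalent_level([55.0, 60.0, 72.0]), 1))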
en-copyright= kn-copyright= en-aut-name=HaraSunao en-aut-sei=Hara en-aut-mei=Sunao kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=KobayashiShota en-aut-sei=Kobayashi en-aut-mei=Shota kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= affil-num=1 en-affil=Graduate School of Natural Science and Technology, Okayama University kn-affil=岡山大学大学院自然科学研究科 affil-num=2 en-affil=Graduate School of Natural Science and Technology, Okayama University kn-affil=岡山大学大学院自然科学研究科 affil-num=3 en-affil=Graduate School of Natural Science and Technology, Okayama University kn-affil=岡山大学大学院自然科学研究科 en-keyword=Environmental sound kn-keyword=Environmental sound en-keyword=Crowdsourcing kn-keyword=Crowdsourcing en-keyword=Loudness kn-keyword=Loudness en-keyword=Crowdedness kn-keyword=Crowdedness en-keyword=Smart City kn-keyword=Smart City END start-ver=1.4 cd-journal=joma no-vol=70 cd-vols= no-issue=3 article-no= start-page=205 end-page=211 dt-received= dt-revised= dt-accepted= dt-pub-year=2016 dt-pub=201606 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=Structure of a New Palatal Plate and the Artificial Tongue for Articulation Disorder in a Patient with Subtotal Glossectomy en-subtitle= kn-subtitle= en-abstract= kn-abstract=A palatal augmentation prosthesis (PAP) is used to facilitate improvement in the speech and swallowing functions of patients with tongue resection or tongue movement disorders. However, a PAP's effect is limited in cases where the articulation disorder is severe due to wide glossectomy and/or segmental mandibulectomy. In this paper, we describe the speech outcomes of a patient with an articulation disorder following glossectomy and segmental mandibulectomy. We used a palatal plate (PP) based on a PAP, along with an artificial tongue (KAT). Speech improvement was evaluated by a standardized speech intelligibility test consisting of 100 syllables. The speech intelligibility score was significantly higher when the patient wore both the PP and KAT than when he wore neither (p=0.013). The conversational intelligibility score was also significantly higher with the PP and KAT than without them (p=0.024). These results suggest that speech function can be improved using both a PP and a KAT in patients with hard tissue defects resulting from segmental mandibulectomy. The designs of the PP and the KAT should allow these prostheses to address a wide range of tissue defects. 
en-copyright= kn-copyright= en-aut-name=KozakiKen-ichi en-aut-sei=Kozaki en-aut-mei=Ken-ichi kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=KawakamiShigehisa en-aut-sei=Kawakami en-aut-mei=Shigehisa kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=KonishiTakayuki en-aut-sei=Konishi en-aut-mei=Takayuki kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= en-aut-name=OhtaKeiji en-aut-sei=Ohta en-aut-mei=Keiji kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=4 ORCID= en-aut-name=YanoJitsuro en-aut-sei=Yano en-aut-mei=Jitsuro kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=5 ORCID= en-aut-name=OnodaTomoo en-aut-sei=Onoda en-aut-mei=Tomoo kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=6 ORCID= en-aut-name=MatsumotoHiroshi en-aut-sei=Matsumoto en-aut-mei=Hiroshi kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=7 ORCID= en-aut-name=MizukawaNobuyoshi en-aut-sei=Mizukawa en-aut-mei=Nobuyoshi kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=8 ORCID= en-aut-name=KimataYoshihiro en-aut-sei=Kimata en-aut-mei=Yoshihiro kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=9 ORCID= en-aut-name=NishizakiKazunori en-aut-sei=Nishizaki en-aut-mei=Kazunori kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=10 ORCID= en-aut-name=IidaSeiji en-aut-sei=Iida en-aut-mei=Seiji kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=11 ORCID= en-aut-name=GofukuAkio en-aut-sei=Gofuku en-aut-mei=Akio kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=12 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=13 ORCID= en-aut-name=MinagiShogo en-aut-sei=Minagi en-aut-mei=Shogo kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=14 ORCID= en-aut-name=Okayama Dream Speech Project en-aut-sei=Okayama Dream Speech Project en-aut-mei= kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=15 ORCID= affil-num=1 en-affil=Department of Dental Pharmacology, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=2 en-affil=Department of Occlusal and Oral Functional Rehabilitation, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=3 en-affil=Division of Physical Medicine and Rehabilitation, Okayama University Hospital kn-affil= affil-num=4 en-affil=Dental Laboratory Division, Okayama University Hospital kn-affil= affil-num=5 en-affil=Department of Occlusal and Oral Functional Rehabilitation, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=6 en-affil=Department of Otolaryngology-Head and Neck Surgery Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=7 en-affil=Department of Plastic and Reconstructive Surgery, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=8 en-affil=Department of Oral and Maxillofacial Reconstructive Surgery, Okayama University Hospital kn-affil= affil-num=9 en-affil=Department of Plastic and Reconstructive Surgery, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=10 en-affil=Department of Otolaryngology-Head and Neck Surgery Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=11 en-affil=Department of Oral and Maxillofacial Reconstructive Surgery, Okayama University Graduate School of Medicine, Dentistry and 
Pharmaceutical Sciences kn-affil= affil-num=12 en-affil=Graduate School of Natural Science and Technology, Okayama University kn-affil= affil-num=13 en-affil=Department of Computer Science, Okayama University kn-affil= affil-num=14 en-affil=Department of Occlusal and Oral Functional Rehabilitation, Okayama University Graduate School of Medicine, Dentistry and Pharmaceutical Sciences kn-affil= affil-num=15 en-affil= kn-affil= en-keyword=palatal augmentation prosthesis kn-keyword=palatal augmentation prosthesis en-keyword=artificial tongue kn-keyword=artificial tongue en-keyword=articulation disorder kn-keyword=articulation disorder en-keyword=glossectomy kn-keyword=glossectomy en-keyword=mandibulectomy kn-keyword=mandibulectomy END start-ver=1.4 cd-journal=joma no-vol= cd-vols= no-issue= article-no= start-page=223 end-page=226 dt-received= dt-revised= dt-accepted= dt-pub-year=2015 dt-pub=201512 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=A Spoken Dialog System with Redundant Response to Prevent User Misunderstanding en-subtitle= kn-subtitle= en-abstract= kn-abstract=We propose a spoken dialog strategy for car navigation systems to facilitate safe driving. To drive safely, drivers need to concentrate on their driving; however, their concentration may be disrupted by disagreements with their spoken dialog system. Therefore, we need to address misunderstandings by users as well as by the spoken dialog system. For this purpose, we introduced the driver’s workload level into spoken dialog management in order to prevent user misunderstandings. A key strategy of the dialog management is to make the system’s speech redundant when the driver’s workload is too high, on the assumption that the user will probably misunderstand the system utterance under such a condition. An experiment was conducted to compare the performance of the proposed method with that of a conventional method using a user simulator. The simulator was developed under the assumption of two types of drivers: an experienced driver model and a novice driver model. Experimental results showed that the proposed strategy achieved better performance than the conventional one in terms of task completion time, task completion rate, and the user’s positive speech rate. In particular, these performance differences were greater for novice users than for experienced users. en-copyright= kn-copyright= en-aut-name=YamaokaMasaki en-aut-sei=Yamaoka en-aut-mei=Masaki kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=HaraSunao en-aut-sei=Hara en-aut-mei=Sunao kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= affil-num=1 en-affil= kn-affil=Okayama University affil-num=2 en-affil= kn-affil=岡山大学大学院自然科学研究科 affil-num=3 en-affil= kn-affil=岡山大学大学院自然科学研究科 END start-ver=1.4 cd-journal=joma no-vol= cd-vols= no-issue= article-no= start-page=390 end-page=395 dt-received= dt-revised= dt-accepted= dt-pub-year=2015 dt-pub=201503 dt-online= en-article= kn-article= en-subject= kn-subject= en-title= kn-title=Sound collection and visualization system enabled participatory and opportunistic sensing approaches en-subtitle= kn-subtitle= en-abstract= kn-abstract=This paper presents a sound collection system to visualize environmental sounds that are collected using a crowdsourcing approach. 
An analysis of physical features is generally used to analyze sound properties; however, human beings not only analyze sounds but also connect to them emotionally. If we want to visualize sounds according to the characteristics of the listener, we need to collect not only the raw sounds but also the subjective feelings associated with them. For this purpose, we developed a sound collection system using a crowdsourcing approach to collect physical sounds, their statistics, and subjective evaluations simultaneously. We then conducted a sound collection experiment using the developed system with ten participants. We collected 6,257 samples of equivalent loudness levels and their locations, and 516 samples of sounds and their locations. Subjective evaluations by the participants are also included in the data. Next, we visualized the sounds on a map: the loudness levels are visualized as a color map, and the sounds are visualized as icons that indicate the sound type. Finally, we conducted a discrimination experiment on the sounds to implement automatic conversion from sounds to appropriate icons. The classifier is trained on the basis of the GMM-UBM (Gaussian Mixture Model and Universal Background Model) method. Experimental results show that the F-measure is 0.52 and the AUC is 0.79. en-copyright= kn-copyright= en-aut-name=HaraSunao en-aut-sei=Hara en-aut-mei=Sunao kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=1 ORCID= en-aut-name=AbeMasanobu en-aut-sei=Abe en-aut-mei=Masanobu kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=2 ORCID= en-aut-name=SoneharaNoboru en-aut-sei=Sonehara en-aut-mei=Noboru kn-aut-name= kn-aut-sei= kn-aut-mei= aut-affil-num=3 ORCID= affil-num=1 en-affil= kn-affil=Graduate School of Natural Science and Technology Okayama University affil-num=2 en-affil= kn-affil=Graduate School of Natural Science and Technology Okayama University affil-num=3 en-affil= kn-affil=National Institute of Informatics END
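The GMM-UBM classification step mentioned in the last abstract can be sketched as follows. This is only an illustrative approximation: the data are synthetic stand-ins for MFCC-like features, and fitting fresh per-class GMMs initialized at the UBM means is a rough substitute for the MAP adaptation used in a full GMM-UBM system; all names and numbers are hypothetical.

import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic stand-ins for per-clip feature matrices, grouped by sound label.
rng = np.random.default_rng(0)
train = {"traffic": rng.normal(0.0, 1.0, (500, 13)),
         "voices":  rng.normal(1.0, 1.0, (500, 13))}

# 1) Universal background model fitted on all sounds pooled together.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(np.vstack(list(train.values())))

# 2) One GMM per sound class, initialised at the UBM means (a rough stand-in
#    for MAP adaptation).
class_models = {}
for label, feats in train.items():
    gmm = GaussianMixture(n_components=8, covariance_type="diag",
                          means_init=ubm.means_, random_state=0)
    class_models[label] = gmm.fit(feats)

def classify(clip_feats):
    # Score by average per-frame log-likelihood ratio against the UBM.
    scores = {label: gmm.score(clip_feats) - ubm.score(clip_feats)
              for label, gmm in class_models.items()}
    return max(scores, key=scores.get)

print(classify(rng.normal(1.0, 1.0, (200, 13))))  # expected: "voices"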