no code implementations • WMT (EMNLP) 2021 • Kosuke Takahashi, Yoichi Ishibashi, Katsuhito Sudoh, Satoshi Nakamura
This paper describes our submission to the WMT2021 shared metrics task.
no code implementations • IWSLT (EMNLP) 2018 • Kaho Osamura, Takatomo Kano, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
In this paper, a neural sequence-to-sequence ASR is used as feature processing that is trained to produce word posterior features given spoken utterances.
Automatic Speech Recognition (ASR) +4
no code implementations • IWSLT (ACL) 2022 • Ryo Fukuda, Yuka Ko, Yasumasa Kano, Kosuke Doi, Hirotaka Tokuyama, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
This paper describes NAIST’s simultaneous speech translation systems developed for IWSLT 2022 Evaluation Campaign.
no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe
The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.
no code implementations • IWSLT (ACL) 2022 • Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
Simultaneous translation is a task that requires starting translation before the speaker has finished speaking, so we face a trade-off between latency and accuracy.
no code implementations • IWSLT (EMNLP) 2018 • Johanes Effendi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
In this paper, we investigate and utilize neural paraphrasing to improve translation quality in neural MT (NMT), which has not yet been much explored.
no code implementations • EACL (HumEval) 2021 • Katsuhito Sudoh, Kosuke Takahashi, Satoshi Nakamura
Our classification-based approach focuses on such errors using several error type labels, for practical machine translation evaluation in an age of neural machine translation.
no code implementations • ACL (IWSLT) 2021 • Antonios Anastasopoulos, Ondřej Bojar, Jacob Bremerman, Roldano Cattoni, Maha Elbayad, Marcello Federico, Xutai Ma, Satoshi Nakamura, Matteo Negri, Jan Niehues, Juan Pino, Elizabeth Salesky, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Alexander Waibel, Changhan Wang, Matthew Wiesner
The evaluation campaign of the International Conference on Spoken Language Translation (IWSLT 2021) featured this year four shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Multilingual speech translation, (iv) Low-resource speech translation.
no code implementations • ICON 2021 • Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura
Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information.
no code implementations • ICON 2021 • Kohichi Takai, Gen Hattori, Akio Yoneyama, Keiji Yasuda, Katsuhito Sudoh, Satoshi Nakamura
The proposed method applies the Named Entity (NE) feature vector to Factored Transformer for accurate proper noun translation.
no code implementations • dialdoc (ACL) 2022 • Yuya Nakano, Seiya Kawano, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
Ambiguous questions are generated by eliminating a part of a sentence considering the sentence structure.
no code implementations • ACL (IWSLT) 2021 • Kosuke Doi, Katsuhito Sudoh, Satoshi Nakamura
This paper describes the construction of a new large-scale English-Japanese Simultaneous Interpretation (SI) corpus and presents the results of its analysis.
no code implementations • ACL (IWSLT) 2021 • Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
Recent studies argue that knowledge distillation is promising for speech translation (ST) using end-to-end models.
Automatic Speech Recognition (ASR) +4
no code implementations • ACL (IWSLT) 2021 • Ryo Fukuda, Yui Oka, Yasumasa Kano, Yuki Yano, Yuka Ko, Hirotaka Tokuyama, Kosuke Doi, Sakriani Sakti, Katsuhito Sudoh, Satoshi Nakamura
This paper describes NAIST’s system for the English-to-Japanese Simultaneous Text-to-text Translation Task in IWSLT 2021 Evaluation Campaign.
no code implementations • 7 Feb 2024 • Roman Koshkin, Katsuhito Sudoh, Satoshi Nakamura
Decoder-only large language models (LLMs) have recently demonstrated impressive capabilities in text generation and reasoning.
no code implementations • 29 Jan 2024 • Kenta Izumi, Hiroki Tanaka, Kazuhiro Shidara, Hiroyoshi Adachi, Daisuke Kanayama, Takashi Kudo, Satoshi Nakamura
By comparing systems that use LLM-generated responses with those that do not, we investigate the impact of generated responses on subjective evaluations such as mood change, cognitive change, and dialogue quality (e.g., empathy).
no code implementations • 24 Nov 2023 • Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
In this work, we propose a novel latency evaluation metric for simultaneous translation called Average Token Delay (ATD) that focuses on the duration of partial translations.
no code implementations • 14 Oct 2023 • Takeshi Saga, Hiroki Tanaka, Satoshi Nakamura
We confirmed that an FTD-related subscale, odd speech, was significantly correlated with both the total SPQ and SRS scores, although they themselves were not correlated significantly.
no code implementations • 14 Jun 2023 • Yuka Ko, Ryo Fukuda, Yuta Nishikawa, Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
In this paper, we propose an effective way to train a SimulST model using mixed data of SI and offline translation.
no code implementations • 26 May 2023 • Yuta Nishikawa, Satoshi Nakamura
In this study, we propose an inter-connection mechanism that aggregates the information from each layer of the speech pre-trained model by weighted sums and feeds the result into the decoder.
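A layer-weighted aggregation of this kind can be sketched as a softmax-normalized scalar mix over encoder layers; the NumPy version below is a minimal illustration, and the layer count, shapes, and trainability of `weights` are assumptions, not the paper's exact design.

```python
import numpy as np

def scalar_mix(layer_states, weights):
    """Softmax-normalized weighted sum over the hidden states of every
    encoder layer (a minimal sketch of the inter-connection idea)."""
    w = np.exp(weights - np.max(weights))
    w = w / w.sum()                            # softmax over layers
    stacked = np.stack(layer_states, axis=0)   # (layers, time, dim)
    return np.tensordot(w, stacked, axes=1)    # (time, dim)

# With zero (i.e., uniform) weights this reduces to a plain layer average.
layers = [np.full((4, 8), float(i)) for i in range(3)]   # toy states 0, 1, 2
fused = scalar_mix(layers, np.zeros(3))
print(fused[0, 0])  # 1.0
```

In practice the weights would be learned jointly with the downstream speech translation decoder.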
no code implementations • 19 May 2023 • Hiroki Ouchi, Hiroyuki Shindo, Shoko Wakamiya, Yuki Matsuda, Naoya Inoue, Shohei Higashiyama, Satoshi Nakamura, Taro Watanabe
We have constructed Arukikata Travelogue Dataset and released it free of charge for academic research.
1 code implementation • 25 Apr 2023 • Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
In this study, we extended SHAS to improve ST translation accuracy and efficiency by splitting speech into shorter segments that correspond to sentences.
no code implementations • 23 Apr 2023 • Jinming Zhao, Yuka Ko, Kosuke Doi, Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
Research has been limited due to the lack of a large-scale training corpus.
1 code implementation • 7 Mar 2023 • Kazuma Kobayashi, Lin Gu, Ryuichiro Hataya, Takaaki Mizuno, Mototaka Miyake, Hirokazu Watanabe, Masamichi Takahashi, Yasuyuki Takamizawa, Yukihiro Yoshida, Satoshi Nakamura, Nobuji Kouno, Amina Bolatkan, Yusuke Kurose, Tatsuya Harada, Ryuji Hamamoto
As a result, our SBMIR system enabled users to overcome previous challenges, including image retrieval based on fine-grained image characteristics, image retrieval without example images, and image retrieval for isolated samples.
no code implementations • 1 Mar 2023 • Yuka Okuda, Katsuhito Sudoh, Seitaro Shinagawa, Satoshi Nakamura
A conversational recommender system (CRS) is a practical application for item recommendation through natural language conversation.
1 code implementation • 15 Feb 2023 • Seyed Mahed Mousavi, Shohei Tanaka, Gabriel Roccabruna, Koichiro Yoshino, Satoshi Nakamura, Giuseppe Riccardi
We publish the annotated dataset, annotation materials, and machine learning baseline models for the task of new event extraction for narrative understanding.
1 code implementation • 11 Feb 2023 • Yoichi Ishibashi, Danushka Bollegala, Katsuhito Sudoh, Satoshi Nakamura
To address this question, we conduct a systematic study of the robustness of discrete prompts by applying carefully designed perturbations to prompts produced by AutoPrompt and then measuring their performance on two Natural Language Inference (NLI) datasets.
no code implementations • 8 Jan 2023 • Heli Qi, Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
This paper introduces SpeeChain, an open-source PyTorch-based toolkit designed to develop the machine speech chain for large-scale use.
1 code implementation • IEEE Transactions on Multimedia 2020 • Fan Yang, Yang Wu, Zheng Wang, Xiang Li, Sakriani Sakti, Satoshi Nakamura
Therefore, previous works pre-train their models on rich-labeled photo retrieval data (i.e., source domain) and then fine-tune them on the limited-labeled sketch-to-photo retrieval data (i.e., target domain).
Ranked #1 on Image Retrieval on PKU-Reid
no code implementations • 22 Nov 2022 • Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
In this work, we propose a novel latency evaluation metric called Average Token Delay (ATD) that focuses on the end timings of partial translations in simultaneous translation.
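As a rough illustration (not the paper's exact definition), ATD can be thought of as the average gap between each output token's end time and the end time of the input token it is paired with; pairing by index, clamped to the last input token, is a simplifying assumption here.

```python
def average_token_delay(in_end_times, out_end_times):
    """Rough sketch of Average Token Delay (ATD): average, over output
    tokens, of the output token's end time minus the end time of its
    paired input token. Index-based pairing clamped to the last input
    token is a simplification; the paper's pairing rule is more refined."""
    delays = [t_out - in_end_times[min(i, len(in_end_times) - 1)]
              for i, t_out in enumerate(out_end_times)]
    return sum(delays) / len(delays)

# Each output token finishes one time unit after its paired input token.
print(average_token_delay([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))  # 1.0
```

Unlike metrics that only look at when translation starts, this formulation penalizes translations whose partial outputs finish long after the corresponding input.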
1 code implementation • 1 Nov 2022 • Keisuke Toyama, Katsuhito Sudoh, Satoshi Nakamura
Although the well-known MR-to-text E2E dataset has been used by many researchers, its MR-text pairs include many deletion/insertion/substitution errors.
1 code implementation • 24 Oct 2022 • Yoichi Ishibashi, Sho Yokoi, Katsuhito Sudoh, Satoshi Nakamura
In the field of natural language processing (NLP), continuous vector representations are crucial for capturing the semantic meanings of individual words.
1 code implementation • 27 Aug 2022 • Fan Yang, Norimichi Ukita, Sakriani Sakti, Satoshi Nakamura
By using MOT, the spatiotemporal boundary of each actor is obtained and assigned to a unique actor identity.
4 code implementations • 12 Aug 2022 • Yidong Wang, Hao Chen, Yue Fan, Wang Sun, Ran Tao, Wenxin Hou, RenJie Wang, Linyi Yang, Zhi Zhou, Lan-Zhe Guo, Heli Qi, Zhen Wu, Yu-Feng Li, Satoshi Nakamura, Wei Ye, Marios Savvides, Bhiksha Raj, Takahiro Shinozaki, Bernt Schiele, Jindong Wang, Xing Xie, Yue Zhang
We further provide the pre-trained versions of the state-of-the-art neural models for CV tasks to make the cost affordable for further tuning.
no code implementations • 1 Jun 2022 • Holy Lovenia, Hiroki Tanaka, Sakriani Sakti, Ayu Purwarianti, Satoshi Nakamura
Research about brain activities involving spoken word production is considerably underdeveloped because of the undiscovered characteristics of speech artifacts, which contaminate electroencephalogram (EEG) signals and prevent the inspection of the underlying cognitive processes.
no code implementations • 14 May 2022 • Heli Qi, Sashi Novitasari, Sakriani Sakti, Satoshi Nakamura
The existing paradigm of semi-supervised S2S ASR utilizes SpecAugment as data augmentation and requires a static teacher model to produce pseudo transcripts for untranscribed speech.
Automatic Speech Recognition (ASR) +2
no code implementations • 29 Mar 2022 • Naoaki Suzuki, Satoshi Nakamura
As a type of paralinguistic information, English speech uses sentence stress, the heaviest prominence within a sentence, to convey emphasis.
no code implementations • 29 Mar 2022 • Kei Furukawa, Takeshi Kishiyama, Satoshi Nakamura
End-to-end text-to-speech synthesis (TTS), which generates speech sounds directly from strings of text or phonemes, has improved the quality of speech synthesis over conventional TTS.
1 code implementation • 29 Mar 2022 • Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
We also propose a hybrid method that combines VAD and the above speech segmentation method.
no code implementations • WMT (EMNLP) 2021 • Yasumasa Kano, Katsuhito Sudoh, Satoshi Nakamura
Simultaneous translation is a task in which translation begins before the speaker has finished speaking, so it is important to decide when to start the translation process.
no code implementations • 29 Jul 2021 • Yui Oka, Katsuhito Sudoh, Satoshi Nakamura
Non-autoregressive neural machine translation (NAT) usually employs sequence-level knowledge distillation using autoregressive neural machine translation (AT) as its teacher model.
1 code implementation • SIGDIAL (ACL) 2021 • Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
In order to train the classification model on such training data, we applied the positive/unlabeled (PU) learning method, which assumes that only a part of the data is labeled with positive examples.
no code implementations • COLING 2020 • Yui Oka, Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
Since length constraints with exact target sentence lengths degrade translation performance, we add random noise within a certain window size to the length constraints in the positional encoding (PE) during training.
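The noising step amounts to perturbing the exact target length before it enters the positional encoding; a minimal sketch, where the window size is an illustrative choice rather than the paper's value:

```python
import random

def noisy_length_constraint(target_len, window=3):
    """Add uniform integer noise within +/-window to the exact target
    length during training so the model does not overfit to exact
    length constraints (window size is an illustrative choice)."""
    return target_len + random.randint(-window, window)

random.seed(0)
lengths = [noisy_length_constraint(20) for _ in range(5)]
print(lengths)  # five values, each within [17, 23]
```

At inference time the exact desired length would be supplied without noise.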
no code implementations • COLING 2020 • Koichiro Yoshino, Kana Ikeuchi, Katsuhito Sudoh, Satoshi Nakamura
Spoken language understanding (SLU), which converts user requests in natural language to machine-interpretable expressions, is becoming an essential task.
no code implementations • 10 Nov 2020 • Katsuhito Sudoh, Takatomo Kano, Sashi Novitasari, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
This paper presents a newly developed, simultaneous neural speech-to-speech translation system and its evaluation.
Automatic Speech Recognition (ASR) +6
no code implementations • 4 Nov 2020 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previous research has proposed a machine speech chain to enable automatic speech recognition (ASR) and text-to-speech synthesis (TTS) to assist each other in semi-supervised learning and to avoid the need for a large amount of paired speech and text data.
Automatic Speech Recognition (ASR) +6
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Tomoya Yanagita, Sakriani Sakti, Satoshi Nakamura
By contrast, humans can listen to what they speak in real-time, and if there is a delay in hearing, they won't be able to continue speaking.
Automatic Speech Recognition (ASR) +3
no code implementations • 4 Nov 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
One main reason is that the model needs to decide the incremental steps and learn the transcription that aligns with the current short speech segment.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2020 • Sashi Novitasari, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
We then develop ASR and TTS for ethnic languages by utilizing Indonesian ASR and TTS in a cross-lingual machine speech chain framework with only text or only speech data, removing the need for paired speech-text data of those ethnic languages.
no code implementations • 19 Oct 2020 • Dušan Variš, Katsuhito Sudoh, Satoshi Nakamura
We present our work in progress exploring the possibilities of a shared embedding space between textual and visual modality.
no code implementations • 7 Jul 2020 • Fan Yang, Xin Chang, Chenyu Dang, Ziqiang Zheng, Sakriani Sakti, Satoshi Nakamura, Yang Wu
We aim to improve the performance of Multiple Object Tracking and Segmentation (MOTS) by refinement.
Ranked #1 on Multi-Object Tracking on MOTS20
Multi-Object Tracking • Multi-Object Tracking and Segmentation +2
2 code implementations • ACL 2020 • Yoichi Ishibashi, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura
For transferring king into queen in this analogy-based manner, we subtract a difference vector man - woman based on the knowledge that king is male.
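The analogy operation described here is plain vector arithmetic over word embeddings; the toy 3-d vectors below are illustrative stand-ins for real embeddings such as word2vec:

```python
import numpy as np

# Toy 3-d vectors standing in for real word embeddings (illustrative values).
vec = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def nearest(query, exclude):
    """Return the vocabulary word closest to `query` by cosine similarity."""
    return max((w for w in vec if w not in exclude),
               key=lambda w: cos(vec[w], query))

# Analogy-based transfer: king - man + woman should land near queen.
target = vec["king"] - vec["man"] + vec["woman"]
print(nearest(target, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words themselves is the standard convention in analogy evaluation.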
no code implementations • ACL 2020 • Kosuke Takahashi, Katsuhito Sudoh, Satoshi Nakamura
Our experiments show that our proposed method using Cross-lingual Language Model (XLM) trained with a translation language modeling (TLM) objective achieves a higher correlation with human judgments than a baseline method that uses only hypothesis and reference sentences.
no code implementations • WS 2020 • Ryo Fukuda, Katsuhito Sudoh, Satoshi Nakamura
This paper describes NAIST's NMT system submitted to the IWSLT 2020 conversational speech translation task.
no code implementations • 24 May 2020 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we report our submitted system for the ZeroSpeech 2020 challenge on Track 2019.
no code implementations • LREC 2020 • Sara Asai, Koichiro Yoshino, Seitaro Shinagawa, Sakriani Sakti, Satoshi Nakamura
Expressing emotion is known as an efficient way to persuade one's dialogue partner to accept one's claim or proposal.
no code implementations • 23 Mar 2020 • Koichiro Yoshino, Kohei Wakimoto, Yuta Nishimura, Satoshi Nakamura
Two reasons make it challenging to apply existing sequence-to-sequence models to this mapping: 1) it is hard to prepare a large-scale dataset for any kind of robots and their environment, and 2) there is a gap between the number of samples obtained from robot action observations and generated word sequences of captions.
no code implementations • 27 Nov 2019 • Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
Simultaneous machine translation is a variant of machine translation that starts the translation process before the end of an input.
1 code implementation • 24 Nov 2019 • Fan Yang, Feiran Li, Yang Wu, Sakriani Sakti, Satoshi Nakamura
3D panoramic multi-person localization and tracking are prominent in many applications; however, conventional methods using LiDAR equipment could be economically expensive and also computationally inefficient due to the processing of point cloud data.
Ranked #1 on Multi-Object Tracking on MOT15_3D (using extra training data)
1 code implementation • 23 Oct 2019 • Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig
As our motivation is to allow acoustic models to re-examine their input features in light of partial hypotheses, we introduce intermediate model heads and a loss function.
no code implementations • 2 Oct 2019 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Second, we train a sequence-to-sequence model that directly maps the source language speech to the target language's discrete representation.
no code implementations • WS 2019 • Seiya Kawano, Koichiro Yoshino, Satoshi Nakamura
We introduce an adversarial learning framework for the task of generating conditional responses with a new objective to a discriminator, which explicitly distinguishes sentences by using labels.
3 code implementations • arXiv 2019 • Fan Yang, Sakriani Sakti, Yang Wu, Satoshi Nakamura
Although skeleton-based action recognition has achieved great success in recent years, most of the existing methods may suffer from a large model size and slow execution speed.
Ranked #1 on Hand Gesture Recognition on DHG-14
2 code implementations • WS 2019 • Shohei Tanaka, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
We propose a novel method for selecting coherent and diverse responses for a given dialogue context.
no code implementations • 3 Jun 2019 • Johanes Effendi, Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Previously, a machine speech chain, which is based on sequence-to-sequence deep learning, was proposed to mimic speech perception and production behavior.
Automatic Speech Recognition (ASR) +6
2 code implementations • 28 May 2019 • Andrei C. Coman, Koichiro Yoshino, Yukitoshi Murase, Satoshi Nakamura, Giuseppe Riccardi
To identify the point of maximal understanding in an ongoing utterance, we a) implement an incremental Dialog State Tracker that is updated on a token basis (iDST), b) re-label the Dialog State Tracking Challenge 2 (DSTC2) dataset, and c) adapt it to the incremental turn-taking experimental scenario.
no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura
Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.
no code implementations • 26 Nov 2018 • Hisao Katsumi, Takuya Hiraoka, Koichiro Yoshino, Kazeto Yamamoto, Shota Motoura, Kunihiko Sadamasa, Satoshi Nakamura
It is required that these systems have sufficient supporting information to argue their claims rationally; however, the systems often do not have enough of such information in realistic situations.
1 code implementation • 20 Nov 2018 • Ryo Nakamura, Katsuhito Sudoh, Koichiro Yoshino, Satoshi Nakamura
Although generation-based dialogue systems have been widely researched, the responses generated by most existing systems have very low diversity.
no code implementations • 31 Oct 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In our previous work, we applied a speech chain mechanism as a semi-supervised learning method.
Automatic Speech Recognition (ASR) +3
no code implementations • IWSLT (EMNLP) 2018 • Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura
By using information from these multiple sources, these systems achieve large gains in accuracy.
no code implementations • 30 Jul 2018 • Katsuki Chousa, Katsuhito Sudoh, Satoshi Nakamura
The proposed loss function encourages an NMT decoder to generate words close to their references in the embedding space; this helps the decoder to choose similar acceptable words when the actual best candidates are not included in the vocabulary due to its size limitation.
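A minimal sketch of such a loss (the paper's exact formulation may differ): the decoder's softmax-expected output embedding is pulled toward the reference word's embedding via cosine distance.

```python
import numpy as np

def embedding_similarity_loss(logits, ref_id, emb):
    """Cosine-distance loss between the decoder's expected output embedding
    (softmax-weighted mixture of all word vectors) and the reference word's
    embedding. A hedged sketch; the paper's exact loss may differ."""
    p = np.exp(logits - logits.max())
    p = p / p.sum()                    # softmax over the vocabulary
    expected = p @ emb                 # (dim,) expected output embedding
    ref = emb[ref_id]
    cosine = expected @ ref / (np.linalg.norm(expected) * np.linalg.norm(ref))
    return 1.0 - cosine                # 0 when prediction aligns with reference

emb = np.eye(3)   # toy 3-word vocabulary with one-hot "embeddings"
confident = embedding_similarity_loss(np.array([10.0, 0.0, 0.0]), 0, emb)
print(round(confident, 3))  # 0.0
```

Because the loss is computed in embedding space, a near-synonym of the reference is penalized less than an unrelated word, which is the stated motivation.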
no code implementations • 22 Jul 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose two ideas to improve sequence-to-sequence model performance by enhancing the attention module.
no code implementations • WS 2018 • Nurul Lubis, Sakriani Sakti, Koichiro Yoshino, Satoshi Nakamura
Positive emotion elicitation seeks to improve user's emotional state through dialogue system interaction, where a chat-based scenario is layered with an implicit goal to address user's emotional needs.
no code implementations • WS 2018 • Yuta Nishimura, Katsuhito Sudoh, Graham Neubig, Satoshi Nakamura
This study focuses on the use of incomplete multilingual corpora in multi-encoder NMT and mixture of NMT experts and examines a very simple implementation where missing source translations are replaced by a special symbol <NULL>.
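The replacement itself is simple to implement; a sketch with illustrative field names and a dict-based example format (both assumptions):

```python
def fill_missing_sources(example, languages, null_token="<NULL>"):
    """Give a multi-source NMT model one input per encoder by replacing
    any missing source translation with a special symbol (field names
    and the dict-based example format are illustrative)."""
    return {lang: example.get(lang, null_token) for lang in languages}

ex = {"en": "good morning", "fr": "bonjour"}   # German side missing
print(fill_missing_sources(ex, ["en", "fr", "de"]))
# {'en': 'good morning', 'fr': 'bonjour', 'de': '<NULL>'}
```

The special symbol lets the model learn to ignore absent encoders rather than requiring fully parallel multilingual corpora.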
no code implementations • NAACL 2018 • Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura
Specifically, for an input sentence, we use a search engine to retrieve sentence pairs whose source sides are similar with the input sentence, and then collect $n$-grams that are both in the retrieved target sentences and aligned with words that match in the source sentences, which we call "translation pieces".
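A simplified sketch of collecting such "translation pieces" (real systems use fuzzy retrieval and learned word alignments; the exact-match heuristic below is an assumption):

```python
def collect_translation_pieces(input_words, retrieved_pairs, max_n=4):
    """Collect target-side n-grams whose aligned source words all appear
    in the input sentence. `retrieved_pairs` holds (source_words,
    target_words, alignment) triples, where alignment is a list of
    (src_idx, tgt_idx) links. A simplified exact-match sketch."""
    pieces = set()
    inp = set(input_words)
    for src, tgt, align in retrieved_pairs:
        matched = {j for i, j in align if src[i] in inp}
        for n in range(1, max_n + 1):
            for start in range(len(tgt) - n + 1):
                if all(j in matched for j in range(start, start + n)):
                    pieces.add(tuple(tgt[start:start + n]))
    return pieces

pairs = [(["le", "chat", "dort"], ["the", "cat", "sleeps"],
          [(0, 0), (1, 1), (2, 2)])]
print(sorted(collect_translation_pieces(["le", "chat", "mange"], pairs)))
# [('cat',), ('the',), ('the', 'cat')]
```

The collected pieces would then be used to reward matching n-grams during NMT decoding.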
no code implementations • 28 Mar 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In the speech chain loop mechanism, ASR also benefits from the ability to further learn an arbitrary speaker's characteristics from the generated speech waveform, resulting in a significant improvement in the recognition rate.
Automatic Speech Recognition (ASR) +4
1 code implementation • 28 Feb 2018 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In machine learning, the Recurrent Neural Network (RNN) has become a popular architecture for sequential data modeling.
no code implementations • 23 Feb 2018 • Seitaro Shinagawa, Koichiro Yoshino, Sakriani Sakti, Yu Suzuki, Satoshi Nakamura
We propose an interactive image-manipulation system with natural language instruction, which can generate a target image from a source image and an instruction that describes the difference between the source and the target image.
no code implementations • 13 Feb 2018 • Takatomo Kano, Sakriani Sakti, Satoshi Nakamura
Sequence-to-sequence attentional-based neural network architectures have been shown to provide a powerful model for machine translation and speech recognition.
no code implementations • IJCNLP 2017 • Louisa Pragst, Koichiro Yoshino, Wolfgang Minker, Satoshi Nakamura, Stefan Ultes
Defining all possible system actions in a dialogue system by hand is tedious work.
Cultural Vocal Bursts Intensity Prediction • Spoken Dialogue Systems
no code implementations • IJCNLP 2017 • Jingyi Zhang, Masao Utiyama, Eiichiro Sumita, Graham Neubig, Satoshi Nakamura
Compared to traditional statistical machine translation (SMT), neural machine translation (NMT) often sacrifices adequacy for the sake of fluency.
no code implementations • WS 2017 • Yusuke Oda, Katsuhito Sudoh, Satoshi Nakamura, Masao Utiyama, Eiichiro Sumita
This paper describes the details about the NAIST-NICT machine translation system for WAT2017 English-Japanese Scientific Paper Translation Task.
no code implementations • 30 Oct 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Despite the success of sequence-to-sequence approaches in automatic speech recognition (ASR) systems, the models still suffer from several problems, mainly due to the mismatch between the training and inference conditions.
Automatic Speech Recognition (ASR) +3
no code implementations • 22 Sep 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we construct the first end-to-end attention-based encoder-decoder model that maps directly from the raw speech waveform to the text transcription.
Automatic Speech Recognition (ASR) +3
no code implementations • 15 Sep 2017 • Matthias Sperber, Graham Neubig, Jan Niehues, Satoshi Nakamura, Alex Waibel
We investigate the problem of manually correcting errors from an automatic speech transcript in a cost-sensitive fashion.
no code implementations • WS 2017 • Koichiro Yoshino, Yu Suzuki, Satoshi Nakamura
We demonstrate an information navigation system for sightseeing domains that has a dialogue interface for discovering user interests for tourist activities.
no code implementations • 16 Jul 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we take a step further and develop a closed-loop speech chain model based on deep learning.
Automatic Speech Recognition (ASR) +3
no code implementations • WS 2017 • Makoto Morishita, Yusuke Oda, Graham Neubig, Koichiro Yoshino, Katsuhito Sudoh, Satoshi Nakamura
Training of neural machine translation (NMT) models usually uses mini-batches for efficiency purposes.
no code implementations • 7 Jun 2017 • Andros Tjandra, Sakriani Sakti, Ruli Manurung, Mirna Adriani, Satoshi Nakamura
Our proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product.
no code implementations • 31 May 2017 • Koichiro Yoshino, Shinsuke Mori, Satoshi Nakamura
This paper investigates and analyzes the effect of dependency information on predicate-argument structure analysis (PASA) and zero anaphora resolution (ZAR) for Japanese, and shows that a straightforward approach to PASA and ZAR works effectively even if dependency information is not available.
no code implementations • 23 May 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
Recurrent Neural Networks (RNNs) are a popular choice for modeling temporal and sequential tasks and achieve state-of-the-art performance on various complex problems.
no code implementations • IJCNLP 2017 • Andros Tjandra, Sakriani Sakti, Satoshi Nakamura
In this paper, we propose a novel attention mechanism that has local and monotonic properties.
Automatic Speech Recognition (ASR) +5
no code implementations • ACL 2017 • Yusuke Oda, Philip Arthur, Graham Neubig, Koichiro Yoshino, Satoshi Nakamura
In this paper, we propose a new method for calculating the output layer in neural machine translation systems.
2 code implementations • EMNLP 2016 • Philip Arthur, Graham Neubig, Satoshi Nakamura
Neural machine translation (NMT) often makes mistakes in translating low-frequency content words that are essential to understanding the meaning of the sentence.
no code implementations • LREC 2016 • Nurul Lubis, Randy Gomez, Sakriani Sakti, Keisuke Nakamura, Koichiro Yoshino, Satoshi Nakamura, Kazuhiro Nakadai
Emotional aspects play a vital role in making human communication a rich and dynamic experience.
no code implementations • LREC 2016 • Matthias Sperber, Graham Neubig, Satoshi Nakamura, Alex Waibel
Our goal is to improve the human transcription quality via appropriate user interface design.
no code implementations • WS 2015 • Graham Neubig, Makoto Morishita, Satoshi Nakamura
We further perform a detailed analysis of reasons for this increase, finding that the main contributions of the neural models lie in improvement of the grammatical correctness of the output, as opposed to improvements in lexical choice of content words.
no code implementations • TACL 2015 • Philip Arthur, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
We propose a new method for semantic parsing of ambiguous and ungrammatical input, such as search queries.
no code implementations • LREC 2014 • Hiroaki Shimizu, Graham Neubig, Sakriani Sakti, Tomoki Toda, Satoshi Nakamura
This makes it possible to compare translation data with simultaneous interpretation data.
no code implementations • LREC 2014 • Sakriani Sakti, Keigo Kubo, Sho Matsumiya, Graham Neubig, Tomoki Toda, Satoshi Nakamura, Fumihiro Adachi, Ryosuke Isotani
This paper outlines the recent development on multilingual medical data and multilingual speech recognition system for network-based speech-to-speech translation in the medical domain.
no code implementations • TACL 2014 • Matthias Sperber, Mirjam Simantzik, Graham Neubig, Satoshi Nakamura, Alex Waibel
In this paper, we study the problem of manually correcting automatic annotations of natural language in as efficient a manner as possible.