no code implementations • ECCV 2020 • Sangmin Lee, Jung Uk Kim, Hak Gu Kim, Seongyeop Kim, Yong Man Ro
In this paper, we propose a novel symptom-aware cybersickness assessment network (SACA Net) that quantifies physical symptom levels for assessing cybersickness of individual viewers.
no code implementations • 30 Apr 2024 • Sungjune Park, Hyunjun Kim, Yong Man Ro
Therefore, in this paper, we propose a novel approach to construct a versatile pedestrian knowledge bank containing representative pedestrian knowledge that can be applied to various detection frameworks and adopted in diverse scenes.
no code implementations • 22 Mar 2024 • Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, Yong Man Ro
Specifically, we generate text descriptions of the pedestrian in each of the RGB and thermal modalities and design Multispectral Chain-of-Thought (MSCoT) prompting, which models a step-by-step process to facilitate cross-modal reasoning at the semantic level and perform accurate detection.
1 code implementation • 20 Mar 2024 • Junho Kim, Yeon Ju Kim, Yong Man Ro
This paper presents a way of enhancing the reliability of Large Multimodal Models (LMMs) in addressing hallucination effects, where models generate incorrect or unrelated responses.
1 code implementation • 12 Mar 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.
Ranked #27 on Visual Question Answering on MM-Vet
no code implementations • 7 Mar 2024 • Seunghee Han, Se Jin Park, Chae Won Kim, Yong Man Ro
We devise completeness loss and consistency loss based on semantic similarity scores.
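The paper does not spell out the loss formulas in this excerpt, but losses built on semantic similarity scores are commonly implemented with cosine similarity. A minimal illustrative sketch under that assumption (function names are hypothetical, not from the paper):

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def completeness_loss(summary_emb, source_embs):
    # Encourage the summary embedding to cover every source segment:
    # penalize the least-similar source embedding.
    sims = [cosine_sim(summary_emb, s) for s in source_embs]
    return 1.0 - min(sims)

def consistency_loss(emb_a, emb_b):
    # Encourage two related embeddings to agree semantically.
    return 1.0 - cosine_sim(emb_a, emb_b)
```

Both losses are zero when the embeddings are perfectly aligned and grow toward 2 as they become anti-correlated.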
1 code implementation • 2 Mar 2024 • Taeheon Kim, Sebin Shin, Youngjoon Yu, Hak Gu Kim, Yong Man Ro
As a result, multispectral pedestrian detectors show poor generalization ability on examples beyond this statistical correlation, such as ROTX data.
1 code implementation • 25 Feb 2024 • Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, Xuankai Chang, Shinji Watanabe, Yong Man Ro
We propose a novel Tri-Modal Translation (TMT) model that translates between arbitrary modalities spanning speech, image, and text.
1 code implementation • 23 Feb 2024 • Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro
In visual speech processing, context modeling capability is one of the most important requirements due to the ambiguous nature of lip movements.
Ranked #4 on Lipreading on LRS3-TED (using extra training data)
1 code implementation • 17 Feb 2024 • Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Our findings reveal that the image understanding capabilities of current VLMs are strongly correlated with their zero-shot performance on vision language (VL) tasks.
Ranked #35 on Visual Question Answering on MM-Vet
no code implementations • 18 Jan 2024 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro
By using the visual speech units as the inputs of our system, we pre-train the model to predict corresponding text outputs on massive multilingual data constructed by merging several VSR databases.
1 code implementation • 5 Dec 2023 • Jeongsoo Choi, Se Jin Park, Minsu Kim, Yong Man Ro
To mitigate the problem of the absence of a parallel AV2AV translation dataset, we propose to train our spoken language translation system with the audio-only dataset of A2A.
1 code implementation • 2 Nov 2023 • Sungjune Park, Hyunjun Kim, Yong Man Ro
The obtained knowledge elements are adaptable to various detection frameworks, so that we can provide plentiful appearance information by integrating the language-derived appearance elements with visual cues within a detector.
1 code implementation • 11 Oct 2023 • Junho Kim, Byung-Kwan Lee, Yong Man Ro
Unsupervised semantic segmentation aims to achieve high-quality semantic grouping without human-labeled annotations.
Ranked #1 on Unsupervised Semantic Segmentation on COCO-Stuff-81
no code implementations • 15 Sep 2023 • Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, Yong Man Ro
To this end, we begin by importing the rich knowledge related to image comprehension and language modeling from a large-scale pre-trained vision-language model into Im2Sp.
no code implementations • 15 Sep 2023 • Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro
Different from previous methods that tried to improve the VSR performance for the target language by using knowledge learned from other languages, we explore whether we can increase the amount of training data itself for the different languages without human intervention.
no code implementations • 23 Aug 2023 • Se Jin Park, Joanna Hong, Minsu Kim, Yong Man Ro
We contribute a new large-scale 3D facial mesh dataset, 3D-HDTF, to enable the synthesis of variations in identity, pose, and facial motion of 3D face meshes.
no code implementations • ICCV 2023 • Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Yong Man Ro
To mitigate this challenge, we learn general speech knowledge, the ability to model lip movements, from a high-resource language through the prediction of speech units.
no code implementations • 15 Aug 2023 • Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro
Visual Speech Recognition (VSR) is the task of predicting spoken words from silent lip movements.
1 code implementation • ICCV 2023 • Jeongsoo Choi, Joanna Hong, Yong Man Ro
In doing so, rich speaker embedding information can be produced solely from the input visual information, and no extra audio information is necessary at inference time.
1 code implementation • 3 Aug 2023 • Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST).
1 code implementation • ICCV 2023 • Byung-Kwan Lee, Junho Kim, Yong Man Ro
Adversarial examples, derived from deliberately crafted perturbations of visual inputs, can easily harm the decision process of deep neural networks.
no code implementations • 28 Jun 2023 • Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro
The visual speaker embedding is derived from a single target face image and enables improved mapping of input text to the learned audio latent space by incorporating the speaker characteristics inherent in the audio.
no code implementations • 27 Jun 2023 • Hong Joo Lee, Yong Man Ro
With the class-wise robust features, the model explicitly learns adversarially robust features through the proposed robust proxy learning framework.
no code implementations • 27 Jun 2023 • Hong Joo Lee, Youngjoon Yu, Yong Man Ro
Different from the previous approaches, in this paper, we propose a new approach to improve the adversarial robustness by using an external signal rather than model parameters.
1 code implementation • 31 May 2023 • Jeongsoo Choi, Minsu Kim, Yong Man Ro
Therefore, the proposed L2S model is trained to generate multiple targets: mel-spectrogram and speech units.
no code implementations • 31 May 2023 • Se Jin Park, Minsu Kim, Jeongsoo Choi, Yong Man Ro
The contextualized lip motion unit then guides the latter in synthesizing a target identity with context-aware lip motion.
no code implementations • 8 May 2023 • Jeong Hun Yeo, Minsu Kim, Yong Man Ro
Visual Speech Recognition (VSR) is the task of predicting a sentence or word from lip movements.
Automatic Speech Recognition (ASR)
1 code implementation • CVPR 2023 • Joanna Hong, Minsu Kim, Jeongsoo Choi, Yong Man Ro
Thus, we first show that previous AVSR models are in fact not robust to corruption of the multimodal input streams, the audio and the visual inputs, compared to uni-modal models.
1 code implementation • CVPR 2023 • Junho Kim, Byung-Kwan Lee, Yong Man Ro
The origin of adversarial examples is still unexplained, and it has prompted arguments from various viewpoints despite comprehensive investigations.
no code implementations • 27 Feb 2023 • Minsu Kim, Chae Won Kim, Yong Man Ro
The proposed DVFA can align the input transcription (i.e., sentence) with the talking face video without accessing the speech audio.
Automatic Speech Recognition (ASR)
3 code implementations • 17 Feb 2023 • Minsu Kim, Joanna Hong, Yong Man Ro
To this end, we design multi-task learning that guides the model using multimodal supervision, i.e., text and audio, to complement the insufficient word representations of the acoustic feature reconstruction loss.
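Multi-task objectives like this are typically a weighted sum of per-task losses. A minimal sketch, assuming L2 reconstruction terms for the acoustic and audio targets and cross-entropy for the text target (weights and names are hypothetical, not taken from the paper):

```python
import numpy as np

def multitask_loss(recon_pred, recon_target, text_logits, text_label,
                   audio_pred, audio_target, w_text=1.0, w_audio=1.0):
    # Acoustic-feature reconstruction loss (L2).
    l_recon = float(np.mean((recon_pred - recon_target) ** 2))
    # Text supervision: cross-entropy over a softmax of the logits.
    probs = np.exp(text_logits - text_logits.max())
    probs /= probs.sum()
    l_text = float(-np.log(probs[text_label]))
    # Audio supervision: L2 on an auxiliary audio prediction.
    l_audio = float(np.mean((audio_pred - audio_target) ** 2))
    # Weighted sum of the three task losses.
    return l_recon + w_text * l_text + w_audio * l_audio
```

The weights `w_text` and `w_audio` trade off how strongly the auxiliary text and audio supervision shapes the shared representation.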
no code implementations • 16 Feb 2023 • Minsu Kim, Hyung-Il Kim, Yong Man Ro
Since it focuses on visual information to model speech, its performance is inherently sensitive to personal lip appearance and movements, which makes VSR models show degraded performance when applied to unseen speakers.
no code implementations • 2 Nov 2022 • Se Jin Park, Minsu Kim, Joanna Hong, Jeongsoo Choi, Yong Man Ro
It stores lip motion features from sequential ground truth images in the value memory and aligns them with corresponding audio features so that they can be retrieved using audio input at inference time.
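Audio-addressed retrieval of stored visual features is usually implemented as soft attention over a key-value memory. A minimal sketch of that retrieval step, assuming dot-product addressing (the function name and temperature parameter are hypothetical):

```python
import numpy as np

def retrieve_from_memory(audio_query, key_memory, value_memory, temperature=1.0):
    # Address the memory with the audio query: softmax over key
    # similarities, then return the weighted sum of stored values.
    scores = key_memory @ audio_query / temperature
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    weights /= weights.sum()
    return weights @ value_memory
```

At inference time, only the audio query is needed; the lip-motion values written during training come back as the attention-weighted readout.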
no code implementations • 21 Oct 2022 • Minsu Kim, Youngjoon Yu, Sungjune Park, Yong Man Ro
The proposed meta input can be optimized with a small number of testing data only by considering the relation between testing input data and its output prediction.
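One way to read this is as learning a small additive transformation of the input while the model itself stays frozen. A toy sketch under that assumption, using finite-difference gradients so it runs on any black-box model (all names are hypothetical, not the paper's implementation):

```python
import numpy as np

def optimize_meta_input(model, xs, ys, steps=200, lr=0.1, eps=1e-4):
    # Learn a single additive "meta input" delta that lowers the
    # model's squared error on a handful of test samples; the model's
    # own parameters are never touched.
    delta = np.zeros_like(xs[0])

    def loss(d):
        return float(np.mean([(model(x + d) - y) ** 2 for x, y in zip(xs, ys)]))

    for _ in range(steps):
        grad = np.zeros_like(delta)
        for i in range(delta.size):
            e = np.zeros_like(delta)
            e[i] = eps
            # Central finite-difference estimate of the gradient.
            grad[i] = (loss(delta + e) - loss(delta - e)) / (2 * eps)
        delta -= lr * grad
    return delta
```

Because only the input offset is optimized, a few testing samples suffice and there is no risk of overwriting the pre-trained weights.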
no code implementations • 15 Sep 2022 • Hyung-Il Kim, Kimin Yun, Yong Man Ro
This is mainly attributed to the mismatch between training and testing sets.
1 code implementation • 9 Aug 2022 • Minsu Kim, Hyunjun Kim, Yong Man Ro
In this paper, to remedy the performance degradation of lip reading model on unseen speakers, we propose a speaker-adaptive lip reading method, namely user-dependent padding.
1 code implementation • 13 Jul 2022 • Joanna Hong, Minsu Kim, Daehun Yoo, Yong Man Ro
The enhanced audio features are fused with the visual features and taken to an encoder-decoder model composed of Conformer and Transformer for speech recognition.
no code implementations • 15 Jun 2022 • Joanna Hong, Minsu Kim, Yong Man Ro
Thus, the proposed framework brings the advantage of synthesizing the speech containing the right content even with the silent talking face video of an unseen subject.
no code implementations • 27 Apr 2022 • Youngjoon Yu, Hong Joo Lee, Hakmin Lee, Yong Man Ro
Person detection has attracted great attention in the computer vision area and is an imperative element in human-centric computer vision.
1 code implementation • NeurIPS 2021 • Junho Kim, Byung-Kwan Lee, Yong Man Ro
Adversarial examples, generated by carefully crafted perturbation, have attracted considerable attention in research fields.
1 code implementation • CVPR 2022 • Byung-Kwan Lee, Junho Kim, Yong Man Ro
Adversarial examples provoke weak reliability and potential security issues in deep neural networks.
1 code implementation • ICCV 2021 • Minsu Kim, Joanna Hong, Se Jin Park, Yong Man Ro
By learning the interrelationship through the associative bridge, the proposed bridging framework is able to obtain the target modal representations inside the memory network, even with the source modal input only, and it provides rich information for its downstream tasks.
Ranked #3 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • The AAAI Conference on Artificial Intelligence (AAAI) 2022 • Minsu Kim, Jeong Hun Yeo, Yong Man Ro
With the multi-head key memories, MVM extracts possible candidate audio features from the memory, which allows the lip reading model to consider which pronunciations the input lip movement can represent.
Ranked #2 on Lipreading on CAS-VSR-W1k (LRW-1000)
1 code implementation • NeurIPS 2021 • Minsu Kim, Joanna Hong, Yong Man Ro
In this paper, we propose a novel lip-to-speech generative adversarial network, Visual Context Attentional GAN (VCA-GAN), which can jointly model local and global lip movements during speech synthesis.
no code implementations • CVPR 2022 • Sangmin Lee, Hyung-Il Kim, Yong Man Ro
Existing sound and image representation learning methods necessarily require a large number of corresponding sound-image pairs.
1 code implementation • IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021 • Joanna Hong, Minsu Kim, Se Jin Park, Yong Man Ro
Our key contributions are: (1) proposing the Visual Voice memory that brings rich information of audio that complements the visual features, thus producing high-quality speech from silent video, and (2) enabling multi-speaker and unseen speaker training by memorizing auditory features and the corresponding visual features.
no code implementations • 14 Apr 2021 • Hak Gu Kim, Minho Park, Sangmin Lee, Seongyeop Kim, Yong Man Ro
For a human expert, the depth adjustment procedure is a sequence of iterative decision making.
no code implementations • 14 Apr 2021 • Hak Gu Kim, Sangmin Lee, Seongyeop Kim, Heoun-taek Lim, Yong Man Ro
To better understand VR sickness, it is necessary to predict and provide the levels of its major symptoms rather than an overall degree of VR sickness.
1 code implementation • CVPR 2021 • Sangmin Lee, Hak Gu Kim, Dae Hwi Choi, Hyung-Il Kim, Yong Man Ro
Our work addresses long-term motion context issues for predicting future frames.
Ranked #1 on Video Prediction on KTH (Cond metric)
no code implementations • ICCV 2021 • Jung Uk Kim, Sungjune Park, Yong Man Ro
The purpose of the proposed large-scale embedding learning is to memorize and recall the large-scale pedestrian appearance via the LPR Memory.
1 code implementation • 1 Jan 2021 • Byung-Kwan Lee, Youngjoon Yu, Yong Man Ro
Recent works have applied Bayesian Neural Networks (BNNs) to adversarial training and shown improved adversarial robustness via the BNN's stochastic gradient defense.
no code implementations • 16 Jul 2020 • Joanna Hong, Jung Uk Kim, Sangmin Lee, Yong Man Ro
Recent advances in facial expression synthesis have shown promising results using diverse expression representations including facial action units.
no code implementations • 22 May 2020 • Youngjoon Yu, Hong Joo Lee, Byeong Cheon Kim, Jung Uk Kim, Yong Man Ro
The success of multimodal data fusion in deep learning appears to be attributed to the use of complementary information between multiple input data.
no code implementations • 21 May 2020 • Hakmin Lee, Hong Joo Lee, Seong Tae Kim, Yong Man Ro
After the ensemble models are trained, the random layer sampling method can efficiently hide the gradient and avoid gradient-based attacks.
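The gradient-hiding idea is that each forward pass traverses a randomly sampled subset of layers, so an attacker never sees a fixed computation graph. A toy sketch of that sampling step for a residual stack (names and the residual form are assumptions for illustration):

```python
import numpy as np

def forward_with_random_layer_sampling(x, layers, keep_prob=0.5, rng=None):
    # At each forward pass, randomly skip residual layers, so the
    # effective sub-network (and hence its gradient) changes per call.
    rng = np.random.default_rng() if rng is None else rng
    for layer in layers:
        if rng.random() < keep_prob:
            x = x + layer(x)  # residual layer kept this pass
    return x
```

With `keep_prob=1.0` the full network runs; lower values yield a different random sub-network on every call, which is what frustrates gradient-based attacks.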
no code implementations • 21 May 2020 • Byeong Cheon Kim, Jung Uk Kim, Hakmin Lee, Yong Man Ro
Through the comprehensive experimental results and analysis, this paper presents the inherent property of adversarial robustness in the autoencoders.
no code implementations • 21 May 2020 • Hong Joo Lee, Seong Tae Kim, Hakmin Lee, Nassir Navab, Yong Man Ro
Experimental results show that the proposed method can provide useful uncertainty information through Bayesian approximation with efficient ensemble model generation and improve predictive performance.
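Ensemble-based Bayesian approximation typically reports the ensemble mean as the prediction and the across-member variance as an uncertainty proxy. A minimal sketch of that readout (the helper name is hypothetical):

```python
import numpy as np

def ensemble_predict(models, x):
    # Approximate the predictive distribution with the ensemble:
    # mean prediction plus per-output variance as uncertainty.
    preds = np.array([m(x) for m in models])
    return preds.mean(axis=0), preds.var(axis=0)
```

When the ensemble members agree, the variance is near zero; disagreement signals inputs on which the prediction should be trusted less.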
no code implementations • 2 Jul 2019 • Minho Park, Hak Gu Kim, Yong Man Ro
Generating realistic-looking images with large variations (e.g., large spatial deformations and large pose changes), however, is very challenging.
no code implementations • 10 Jun 2019 • Hyebin Lee, Seong Tae Kim, Yong Man Ro
The ambiguity of the decision-making process has been pointed out as the main obstacle to applying deep learning-based methods in practice, in spite of their outstanding performance.
no code implementations • 16 Nov 2018 • Wissam J. Baddar, Yong Man Ro
The effectiveness of the proposed mode variational LSTM is verified using the facial expression recognition task.
Facial Expression Recognition (FER)
no code implementations • 17 Sep 2018 • Jae-Hyeok Lee, Seong Tae Kim, Hakmin Lee, Yong Man Ro
To train a deep network model that behaves well in bio-image computing fields, a large amount of labeled data is required.
no code implementations • 23 May 2018 • Seong Tae Kim, Hakmin Lee, Hak Gu Kim, Yong Man Ro
In this paper, we investigate interpretability in CADx with the proposed interpretable CADx (ICADx) framework.
no code implementations • 23 Apr 2018 • Sangmin Lee, Hak Gu Kim, Yong Man Ro
In this paper, we propose a novel abnormal event detection method with spatio-temporal adversarial networks (STAN).
Ranked #17 on Anomaly Detection on ShanghaiTech
no code implementations • 11 Apr 2018 • Heoun-taek Lim, Hak Gu Kim, Yong Man Ro
The proposed human perception guider criticizes the predicted quality score of the predictor with the human perceptual score using adversarial learning.
no code implementations • 11 Apr 2018 • Hak Gu Kim, Wissam J. Baddar, Heoun-taek Lim, Hyunwook Jeong, Yong Man Ro
This paper proposes a new objective metric of exceptional motion in VR video contents for VR sickness assessment.
no code implementations • 10 Dec 2017 • Wissam J. Baddar, Geonmo Gu, Sangmin Lee, Yong Man Ro
The spatial constructs of a generated video sequence are acquired from the target image.
no code implementations • ECCV 2018 • Seong Tae Kim, Yong Man Ro
In this paper, a novel deep learning approach, named facial dynamics interpreter network, has been proposed to interpret the important relations between local dynamics for estimating facial traits from expression sequence.
no code implementations • 29 Nov 2017 • Wissam J. Baddar, Yong Man Ro
At test time, most spatio-temporal encoding methods assume that a temporally segmented sequence is fed to a learned model, which could require the prediction to wait until the full sequence is available to an auxiliary task that performs the temporal segmentation.
no code implementations • 28 Nov 2017 • Geonmo Gu, Seong Tae Kim, Kihyun Kim, Wissam J. Baddar, Yong Man Ro
Generating additional training samples through a generative model is helpful in addressing the lack of training data.
no code implementations • 11 Aug 2017 • Jung Uk Kim, Hak Gu Kim, Yong Man Ro
In this paper, we propose a novel medical image segmentation method using an iterative deep learning framework.
no code implementations • 10 Aug 2017 • Hak Gu Kim, Yeoreum Choi, Yong Man Ro
This paper presents a new transfer learning-based approach to medical image classification that mitigates the insufficient labeled data problem in the medical domain.
no code implementations • 31 Jul 2017 • Tae Kwan Lee, Wissam J. Baddar, Seong Tae Kim, Yong Man Ro
Our classification results on Multi-PIE dataset for facial expression recognition and CIFAR-10 dataset for object classification reveal that the compact CNN with the proposed logarithmic filter grouping scheme outperforms the same network with the uniform filter grouping in terms of accuracy and parameter efficiency.
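A logarithmic filter grouping can be sketched as group sizes that shrink by powers of two rather than being uniform; the exact schedule in the paper may differ, so this is only an illustrative sketch (names hypothetical):

```python
import numpy as np

def logarithmic_group_sizes(total_filters, num_groups):
    # Split filters into groups whose sizes shrink roughly by powers
    # of two (a logarithmic schedule), largest group first.
    weights = np.array([2.0 ** -g for g in range(num_groups)])
    sizes = np.floor(total_filters * weights / weights.sum()).astype(int)
    sizes[0] += total_filters - sizes.sum()  # absorb rounding remainder
    return sizes.tolist()
```

For example, 64 filters in 3 groups split as roughly 4:2:1, unlike a uniform grouping's equal thirds, while still using every filter.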
no code implementations • 31 May 2017 • Seong Tae Kim, Yong Man Ro
To improve the effectiveness of learning with instructional videos, observation and evaluation of the activity are required.