Search Results for author: Bang Yang

Found 16 papers, 7 papers with code

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework

no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou

With the emergence of large language models (LLMs) and vision foundation models, how to combine the intelligence and capacity of these open-sourced or API-available models to achieve open-world visual perception remains an open question.

Language Modelling, Large Language Model, +2

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou

VisionGPT-3D seamlessly integrates various SOTA vision models, automates the selection among them, identifies suitable 3D mesh creation algorithms based on the analysis of 2D depth maps, and generates optimal results from diverse multimodal inputs such as text prompts.

WorldGPT: A Sora-Inspired Video AI Agent as Rich World Models from Text and Image Inputs

no code implementations • 10 Mar 2024 • Deshun Yang, Luhui Hu, Yu Tian, Zihao Li, Chris Kelly, Bang Yang, Cindy Yang, Yuexian Zou

Several text-to-video diffusion models have demonstrated commendable capabilities in synthesizing high-quality video content.

Video Generation

Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

1 code implementation • 30 Jan 2024 • Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou

To alleviate the catastrophic forgetting (CF) caused by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.

Text Retrieval

Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models

no code implementations • 7 Dec 2023 • Shibin Wu, Bang Yang, Zhiyu Ye, Haoqian Wang, Hairong Zheng, Tong Zhang

Medical report generation demands automatic creation of coherent and precise descriptions for medical images.

Domain Adaptation, Medical Report Generation

UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework

1 code implementation • 16 Nov 2023 • Chris Kelly, Luhui Hu, Cindy Yang, Yu Tian, Deshun Yang, Bang Yang, Zaoshan Huang, Zihao Li, Yuexian Zou

In the current landscape of artificial intelligence, foundation models serve as the bedrock for advancements in both language and vision domains.

MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning

1 code implementation • 25 Aug 2023 • Bang Yang, Fenglin Liu, Xian Wu, YaoWei Wang, Xu Sun, Yuexian Zou

To deal with the label shortage problem, we present a simple yet effective zero-shot approach MultiCapCLIP that can generate visual captions for different scenarios and languages without any labeled vision-caption pairs of downstream datasets.

Image Captioning, Video Captioning

Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels

no code implementations • 5 Jul 2023 • Bang Yang, Fenglin Liu, Zheng Li, Qingyu Yin, Chenyu You, Bing Yin, Yuexian Zou

We observe that the core challenges of novel product title generation are the understanding of novel product characteristics and the generation of titles in a novel writing style.

Image Captioning, Text Generation

Customizing General-Purpose Foundation Models for Medical Report Generation

no code implementations • 9 Jun 2023 • Bang Yang, Asif Raza, Yuexian Zou, Tong Zhang

In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing with a specific focus on medical report generation.

Medical Report Generation, Transfer Learning

Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation

no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou

Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.

Sentence

ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation

1 code implementation • 11 Mar 2023 • Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, YaoWei Wang, David A. Clifton

We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.

Image Captioning, Machine Translation, +5

Generating Accurate and Faithful Discharge Instructions: Task, Dataset, and Model

2 code implementations • 23 Oct 2022 • Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Zhangdaihong Liu, Xu Sun, Yang Yang, David A. Clifton

We build a benchmark clinical dataset and propose the Re3Writer, which imitates the working patterns of physicians: it first retrieves related working experience from historical patient instructions (PIs) written by physicians, then reasons over related medical knowledge.

Graph-in-Graph Network for Automatic Gene Ontology Description Generation

no code implementations • 10 Jun 2022 • Fenglin Liu, Bang Yang, Chenyu You, Xian Wu, Shen Ge, Adelaide Woicik, Sheng Wang

This task aims to automatically generate a sentence that describes the function of a GO term belonging to one of the three categories, i.e., molecular function, biological process, and cellular component.

Sentence

CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter

1 code implementation • 30 Nov 2021 • Bang Yang, Tong Zhang, Yuexian Zou

Dual Concept Detection (DCD) is an auxiliary task that requires a caption model to learn both the correspondence between video content and concepts and the co-occurrence relations between concepts.

Ranked #16 on Video Captioning on MSR-VTT

Caption Generation, Representation Learning, +1

O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning

no code implementations • Findings (ACL) 2021 • Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Yuexian Zou, Xu Sun

Video captioning combines video understanding and language generation.

Attribute, Caption Generation, +4

Non-Autoregressive Coarse-to-Fine Video Captioning

1 code implementation • 27 Nov 2019 • Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang

However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and tend to generate generic descriptions due to the insufficient training of visual words (e.g., nouns and verbs) and an inadequate decoding paradigm.

Sentence, Video Captioning
