Search Results for author: Kezhen Chen

Found 10 papers, 3 papers with code

NICE: Neural Image Commenting with Empathy

no code implementations • Findings (EMNLP) 2021 • Kezhen Chen, Qiuyuan Huang, Daniel McDuff, Xiang Gao, Hamid Palangi, JianFeng Wang, Kenneth Forbus, Jianfeng Gao

Based on these annotations, we define two different tasks for the NICE dataset.

Dragonfly: Multi-Resolution Zoom Supercharges Large Visual-Language Model

1 code implementation • 3 Jun 2024 • Kezhen Chen, Rahul Thapa, Rahul Chalamala, Ben Athiwaratkun, Shuaiwen Leon Song, James Zou

Recent advances in large multimodal models (LMMs) suggest that higher image resolution enhances the fine-grained understanding of image details, crucial for tasks such as visual commonsense reasoning and analyzing biomedical images.

Image Captioning, Language Modelling, +2

IM-RAG: Multi-Round Retrieval-Augmented Generation Through Learning Inner Monologues

no code implementations • 15 May 2024 • Diji Yang, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang

Although the Retrieval-Augmented Generation (RAG) paradigms can use external knowledge to enhance and ground the outputs of Large Language Models (LLMs) to mitigate generative hallucinations and static knowledge base problems, they still suffer from limited flexibility in adopting Information Retrieval (IR) systems with varying capabilities, constrained interpretability during the multi-round retrieval process, and a lack of end-to-end optimization.

Information Retrieval, Question Answering, +2

Higher Layers Need More LoRA Experts

1 code implementation • 13 Feb 2024 • Chongyang Gao, Kezhen Chen, Jinmeng Rao, Baochen Sun, Ruibo Liu, Daiyi Peng, Yawen Zhang, Xiaoyuan Guo, Jie Yang, VS Subrahmanian

In this paper, we introduce a novel parameter-efficient MoE method, MoE-LoRA with Layer-wise Expert Allocation (MoLA), for Transformer-based models, where each model layer has the flexibility to employ a varying number of LoRA experts.

Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models

no code implementations • 7 Sep 2023 • Jiaying Lu, Jinmeng Rao, Kezhen Chen, Xiaoyuan Guo, Yawen Zhang, Baochen Sun, Carl Yang, Jie Yang

Large Vision-Language Models (LVLMs) offer remarkable benefits for a variety of vision-language tasks.

Question Answering, Visual Question Answering

Tackling Vision Language Tasks Through Learning Inner Monologues

no code implementations • 19 Aug 2023 • Diji Yang, Kezhen Chen, Jinmeng Rao, Xiaoyuan Guo, Yawen Zhang, Jie Yang, Yi Zhang

Visual language tasks require AI models to comprehend and reason with both visual and textual content.

LOWA: Localize Objects in the Wild with Attributes

no code implementations • 31 May 2023 • Xiaoyuan Guo, Kezhen Chen, Jinmeng Rao, Yawen Zhang, Baochen Sun, Jie Yang

To train LOWA, we propose a hybrid vision-language training strategy to learn object detection and recognition with class names as well as attribute information.

Attribute, Object, +3

Mapping Natural-language Problems to Formal-language Solutions Using Structured Neural Representations

2 code implementations • ICML 2020 • Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Kenneth D. Forbus, Jianfeng Gao

The encoder of TP-N2F employs TPR 'binding' to encode natural-language symbolic structure in vector space and the decoder uses TPR 'unbinding' to generate, in symbolic space, a sequential program represented by relational tuples, each consisting of a relation (or operation) and a number of arguments.

Decoder Program Synthesis +1
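
As a rough illustration of the TPR 'binding' and 'unbinding' operations mentioned in the abstract above, the sketch below binds filler vectors to role vectors by summing their outer products, then recovers each filler by multiplying the resulting tensor with its role vector. The vectors, dimensions, and function names are hypothetical, and orthonormal roles are assumed so that unbinding is exact. TP-N2F learns its representations; this sketch does not attempt to reproduce that model.

```python
# Minimal Tensor Product Representation (TPR) sketch in pure Python.
# All names and values are illustrative assumptions, not taken from TP-N2F.

def outer(filler, role):
    """Outer product of a filler vector and a role vector (nested list)."""
    return [[f * r for r in role] for f in filler]

def bind(fillers, roles):
    """Binding: sum the outer products of each (filler, role) pair."""
    rows, cols = len(fillers[0]), len(roles[0])
    tensor = [[0.0] * cols for _ in range(rows)]
    for filler, role in zip(fillers, roles):
        product = outer(filler, role)
        for i in range(rows):
            for j in range(cols):
                tensor[i][j] += product[i][j]
    return tensor

def unbind(tensor, role):
    """Unbinding: multiply the tensor by a role vector.

    With orthonormal roles this returns the filler bound to that role.
    """
    return [sum(t * r for t, r in zip(row, role)) for row in tensor]

# Two orthonormal role vectors (assumed) and two arbitrary filler vectors.
roles = [[0.6, 0.8], [-0.8, 0.6]]
fillers = [[1.0, 0.0, 2.0], [0.0, 3.0, -1.0]]

tensor = bind(fillers, roles)
recovered = [unbind(tensor, role) for role in roles]
print(recovered)  # approximately equal to `fillers`, up to floating-point error
```

If the roles were not orthonormal, unbinding would instead need dual (unbinding) vectors derived from the role matrix, which this sketch omits.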

Natural- to formal-language generation using Tensor Product Representations

no code implementations • 25 Sep 2019 • Kezhen Chen, Qiuyuan Huang, Hamid Palangi, Paul Smolensky, Kenneth D. Forbus, Jianfeng Gao

Generating formal language represented by relational tuples, such as Lisp programs or mathematical expressions, from a natural-language input is an extremely challenging task because it requires explicitly capturing discrete symbolic structural information from the input to generate the output.

Decoder, Math, +2

Who are the Devils Wearing Prada in New York City?

no code implementations • 19 Aug 2015 • Kuan-Ting Chen, Kezhen Chen, Peizhong Cong, Winston H. Hsu, Jiebo Luo

To answer this question, we design a novel system that consists of three major components: (1) constructing a large dataset from the New York Fashion Shows and New York street chic in order to understand the likely clothing fashion trends in New York, (2) utilizing a learning-based approach to discover fashion attributes as the representative characteristics of fashion trends, and (3) comparing the analysis results from the New York Fashion Shows and street-chic images to verify whether the fashion shows have actual influence on the people in New York City.
