no code implementations • 19 Apr 2024 • Tao Chu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Qiong Liu, Jiaqi Wang
Existing approaches extract point clouds either from ground truth (GT) geometry or 3D scenes reconstructed by auxiliary models.
2 code implementations • 9 Apr 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Songyang Zhang, Haodong Duan, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Zhe Chen, Xinyue Zhang, Wei Li, Jingwen Li, Wenhai Wang, Kai Chen, Conghui He, Xingcheng Zhang, Jifeng Dai, Yu Qiao, Dahua Lin, Jiaqi Wang
The Large Vision-Language Model (LVLM) field has seen significant advancements, yet its progression has been hindered by challenges in comprehending fine-grained visual content due to limited resolution.
Ranked #12 on Visual Question Answering on MM-Vet
1 code implementation • 29 Mar 2024 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Yuhang Zang, Zehui Chen, Haodong Duan, Jiaqi Wang, Yu Qiao, Dahua Lin, Feng Zhao
We evaluate 16 leading LVLMs on MMStar to assess their multi-modal capabilities, and on 7 benchmarks with the proposed metrics to investigate their data leakage and actual multi-modal gain.
1 code implementation • 26 Mar 2024 • Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, Xiaoyi Dong, Haodong Duan, Qi Fan, Zhaoye Fei, Yang Gao, Jiaye Ge, Chenya Gu, Yuzhe Gu, Tao Gui, Aijia Guo, Qipeng Guo, Conghui He, Yingfan Hu, Ting Huang, Tao Jiang, Penglong Jiao, Zhenjiang Jin, Zhikai Lei, Jiaxing Li, Jingwen Li, Linyang Li, Shuaibin Li, Wei Li, Yining Li, Hongwei Liu, Jiangning Liu, Jiawei Hong, Kaiwen Liu, Kuikun Liu, Xiaoran Liu, Chengqi Lv, Haijun Lv, Kai Lv, Li Ma, Runyuan Ma, Zerun Ma, Wenchang Ning, Linke Ouyang, Jiantao Qiu, Yuan Qu, FuKai Shang, Yunfan Shao, Demin Song, Zifan Song, Zhihao Sui, Peng Sun, Yu Sun, Huanze Tang, Bin Wang, Guoteng Wang, Jiaqi Wang, Jiayu Wang, Rui Wang, Yudong Wang, Ziyi Wang, Xingjian Wei, Qizhen Weng, Fan Wu, Yingtong Xiong, Chao Xu, Ruiliang Xu, Hang Yan, Yirong Yan, Xiaogui Yang, Haochen Ye, Huaiyuan Ying, JIA YU, Jing Yu, Yuhang Zang, Chuyu Zhang, Li Zhang, Pan Zhang, Peng Zhang, Ruijie Zhang, Shuo Zhang, Songyang Zhang, Wenjian Zhang, Wenwei Zhang, Xingcheng Zhang, Xinyue Zhang, Hui Zhao, Qian Zhao, Xiaomeng Zhao, Fengzhe Zhou, Zaida Zhou, Jingming Zhuo, Yicheng Zou, Xipeng Qiu, Yu Qiao, Dahua Lin
The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI).
Ranked #5 on Long-Context Understanding on Ada-LEval (BestAnswer)
1 code implementation • 22 Mar 2024 • Beichen Zhang, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Jiaqi Wang
Contrastive Language-Image Pre-training (CLIP) has been the cornerstone for zero-shot classification, text-image retrieval, and text-image generation by aligning image and text modalities.
1 code implementation • 20 Mar 2024 • Ziyu Liu, Zeyi Sun, Yuhang Zang, Wei Li, Pan Zhang, Xiaoyi Dong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Notably, our approach demonstrates a significant improvement in performance on 5 fine-grained visual recognition benchmarks, 11 few-shot image recognition datasets, and 2 object detection datasets under the zero-shot recognition setting.
1 code implementation • 27 Feb 2024 • Shuangrui Ding, Zihan Liu, Xiaoyi Dong, Pan Zhang, Rui Qian, Conghui He, Dahua Lin, Jiaqi Wang
We present SongComposer, an innovative LLM designed for song composition.
1 code implementation • 22 Feb 2024 • Yuhang Cao, Pan Zhang, Xiaoyi Dong, Dahua Lin, Jiaqi Wang
We present DualFocus, a novel framework for integrating macro and micro perspectives within multi-modal large language models (MLLMs) to enhance vision-language task performance.
1 code implementation • 29 Jan 2024 • Xiaoyi Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, Linke Ouyang, Xilin Wei, Songyang Zhang, Haodong Duan, Maosong Cao, Wenwei Zhang, Yining Li, Hang Yan, Yang Gao, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-form text-image composition and comprehension.
Ranked #17 on Visual Question Answering on MM-Vet
no code implementations • 22 Jan 2024 • Fengyang Xiao, Pan Zhang, Chunming He, Runze Hu, Yutao Liu
Concealed object segmentation (COS) is a challenging task that involves localizing and segmenting those concealed objects that are visually blended with their surrounding environments.
no code implementations • 7 Dec 2023 • Tong Wu, Zhibing Li, Shuai Yang, Pan Zhang, Xinggang Pan, Jiaqi Wang, Dahua Lin, Ziwei Liu
Extensive experiments demonstrate the effectiveness of HyperDreamer in modeling region-aware materials with high-resolution textures and enabling user-friendly editing.
1 code implementation • 6 Dec 2023 • Zeyi Sun, Ye Fang, Tong Wu, Pan Zhang, Yuhang Zang, Shu Kong, Yuanjun Xiong, Dahua Lin, Jiaqi Wang
Alpha-CLIP not only preserves the visual recognition ability of CLIP but also enables precise control over the emphasis of image contents.
1 code implementation • 29 Nov 2023 • Qidong Huang, Xiaoyi Dong, Pan Zhang, Bin Wang, Conghui He, Jiaqi Wang, Dahua Lin, Weiming Zhang, Nenghai Yu
Based on the observation, OPERA introduces a penalty term on the model logits during beam-search decoding to mitigate the over-trust issue, along with a rollback strategy that checks for summary tokens among the previously generated tokens and re-allocates the token selection if necessary.
1 code implementation • 21 Nov 2023 • Lin Chen, Jinsong Li, Xiaoyi Dong, Pan Zhang, Conghui He, Jiaqi Wang, Feng Zhao, Dahua Lin
In the realm of large multi-modal models (LMMs), efficient modality alignment is crucial yet often constrained by the scarcity of high-quality image-text data.
Ranked #2 on visual instruction following on LLaVA-Bench
1 code implementation • 26 Sep 2023 • Pan Zhang, Xiaoyi Dong, Bin Wang, Yuhang Cao, Chao Xu, Linke Ouyang, Zhiyuan Zhao, Haodong Duan, Songyang Zhang, Shuangrui Ding, Wenwei Zhang, Hang Yan, Xinyue Zhang, Wei Li, Jingwen Li, Kai Chen, Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
We propose InternLM-XComposer, a vision-language large model that enables advanced image-text comprehension and composition.
Ranked #9 on Visual Question Answering (VQA) on InfiMM-Eval
1 code implementation • 25 Aug 2023 • Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He
Despite the great advances of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard for current MLLMs to further improve their capabilities under the guidance of evaluation results at a relatively low human cost.
2 code implementations • 24 Aug 2023 • Bin Wang, Fan Wu, Xiao Han, Jiahui Peng, Huaping Zhong, Pan Zhang, Xiaoyi Dong, Weijia Li, Wei Li, Jiaqi Wang, Conghui He
A practical solution to this problem would be to utilize the available multimodal large language models (MLLMs) to generate instruction data for vision-language tasks.
no code implementations • 10 Aug 2023 • Guozhang Liu, Baochai Peng, Ting Liu, Pan Zhang, Mengke Yuan, Chaoran Lu, Ningning Cao, Sen Zhang, Simin Huang, Tao Wang
The diversity of building architecture styles of global cities situated on various landforms, the degraded optical imagery affected by clouds and shadows, and the significant inter-class imbalance of roof types pose challenges for designing a robust and accurate building roof instance segmentor.
no code implementations • 10 Aug 2023 • Chaoran Lu, Ningning Cao, Pan Zhang, Ting Liu, Baochai Peng, Guozhang Liu, Mengke Yuan, Sen Zhang, Simin Huang, Tao Wang
Unifying the correlated tasks of single-view satellite image building extraction and height estimation offers a promising way to share representations and obtain a generalist model for large-scale urban 3D reconstruction.
1 code implementation • 18 Jul 2023 • Hanyan Cao, Feng Pan, Yijia Wang, Pan Zhang
Our framework is general and can be applied to any error model and quantum codes with different topologies such as surface codes and quantum LDPC codes.
1 code implementation • 10 Jul 2023 • Pengyang Ling, Lin Chen, Pan Zhang, Huaian Chen, Yi Jin, Jinjin Zheng
To serve the intricate and varied demands of image editing, precise and flexible manipulation of image content is indispensable.
1 code implementation • CVPR 2023 • Tao Chu, Pan Zhang, Qiong Liu, Jiaqi Wang
The 3D voxels are then refined and grouped into 3D instances according to the predicted 2D instance centers.
no code implementations • ICCV 2023 • Jiaqi Wang, Pan Zhang, Tao Chu, Yuhang Cao, Yujie Zhou, Tong Wu, Bin Wang, Conghui He, Dahua Lin
2) Hierarchical Category Organization: The vast vocabulary of V3Det is organized by a hierarchical category tree which annotates the inclusion relationship among categories, encouraging the exploration of category relationships in vast and open vocabulary object detection.
1 code implementation • CVPR 2023 • BoWen Zhang, Chenyang Qi, Pan Zhang, Bo Zhang, HsiangTao Wu, Dong Chen, Qifeng Chen, Yong Wang, Fang Wen
In this work, we propose an ID-preserving talking head generation framework, which advances previous methods in two aspects.
1 code implementation • 29 Sep 2022 • Ying Tang, Jiayu Weng, Pan Zhang
The stochastic reaction network in which chemical species evolve through a set of reactions is widely used to model stochastic processes in physics, chemistry and biology.
1 code implementation • 25 Apr 2022 • Hao Ouyang, Bo Zhang, Pan Zhang, Hao Yang, Jiaolong Yang, Dong Chen, Qifeng Chen, Fang Wen
We propose pose-guided multiplane image (MPI) synthesis which can render an animatable character in real scenes with photorealistic quality.
no code implementations • 29 Mar 2022 • Pan Zhang, Jianmin Bao, Ting Zhang, Dong Chen, Fang Wen
Thanks to the low dimensional feature space, it is easier to find the desired mapping function, resulting in improved quality of translation results as well as the stability of the translation model.
1 code implementation • 24 Jun 2021 • Jing Liu, Sujie Li, Jiang Zhang, Pan Zhang
Despite the great potential, however, existing tensor network models for unsupervised machine learning only work as a proof of principle, as their performance is much worse than the standard models such as restricted Boltzmann machines and neural networks.
no code implementations • 1 Jun 2021 • Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Fang Wen
The proposed robust mutual learning demonstrates state-of-the-art performance on semantic segmentation in the low-data regime.
no code implementations • 10 May 2021 • Sujie Li, Feng Pan, Pengfei Zhou, Pan Zhang
Using numerical experiments, we demonstrate that the proposed algorithm is much more accurate than the state-of-the-art machine learning methods in estimating the partition function of restricted Boltzmann machines and deep Boltzmann machines, and has potential applications in training deep Boltzmann machines for general machine learning tasks.
2 code implementations • CVPR 2021 • Pan Zhang, Bo Zhang, Ting Zhang, Dong Chen, Yong Wang, Fang Wen
In this paper, we rely on representative prototypes, the feature centroids of classes, to address the two issues for unsupervised domain adaptation.
Ranked #10 on Semantic Segmentation on GTAV-to-Cityscapes Labels
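As a rough illustration of what "prototypes as feature centroids of classes" means, the sketch below averages the feature vectors of each class and assigns new features to the nearest centroid. It is not the paper's method: the function names `class_prototypes` and `nearest_prototype`, the plain-list feature format, and the use of squared Euclidean distance are all assumptions made for this example.

```python
def class_prototypes(features, labels, num_classes):
    """Return one prototype per class: the mean of that class's feature vectors.

    `features` is a list of equal-length lists of floats and `labels` is a
    list of integer class ids (hypothetical input format for this sketch).
    Classes with no samples keep an all-zero prototype.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, label in zip(features, labels):
        counts[label] += 1
        for k, value in enumerate(vec):
            sums[label][k] += value
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else sums[c]
        for c in range(num_classes)
    ]


def nearest_prototype(vec, prototypes):
    """Return the index of the prototype closest to `vec` (squared Euclidean)."""
    distances = [
        sum((a - b) ** 2 for a, b in zip(vec, proto)) for proto in prototypes
    ]
    return distances.index(min(distances))


if __name__ == "__main__":
    feats = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]]
    labs = [0, 0, 1, 1]
    protos = class_prototypes(feats, labs, num_classes=2)
    print(protos)
    print(nearest_prototype([11.0, 9.0], protos))
```

In a domain adaptation setting the features would typically come from a network and the labels on the target domain would be pseudo labels; both are outside the scope of this sketch.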
1 code implementation • CVPR 2021 • Xingran Zhou, Bo Zhang, Ting Zhang, Pan Zhang, Jianmin Bao, Dong Chen, Zhongfei Zhang, Fang Wen
We present the full-resolution correspondence learning for cross-domain images, which aids image translation.
8 code implementations • 14 Sep 2020 • Zi-Yu Wan, Bo Zhang, Dong-Dong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
Unlike conventional restoration tasks that can be solved through supervised learning, the degradation in real photos is complex and the domain gap between synthetic images and real old photos makes the network fail to generalize.
no code implementations • 12 Sep 2020 • Song Cheng, Lei Wang, Pan Zhang
Tensor networks, models that originated in quantum physics, have been gradually generalized into efficient models for machine learning in recent years.
no code implementations • 18 Aug 2020 • Pan Zhang, Wilfredo Torres Calderon, Bokyung Lee, Alex Tessier, Jacky Bibliowicz, Liviu Calin, Michael Lee
Instead of doing 3D scene reconstruction or transfer learning from deep networks, a mapping from the surface in the two camera views to the surface space is the only requirement.
1 code implementation • 16 Aug 2020 • Jin-Guo Liu, Lei Wang, Pan Zhang
We present a unified exact tensor network approach to compute the ground state energy, identify the optimal configuration, and count the number of solutions for spin glasses.
Statistical Mechanics • Quantum Physics • Computation
7 code implementations • CVPR 2020 • Zi-Yu Wan, Bo Zhang, Dong-Dong Chen, Pan Zhang, Dong Chen, Jing Liao, Fang Wen
Unlike conventional restoration tasks that can be solved through supervised learning, the degradation in real photos is complex and the domain gap between synthetic images and real old photos makes the network fail to generalize.
3 code implementations • CVPR 2020 • Pan Zhang, Bo Zhang, Dong Chen, Lu Yuan, Fang Wen
The output has a style (e.g., color, texture) consistent with the semantically corresponding objects in the exemplar.
Ranked #1 on Image-to-Image Translation on ADE20K-Outdoor Labels-to-Photos (FID metric)
1 code implementation • 24 Dec 2019 • Jin-Guo Liu, Liang Mao, Pan Zhang, Lei Wang
We extend the ability of unitary quantum circuits by interfacing them with classical autoregressive neural networks.
Quantum Physics
1 code implementation • 23 Dec 2019 • Xiu-Zhe Luo, Jin-Guo Liu, Pan Zhang, Lei Wang
We introduce Yao, an extensible, efficient open-source framework for quantum algorithm design.
Quantum Physics • Strongly Correlated Electrons • Computational Physics
1 code implementation • 6 Dec 2019 • Feng Pan, Pengfei Zhou, Sujie Li, Pan Zhang
We present a general method for approximately contracting tensor networks with an arbitrary connectivity.
Computational Physics • Statistical Mechanics • Strongly Correlated Electrons • Quantum Physics
no code implementations • 1 Nov 2019 • Pengfei Zhou, Tianyi Li, Pan Zhang
For the first time, well-controlled benchmark datasets with asymptotically exact properties and optimal solutions could be produced for the evaluation of graph convolution neural networks, and for the theoretical understanding of their strengths and weaknesses.
no code implementations • 26 Jun 2019 • Feng Pan, Pengfei Zhou, Hai-Jun Zhou, Pan Zhang
We propose a method for solving statistical mechanics problems defined on sparse graphs.
no code implementations • 12 Apr 2019 • Alastair Gregory, Din-Houn Lau, Alex Tessier, Pan Zhang
An increasing number of civil engineering applications are utilising data acquired from infrastructure instrumented with sensing devices.
no code implementations • 8 Jan 2019 • Song Cheng, Lei Wang, Tao Xiang, Pan Zhang
Matrix product states (MPS), a tensor network designed for one-dimensional quantum systems, have recently been proposed for generative modeling of natural data (such as images) in the form of a 'Born machine'.
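For orientation, the standard Born-machine construction with an MPS can be written as follows. This is the textbook form rather than a quotation from the paper, and the symbols $A^{(i)}_{x_i}$, $\Psi$, and $Z$ are notation chosen here: each $A^{(i)}_{x_i}$ is a matrix selected by the value of variable $x_i$ (a row vector at the first site and a column vector at the last site for open boundaries), and the probability of a configuration is the squared modulus of the resulting amplitude.

```latex
\Psi(x_1, \dots, x_n) = A^{(1)}_{x_1} A^{(2)}_{x_2} \cdots A^{(n)}_{x_n},
\qquad
P(x_1, \dots, x_n) = \frac{\lvert \Psi(x_1, \dots, x_n) \rvert^2}{Z},
\qquad
Z = \sum_{x_1, \dots, x_n} \lvert \Psi(x_1, \dots, x_n) \rvert^2 .
```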
no code implementations • 13 Dec 2018 • Zhuan Li, Pan Zhang
Matrix Product States (MPS), also known as Tensor Train (TT) decomposition in mathematics, were originally proposed for describing (especially one-dimensional) quantum systems, and have recently found applications in areas such as compressing high-dimensional data, supervised kernel linear classification, and unsupervised generative modeling.
2 code implementations • 27 Sep 2018 • Dian Wu, Lei Wang, Pan Zhang
We propose a general framework for solving statistical mechanics of systems with finite size.
no code implementations • 30 Jan 2018 • Cheng Shi, Yanchen Liu, Pan Zhang
In the community detection problem in weighted and directed networks, we show that our algorithm significantly outperforms existing algorithms.
no code implementations • 4 Oct 2017 • Pan Zhang
There have been several spectral bounds for the percolation transition in networks, using the spectrum of matrices associated with the network such as the adjacency matrix and the non-backtracking matrix.
1 code implementation • 6 Sep 2017 • Zhao-Yu Han, Jun Wang, Heng Fan, Lei Wang, Pan Zhang
Generative modeling, which learns a joint probability distribution from data and generates samples according to it, is an important task in machine learning and artificial intelligence.
no code implementations • NeurIPS 2016 • Pan Zhang
Spectral methods are popular in detecting global structures in the given data that can be represented as a matrix.
no code implementations • 19 Jun 2015 • Amir Ghasemian, Pan Zhang, Aaron Clauset, Cristopher Moore, Leto Peel
We study the fundamental limits on learning latent community structure in dynamic networks.
no code implementations • 15 Jan 2015 • Pan Zhang
The Normalized Mutual Information (NMI) has been widely used to evaluate the accuracy of community detection algorithms.
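For readers unfamiliar with the measure, here is a minimal sketch of NMI between two partitions, assuming the common normalization $2I(A;B)/(H(A)+H(B))$ with natural logarithms. The function names and the convention of returning 1.0 when both partitions are trivial are choices made for this example, not taken from the paper.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two partitions of the same nodes."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (n_ij / n) * log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, 2 * I(A;B) / (H(A) + H(B))."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        # Both partitions put every node in one group; treat them as identical.
        return 1.0
    return 2 * mutual_information(a, b) / (h_a + h_b)


if __name__ == "__main__":
    truth = [0, 0, 1, 1]
    print(nmi(truth, [1, 1, 0, 0]))  # same grouping, labels swapped
    print(nmi(truth, [0, 1, 0, 1]))  # grouping independent of the truth
```

The measure depends only on how nodes are grouped, not on the label names, which is why relabelled but identical partitions score the same as exact copies.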
no code implementations • 30 Apr 2014 • Pan Zhang, Cristopher Moore, Lenka Zdeborová
For larger $k$ where a hard but detectable regime exists, we find that the easy/hard transition (the point at which efficient algorithms can do better than chance) becomes a line of transitions where the accuracy jumps discontinuously at a critical value of $\alpha$.
1 code implementation • 23 Mar 2014 • Pan Zhang, Cristopher Moore
We address this problem by using the modularity as a Hamiltonian at finite temperature, and using an efficient Belief Propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it.
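The quantity being used as a Hamiltonian here is the modularity of a partition. The sketch below computes the standard Newman-Girvan modularity for an undirected, unweighted graph without self-loops, given as an edge list with each edge listed once. It illustrates only the score itself, not the finite-temperature Belief Propagation algorithm of the paper; the function name `modularity` and the input format are assumptions for this example.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    `edges` is a list of (u, v) pairs for an undirected simple graph and
    `community` maps each node to its group label. Uses the per-group form
    Q = sum_c [ e_c / m - (d_c / (2 m))^2 ], where e_c is the number of
    edges inside group c, d_c the total degree of group c, and m the
    number of edges.
    """
    m = len(edges)
    degree_sum = {}
    internal = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


if __name__ == "__main__":
    # Two triangles joined by a single edge.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    one_group = {node: "a" for node in range(6)}
    print(modularity(edges, two_groups))
    print(modularity(edges, one_group))
```

Many different partitions can have modularity close to the maximum, which is the motivation for seeking a consensus of high-modularity partitions rather than a single maximizer.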
no code implementations • 24 Jun 2013 • Florent Krzakala, Cristopher Moore, Elchanan Mossel, Joe Neeman, Allan Sly, Lenka Zdeborová, Pan Zhang
Spectral algorithms are classic approaches to clustering and community detection in networks.
no code implementations • 17 Jul 2012 • Xiaoran Yan, Cosma Rohilla Shalizi, Jacob E. Jensen, Florent Krzakala, Cristopher Moore, Lenka Zdeborova, Pan Zhang, Yaojia Zhu
We present the first principled and tractable approach to model selection between standard and degree-corrected block models, based on new large-graph asymptotics for the distribution of log-likelihood ratios under the stochastic block model, finding substantial departures from classical results for sparse graphs.