no code implementations • 22 Apr 2024 • Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, Xiyang Dai, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Victor Fragoso, Dan Iter, Mei Gao, Min Gao, Jianfeng Gao, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Ce Liu, Mengchen Liu, Weishung Liu, Eric Lin, Zeqi Lin, Chong Luo, Piyush Madan, Matt Mazzola, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Xin Wang, Lijuan Wang, Chunyu Wang, Yu Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Haiping Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Sonali Yadav, Fan Yang, Jianwei Yang, ZiYi Yang, Yifan Yang, Donghan Yu, Lu Yuan, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
no code implementations • 1 Dec 2023 • Deepak Sridhar, Yunsheng Li, Nuno Vasconcelos
The resulting $\textit{Scalable CHannEl MixEr}$ (SCHEME) can be plugged into any ViT architecture to obtain a gamut of models with different trade-offs between complexity and performance by controlling the block diagonal MLP structure.
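As a rough illustration of why a block diagonal MLP structure yields a complexity/performance trade-off, the sketch below counts the weights of a dense linear layer versus a block-diagonal one. It is a generic calculation, not the SCHEME implementation; the function names and the layer sizes (384 and 1536) are assumptions chosen only for the example.

```python
def dense_params(d_in: int, d_out: int) -> int:
    """Weight count of an ordinary (dense) linear layer, ignoring biases."""
    return d_in * d_out


def block_diagonal_params(d_in: int, d_out: int, groups: int) -> int:
    """Weight count when the weight matrix is block diagonal with `groups` blocks.

    Each block maps d_in/groups input channels to d_out/groups output channels,
    so the total is 1/groups of the dense count.
    """
    if groups <= 0 or d_in % groups or d_out % groups:
        raise ValueError("groups must be positive and divide both d_in and d_out")
    return groups * (d_in // groups) * (d_out // groups)


if __name__ == "__main__":
    # Hypothetical ViT-like sizes: embedding dim 384, hidden dim 1536.
    for g in (1, 2, 4, 8):
        print(g, block_diagonal_params(384, 1536, g))
```

With one group the count equals the dense layer; each doubling of the group count halves the weights, which is the knob the snippet above refers to.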
1 code implementation • 27 Nov 2023 • Chongyan Chen, Mengchen Liu, Noel Codella, Yunsheng Li, Lu Yuan, Danna Gurari
Visual Question Answering (VQA) entails answering questions about images.
no code implementations • CVPR 2023 • Zhiyuan Hu, Yunsheng Li, Jiancheng Lyu, Dashan Gao, Nuno Vasconcelos
This is accomplished by the introduction of dense connections between the intermediate layers of the task expert networks, which enable the transfer of knowledge from old to new tasks via feature sharing and reuse.
1 code implementation • 7 Jul 2022 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Pei Yu, Jing Yin, Lu Yuan, Zicheng Liu, Nuno Vasconcelos
We formulate this as a learning problem where the goal is to assign operators to proposals, in the detection head, so that the total computational cost is constrained and the precision is maximized.
1 code implementation • ICCV 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos
This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g., 5M FLOPs on ImageNet classification).
1 code implementation • CVPR 2021 • Yunsheng Li, Lu Yuan, Yinpeng Chen, Pei Wang, Nuno Vasconcelos
However, such a static model has difficulty handling conflicts across multiple domains, and suffers from performance degradation in both the source domains and the target domain.
1 code implementation • ICLR 2021 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Mengchen Liu, Dongdong Chen, Ye Yu, Lu Yuan, Zicheng Liu, Mei Chen, Nuno Vasconcelos
It has two limitations: (a) it increases the number of convolutional weights by K-times, and (b) the joint optimization of dynamic attention and static convolution kernels is challenging.
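For context on limitation (a), the sketch below shows the usual form of dynamic convolution: K static kernels are combined with input-dependent attention weights into a single kernel. It is a minimal pure-Python illustration, not the authors' code; kernels are flattened to plain lists, and the attention logits are passed in directly rather than computed from the input, both of which are simplifying assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of attention logits."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_kernels(kernels, scores):
    """Combine K static kernels into one kernel using attention weights.

    kernels: list of K flattened kernels (lists of floats, equal length).
    scores:  list of K attention logits (assumed to come from the input).
    """
    attn = softmax(scores)
    size = len(kernels[0])
    return [sum(a * k[i] for a, k in zip(attn, kernels)) for i in range(size)]


def stored_weight_count(kernels):
    """Weights that must be stored: K times the size of a single kernel."""
    return sum(len(k) for k in kernels)


if __name__ == "__main__":
    kernels = [[1.0, 0.0], [0.0, 1.0]]  # K = 2 toy kernels
    print(aggregate_kernels(kernels, [0.0, 0.0]))
    print(stored_weight_count(kernels))
```

Only one aggregated kernel is applied per input, yet all K kernels must be stored and trained jointly with the attention, which is where the two limitations come from.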
no code implementations • 24 Nov 2020 • Yunsheng Li, Yinpeng Chen, Xiyang Dai, Dongdong Chen, Mengchen Liu, Lu Yuan, Zicheng Liu, Lei Zhang, Nuno Vasconcelos
In this paper, we present MicroNet, which is an efficient convolutional neural network using extremely low computational cost (e.g., 6 MFLOPs on ImageNet classification).
no code implementations • 27 Jul 2020 • Pedro Morgado, Yunsheng Li, Jose Costa Pereira, Mohammad Saberian, Nuno Vasconcelos
The use of a fixed set of proxies (weights of the CNN classification layer) is proposed to eliminate this ambiguity, and a procedure to design proxy sets that are nearly optimal for both classification and hashing is introduced.
1 code implementation • CVPR 2020 • Yiran Xu, Xiaoyin Yang, Lihang Gong, Hsuan-Chu Lin, Tz-Ying Wu, Yunsheng Li, Nuno Vasconcelos
The new paradigm lies between the end-to-end and pipelined approaches, and is inspired by how humans solve the problem.
no code implementations • 24 Jun 2019 • Yunsheng Li, Nuno Vasconcelos
The problem of multi-domain learning of deep networks is considered.
no code implementations • CVPR 2019 • Yunsheng Li, Nuno Vasconcelos
The problem of multi-domain learning of deep networks is considered.
no code implementations • 27 May 2019 • Mandar Dixit, Yunsheng Li, Nuno Vasconcelos
Somewhat surprisingly, the scene classification results are superior to those of a CNN explicitly trained for scene classification, using a large scene dataset (Places).
3 code implementations • CVPR 2019 • Yunsheng Li, Lu Yuan, Nuno Vasconcelos
In this paper, we propose a novel bidirectional learning framework for domain adaptation of segmentation.
Ranked #7 on Semantic Segmentation on DADA-seg
no code implementations • ICCV 2017 • Yunsheng Li, Mandar Dixit, Nuno Vasconcelos
This enables the design of a network architecture, the MFAFVNet, that can be trained in an end-to-end manner.