Continual Pretraining
22 papers with code • 3 benchmarks • 3 datasets
Most implemented papers
Continual Training of Language Models for Few-Shot Learning
Recent work applying large language models (LMs) achieves impressive performance in many NLP applications.
Rho-1: Not All Tokens Are What You Need
After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40.6% and 51.8% on the MATH dataset, respectively, matching DeepSeekMath with only 3% of the pretraining tokens.
ECONET: Effective Continual Pretraining of Language Models for Event Temporal Reasoning
While pre-trained language models (PTLMs) have achieved noticeable success on many NLP tasks, they still struggle for tasks that require event temporal reasoning, which is essential for event-centric applications.
Continual Pre-training of Language Models
A novel proxy is also proposed to preserve the general knowledge in the original LM.
Towards Geospatial Foundation Models via Continual Pretraining
Geospatial technologies are becoming increasingly essential in our world for a wide range of applications, including agriculture, urban planning, and disaster response.
Autonomous Data Selection with Language Models for Mathematical Texts
Our method achieves a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities.
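LM-scored data selection of this kind can be sketched generically: rank candidate documents by a quality score and keep the highest-scoring ones until a token budget is met. This is a minimal illustration, not the paper's actual scoring pipeline; `toy_math_score` is a hypothetical stand-in for an LM-based scorer.

```python
# Generic sketch of score-based pretraining data selection (hypothetical,
# not the paper's method): rank documents by a quality score and keep the
# best ones until a token budget is filled.

def select_corpus(docs, score_fn, token_budget):
    """docs: list of (text, n_tokens); score_fn: higher = better quality."""
    ranked = sorted(docs, key=lambda d: score_fn(d[0]), reverse=True)
    selected, used = [], 0
    for text, n_tokens in ranked:
        if used + n_tokens > token_budget:
            continue  # skip documents that would exceed the budget
        selected.append(text)
        used += n_tokens
    return selected

# Stand-in scorer favoring math-like text; a real pipeline would query an LM.
def toy_math_score(text):
    keywords = ("theorem", "proof", "equation", "integral")
    return sum(text.count(k) for k in keywords)

docs = [
    ("a proof of the theorem uses the integral equation", 8),
    ("celebrity gossip and sports news", 5),
    ("another proof sketch", 3),
]
print(select_corpus(docs, toy_math_score, token_budget=11))
```

A budget-aware top-k filter like this is the common skeleton behind most quality-based data selection, whatever scorer is plugged in.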
Data Engineering for Scaling Language Models to 128K Context
We demonstrate that continual pretraining of the full model on 1B-5B tokens of such data is an effective and affordable strategy for scaling the context length of language models to 128K.
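One ingredient of such data engineering is rebalancing the mixture so that long documents are seen often enough to fill long contexts. The sketch below is a hypothetical illustration of length-upsampled sampling under a token budget, with made-up names and weights, not the paper's recipe.

```python
import random

# Hypothetical sketch: upweight long documents when sampling a continual-
# pretraining mixture, so long-context training actually sees long sequences.

def length_upsampled_mixture(docs, long_threshold, long_weight, budget, seed=0):
    """docs: list of (doc_id, n_tokens). Samples doc_ids with replacement,
    giving documents at or above `long_threshold` tokens a higher weight,
    until roughly `budget` tokens have been drawn."""
    rng = random.Random(seed)
    weights = [long_weight if n >= long_threshold else 1.0 for _, n in docs]
    sample, used = [], 0
    while used < budget:
        idx = rng.choices(range(len(docs)), weights=weights, k=1)[0]
        doc_id, n_tokens = docs[idx]
        sample.append(doc_id)
        used += n_tokens
    return sample

# Demo: nine short documents and one long one, with the long one upweighted.
docs = [(f"short{i}", 100) for i in range(9)] + [("long", 5000)]
print(length_upsampled_mixture(docs, long_threshold=1000,
                               long_weight=20.0, budget=2000))
```

Setting `long_weight=1.0` recovers uniform sampling; the threshold and weight would be tuned to the target context length.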
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Efficient Contrastive Learning via Novel Data Augmentation and Curriculum Learning
We introduce EfficientCL, a memory-efficient continual pretraining method that applies contrastive learning with novel data augmentation and curriculum learning.
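A curriculum over augmentation difficulty can be sketched as a simple severity schedule: early training steps get mild augmentations (easy positive pairs), later steps get harsher ones. This is a generic linear-ramp illustration with hypothetical function names and defaults, not EfficientCL's actual schedule or augmentations.

```python
import random

# Generic sketch of curriculum-scheduled augmentation for contrastive
# pretraining (hypothetical; not EfficientCL's actual method).

def curriculum_severity(step, total_steps, min_sev=0.1, max_sev=0.9):
    """Linearly ramp augmentation severity from min_sev to max_sev."""
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return min_sev + frac * (max_sev - min_sev)

def mask_tokens(tokens, severity, seed=0):
    """Toy augmentation: mask a `severity` fraction of tokens at random."""
    rng = random.Random(seed)
    return [t if rng.random() > severity else "[MASK]" for t in tokens]

# Demo: augmentation gets harsher as training progresses.
tokens = "continual pretraining adapts language models".split()
for step in (0, 50, 99):
    sev = curriculum_severity(step, total_steps=100)
    print(step, round(sev, 2), mask_tokens(tokens, sev))
```

The same schedule could drive any augmentation knob (masking rate, crop size, noise level); the curriculum idea is the ramp, not the specific operation.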
On the Robustness of Reading Comprehension Models to Entity Renaming
We study the robustness of machine reading comprehension (MRC) models to entity renaming -- do models make more wrong predictions when the same questions are asked about an entity whose name has been changed?