|MODEL DISTILLATION|AI|LARGE LANGUAGE MODELS|
Distilling the knowledge of a large model is complex, but a new method shows impressive performance
Large language models (LLMs) and few-shot learning have shown that we can use these models for unseen tasks. However, these skills come at a cost: a huge number of parameters. This means you also need specialized infrastructure, which restricts state-of-the-art LLMs to only a few companies and research teams.
- Do we really need a unique model for each task?
- Would it be possible to create specialized models that could replace them for specific applications?
- How can we have a small model that competes with giant LLMs for specific applications? Do we necessarily need a lot of data?
In this article, I give an answer to these questions.
“Education is the key to success in life, and teachers make a lasting impact in the lives of their students.” –Solomon Ortiz
“The art of teaching is the art of assisting discovery.” –Mark Van Doren
Large language models (LLMs) have shown revolutionary capabilities. For example, researchers have been surprised by elusive behaviors such as in-context learning. This has led to an increase in the scale of models, with larger and larger models built in search of new capabilities that appear only beyond a certain number of parameters.