The History of Open-Source LLMs: Better Base Models (Part Two) | by Cameron R. Wolfe, Ph.D. | Nov, 2023

How LLaMA, MPT, Falcon, and LLaMA-2 put LLMs on the map…

Cameron R. Wolfe, Ph.D.
Towards Data Science
(Photo by Iñaki del Olmo on Unsplash)

Open-source research on large language models (LLMs) is incredibly valuable, as it aims to democratize a powerful and influential technology. Although open-source LLMs are now commonly used and widely studied, this area of research saw some initial struggles that were difficult to overcome. Namely, open-source LLMs performed poorly at first and were heavily criticized. Within this overview, we will study a line of research that changed this narrative by making high-performing pre-trained LLMs available to everyone. Given that pre-training a language model is so expensive, the models we will study here are especially impactful. After these high-performing base models were created and released, many people could conduct research using these models at marginal added cost.

“The capabilities of LLMs are remarkable considering the seemingly straightforward nature of the training methodology.” — from [14]

The current series. This overview is part two of a three-part series on the history of open-source LLMs. The first part in the series gave an overview of initial attempts at creating open-source LLMs. Here, we will study the most popular open-source base models (i.e., language models that have been pre-trained but not fine-tuned or aligned) that are currently available. Next time, we will go over how these models can be fine-tuned or aligned to create a variety of applications.

(from [10, 12, 14, 15])

In part one of this series, we saw that the early days of research on open-source LLMs resulted in the proposal of several important base models, such as OPT and BLOOM. However, these models were widely considered to perform quite poorly compared to closed-source pre-trained models (e.g., GPT-3). How do we solve this? First, we need to take a deeper look at the LLM training process.

Training pipeline. LLMs are trained in several steps, as shown in the figure below. First, we pre-train the model…
