My observations from experimenting with model merging, evaluation, and fine-tuning
Let’s continue our learning journey through Maxime Labonne’s llm-course, which is pure gold for the community. This time, we will focus on model merging and evaluation.
Maxime has a great article titled Merge Large Language Models with mergekit. I highly recommend you check it out first. We will not repeat the steps he has already laid out in his article, but we will explore some details I came across that might be helpful to you.
We are going to experiment with model merging and model evaluation in the following steps:
- Using LazyMergekit, merge two models from the Hugging Face hub: `mistralai/Mistral-7B-Instruct-v0.2` and `jan-hq/trinity-v1` (a sample config is sketched right after this list).
- Run AutoEval on the base model `mistralai/Mistral-7B-Instruct-v0.2`.
- Run AutoEval on the merged model `MistralTrinity-7b-slerp`.
- Fine-tune the merged model with a customized instruction dataset.
- Run AutoEval on the fine-tuned model.
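As a preview of the first step: LazyMergekit has you paste a mergekit YAML config into the notebook as a Python string. A minimal slerp sketch for our two models might look like the following; the `layer_range` and interpolation weights `t` are illustrative values in the style of Maxime's article, not tuned recommendations:

```python
# Illustrative mergekit slerp config for LazyMergekit (values are examples, not tuned).
yaml_config = """
slices:
  - sources:
      - model: mistralai/Mistral-7B-Instruct-v0.2
        layer_range: [0, 32]
      - model: jan-hq/trinity-v1
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-Instruct-v0.2
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]  # interpolation factor across layer groups for attention
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]  # interpolation factor across layer groups for MLP blocks
    - value: 0.5                    # default for all other tensors
dtype: bfloat16
"""
```

With slerp, `t = 0` keeps the base model's weights and `t = 1` takes the other model's, so these per-filter lists let different layer groups lean toward one parent or the other.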
Let’s dive in.
First, how do we select which models to merge?
Determining whether two or more models can be merged involves evaluating several key attributes and considerations:
- Model Architecture: This is a crucial consideration when merging models. Ensure the models share a compatible architecture (e.g., both transformer-based); merging dissimilar architectures is often challenging. The Hugging Face model card usually details a model’s architecture. If you cannot find the architecture info, you can fall back on trial and error with Maxime’s LazyMergekit, which we will explore later; if you encounter an error, it’s usually due to incompatible model architectures. You can also check compatibility programmatically, as sketched after this list.
- Dependencies and Libraries: Ensure that…
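As promised above, here is a minimal sketch of a programmatic architecture check using the `transformers` library. It downloads only each model's config (a few KB, no weights) and compares fields that matter for merging; the specific fields compared are my choice of sanity checks, not an official mergekit requirement:

```python
from transformers import AutoConfig

models = ["mistralai/Mistral-7B-Instruct-v0.2", "jan-hq/trinity-v1"]

# Fetch only the model configs, not the weights.
configs = {name: AutoConfig.from_pretrained(name) for name in models}

for name, cfg in configs.items():
    print(f"{name}: {cfg.architectures}, "
          f"{cfg.num_hidden_layers} layers, hidden size {cfg.hidden_size}")

# Models with the same architecture class, layer count, and hidden size
# are good candidates for a slerp merge.
a, b = configs.values()
assert a.architectures == b.architectures, "architectures differ"
assert a.num_hidden_layers == b.num_hidden_layers, "layer counts differ"
assert a.hidden_size == b.hidden_size, "hidden sizes differ"
```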