Arcee takes leading role in model merging innovations

Arcee's recent merger with mergekit has made us a leader in model merging research and development. Check out our video interview with mergekit founder Charles Goddard, who has come on board our team as a Senior Research Engineer.

Arcee is proud to be a pioneer in model merging, having recently joined forces with mergekit.

mergekit was created by former NASA and Apple engineer Charles Goddard. He had been experimenting with model merging techniques and found that most of the cutting-edge research papers did not include code releases. He decided to share his work with the open source community as a remarkably easy-to-use and resource-efficient toolkit that he called “mergekit.”

Here at Arcee, it wasn't long before our researchers came across mergekit and immediately recognized its value. We reached out to Charles, and the rest is history: Arcee and mergekit merged, with the toolkit now available in Arcee's GitHub repository and Charles joining our team as a Senior Research Engineer.

We’re absolutely thrilled to have Charles on board, and we share his vision of model merging as one of the most important innovations happening in LLM research. We also share his deeply-felt commitment to keeping mergekit open source.

Over the next few weeks, we’ll be telling you more about:
• how model merging works
• how you and your organization can use it
• why it fits so well into Arcee’s Small Language Model (SLM) system
• specific use cases

Today, you can check out this introductory video from Charles, in which he starts to explain the basics of model merging. 

VIDEO TRANSCRIPT
CHARLES GODDARD, mergekit Founder

Model merging is a way to take pre-trained checkpoints of language models, or of any models actually... 

mergekit applies it in particular to the field of language models, but the general technique works in computer vision, it works in natural language processing… it works basically anywhere you have a neural network, more or less.

Model merging takes two or more pre-trained checkpoints and combines them to get a model that has some portion of the strengths and abilities of both… 

I like to think of model merging as a way to extend the value of these pre-trained checkpoints... 

This is because companies spend huge amounts of money to produce these models, which are useful by themselves just as they're released. But shortly afterwards there's a new, cooler model.

But there's no reason that these checkpoints have to stop being useful at that point. They still represent a huge amount of computation, research effort, energy, and care.
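The core idea Charles describes above — combining two or more pre-trained checkpoints into one model — can be illustrated at its very simplest as a weighted average of corresponding parameters. The sketch below is a minimal toy example, not mergekit's actual implementation (mergekit supports far more sophisticated merge methods, and the `merge_linear` function and dictionary checkpoints here are hypothetical stand-ins for real model state dicts):

```python
# Toy sketch of linear model merging: interpolate element-wise between the
# parameters of two checkpoints. Assumes both checkpoints share the same
# architecture (same parameter names and shapes), e.g. two fine-tunes of
# the same base model. Plain floats stand in for tensors.

def merge_linear(state_dict_a, state_dict_b, weight_a=0.5):
    """Return a new state dict blending two checkpoints.

    Each merged parameter is weight_a * a + (1 - weight_a) * b.
    """
    weight_b = 1.0 - weight_a
    merged = {}
    for name, param_a in state_dict_a.items():
        param_b = state_dict_b[name]  # requires matching parameter names
        merged[name] = weight_a * param_a + weight_b * param_b
    return merged

# Two hypothetical fine-tuned checkpoints of the same tiny "model":
ckpt_a = {"layer.weight": 1.0, "layer.bias": 0.0}
ckpt_b = {"layer.weight": 3.0, "layer.bias": 2.0}

print(merge_linear(ckpt_a, ckpt_b))
# {'layer.weight': 2.0, 'layer.bias': 1.0}
```

Real merging toolkits operate on full tensor state dicts and offer methods well beyond plain averaging, but the principle is the same: no retraining is required, so the compute already invested in each checkpoint keeps paying off.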