The Arcee AI research team is honored to be among the contributors to the world’s first fully decentralized training of a large language model (LLM). Read about the game-changing project led by Prime Intellect, and how we brought our expertise to post-training.
We’re proud to share our collaboration with Prime Intellect on the revolutionary INTELLECT-1 Project, which has achieved the first-ever fully decentralized training of an LLM. This milestone marks a dramatic shift in how AI is built and scaled, unlocking possibilities for greater collaboration and efficiency in LLM training.
What is INTELLECT-1?
Imagine training a cutting-edge AI model, not confined to a single high-tech data center, but distributed across computers worldwide. That’s the essence of INTELLECT-1. Using their innovative PRIME framework, Prime Intellect successfully coordinated the training of a 10-billion-parameter language model on 1 trillion tokens.
The project relied on 30 independent compute providers, spread across continents, to contribute their resources. Prime Intellect’s system managed challenges like varying internet speeds and occasional node dropouts, achieving 96% compute efficiency and proving that decentralized training is not just feasible but powerful.
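For readers curious about the mechanics, here is a minimal, illustrative sketch of the DiLoCo-style low-communication training loop that this kind of decentralized setup builds on: each worker trains locally for many steps, and only the resulting pseudo-gradients are synchronized, keeping communication infrequent enough to span ordinary internet links. The model, hyperparameters, and variable names below are toy assumptions of ours, not the PRIME implementation.

```python
# Illustrative sketch of a DiLoCo-style low-communication training loop.
# Not the PRIME implementation: model, data, and hyperparameters are toy values.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

# "Global" model that all workers start from at each outer round.
global_model = nn.Linear(32, 32)
outer_opt = torch.optim.SGD(global_model.parameters(), lr=0.7, momentum=0.9, nesterov=True)

num_workers = 4     # stand-in for independent compute providers
local_steps = 50    # many local steps between syncs keeps communication low
outer_rounds = 10

for round_idx in range(outer_rounds):
    deltas = [torch.zeros_like(p) for p in global_model.parameters()]

    # Each worker copies the current global weights and trains independently.
    for _ in range(num_workers):
        worker = copy.deepcopy(global_model)
        inner_opt = torch.optim.AdamW(worker.parameters(), lr=1e-3)
        for _ in range(local_steps):
            x = torch.randn(16, 32)                  # placeholder batch
            loss = (worker(x) - x).pow(2).mean()     # toy objective
            inner_opt.zero_grad()
            loss.backward()
            inner_opt.step()
        # Pseudo-gradient contribution: global weights minus the worker's
        # updated weights, averaged across workers.
        for d, g_p, w_p in zip(deltas, global_model.parameters(), worker.parameters()):
            d += (g_p.detach() - w_p.detach()) / num_workers

    # Outer step: feed the averaged pseudo-gradient to the outer optimizer.
    outer_opt.zero_grad()
    for p, d in zip(global_model.parameters(), deltas):
        p.grad = d
    outer_opt.step()
    print(f"outer round {round_idx}: synced {num_workers} workers after {local_steps} local steps each")
```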
How Does INTELLECT-1 Compare?
The results speak for themselves. INTELLECT-1 demonstrates competitive performance compared to many models trained using traditional centralized methods. However, it’s important to note that INTELLECT-1 is about a year behind state-of-the-art models of its size, such as LLaMA-2-7B and LLaMA-2-13B, which have benefited from larger compute budgets, more data, and advanced post-training processes.
Metric | INTELLECT-1 (10B) | Falcon-7B | Pythia-12B | LLaMA-2-7B |
---|---|---|---|---|
MMLU | 37.5 | 26.2 | 26.5 | 45.3 |
HellaSwag | 72.26 | 78.23 | 68.83 | 78.64 |
ARC-C | 52.13 | 47.61 | 40.61 | 54.10 |
TruthfulQA | 35.47 | 34.28 | 31.83 | 38.75 |
While INTELLECT-1’s numbers are impressive for the world's first fully decentralized training project, they also highlight the gap between decentralized approaches and current state-of-the-art methods. Bridging this gap is our next big goal.
Where Arcee AI Stepped In
At Arcee AI, we contributed 8,200 hours of cutting-edge H100 GPU compute time to INTELLECT-1. But our contributions didn’t end with training. We brought our expertise to post-training, fine-tuning the model to maximize its real-world performance. This phase is where INTELLECT-1 gained its edge over the competition.
Here’s what we did to enhance INTELLECT-1:
- Supervised Fine-Tuning (SFT): Teaching the model to handle specific instructions and complex tasks with greater accuracy.
- Direct Preference Optimization (DPO): Aligning outputs with human preferences for more helpful, intuitive responses (see the sketch just after this list).
- Model Merging: Combining the best aspects of multiple training runs into one unified model.
- Logit Distillation: Transferring knowledge from larger models to INTELLECT-1 without increasing its size, boosting its abilities significantly.
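To make the DPO step above concrete, here is a minimal sketch of the standard DPO loss: it rewards the policy for ranking the preferred (“chosen”) response above the dispreferred (“rejected”) one, relative to a frozen reference model. The function name, tensor shapes, and `beta` value are illustrative assumptions rather than the exact configuration we used.

```python
# Illustrative DPO loss; names and beta are placeholders, not the exact
# configuration used for INTELLECT-1-Instruct.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """All inputs are per-example summed log-probs of the full response, shape (batch,)."""
    # How much the policy prefers chosen over rejected, relative to the reference model.
    policy_margin = policy_chosen_logps - policy_rejected_logps
    ref_margin = ref_chosen_logps - ref_rejected_logps
    logits = beta * (policy_margin - ref_margin)
    # Maximize the probability that the policy ranks the chosen response higher.
    return -F.logsigmoid(logits).mean()

# Toy usage with random log-probabilities standing in for real model outputs.
batch = 8
loss = dpo_loss(torch.randn(batch), torch.randn(batch), torch.randn(batch), torch.randn(batch))
print(float(loss))
```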
Post-Training Results: INTELLECT-1-Instruct
Following post-training, INTELLECT-1-Instruct, the instruction-tuned version of INTELLECT-1, showed marked improvements. As seen in the table below, it competes with larger models like LLaMA-2-13B-Chat on several benchmarks, demonstrating the power of careful post-training processes.
Metric | INTELLECT-1-Instruct (10B) | LLaMA-2-7B-Chat | LLaMA-2-13B-Chat | MPT-7B-Chat |
---|---|---|---|---|
MMLU | 49.89 | 47.20 | 53.51 | 36.29 |
GSM8K | 38.58 | 23.96 | 37.15 | 8.26 |
TruthfulQA | 42.16 | 45.58 | 44.12 | 35.22 |
BBH | 34.85 | 35.50 | 39.05 | 32.30 |
While INTELLECT-1-Instruct has made remarkable progress, we acknowledge that these results reflect a system still catching up to models built in centralized environments. Our work is far from done—but we’re excited about what lies ahead.
We’ve long known that merging, distillation, and targeted training can have a healing effect on a model, but we were blown away by the performance improvement compared to the base model. Decentralized optimization is inherently chaotic, especially with the DiLoCo SGD algorithm, which makes this kind of post-training recovery all the more valuable. INTELLECT-1 shows how decentralized innovation and collaborative training can deliver unprecedented results, bridging the gap between experimentation and real-world applications.
All of our post-training datasets are public and thoroughly checked for contamination, so the gains reported above are not an artifact of benchmark leakage. These results reinforce that model merging, distillation, and future post-training improvements are key to unlocking the full potential of models trained with globally distributed compute.
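As an illustration of the merging step, the sketch below shows the simplest possible recipe: a weighted linear average of parameter tensors from checkpoints that share an architecture. The checkpoints and mixing weights are hypothetical, and real merging pipelines typically use more deliberate recipes than a plain average.

```python
# Minimal sketch of model merging via linear weight averaging.
# Checkpoints and mixing weights are hypothetical examples.
import torch
import torch.nn as nn

def merge_state_dicts(state_dicts, weights):
    """Weighted average of parameter tensors that share the same keys and shapes."""
    assert len(state_dicts) == len(weights) and abs(sum(weights) - 1.0) < 1e-6
    merged = {}
    for key in state_dicts[0]:
        merged[key] = sum(w * sd[key].float() for w, sd in zip(weights, state_dicts))
    return merged

# Toy usage: two "checkpoints" of the same tiny model.
ckpt_a = nn.Linear(4, 4).state_dict()
ckpt_b = nn.Linear(4, 4).state_dict()
merged = merge_state_dicts([ckpt_a, ckpt_b], weights=[0.6, 0.4])

model = nn.Linear(4, 4)
model.load_state_dict(merged)
```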
As part of the INTELLECT-1 release, we’re excited to share that we’re open-sourcing two of our previously proprietary datasets under the Apache-2.0 license. In the spirit of open science, we’re releasing logits extracted from Llama-3.1-405B during the development of SuperNova, as well as the complete EvolKit-75k dataset, a high-quality instruction-tuning dataset generated using our in-house tools. Additionally, everything from the INTELLECT-1 project—data, checkpoints, and the code behind the PRIME framework—is being made open source. This marks a significant step toward making cutting-edge AI research more collaborative, accessible, and transparent. You can find them on our Hugging Face page.
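To show how teacher logits like these can be used, here is a minimal sketch of logit distillation: the student is trained against a temperature-scaled KL divergence to the teacher's token distribution, mixed with the usual next-token cross-entropy. The shapes, temperature, and mixing weight `alpha` are illustrative assumptions, not our exact distillation setup.

```python
# Illustrative logit-distillation loss: temperature-scaled KL divergence between
# a (frozen) teacher's token distribution and the student's, plus hard-label
# cross-entropy. Shapes, temperature, and alpha are assumptions for illustration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """student_logits, teacher_logits: (batch, seq, vocab); labels: (batch, seq)."""
    t = temperature
    # Soft targets from the teacher, compared in log space.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / t, dim=-1),
        F.log_softmax(teacher_logits / t, dim=-1),
        reduction="batchmean",
        log_target=True,
    ) * (t * t)
    # Standard next-token cross-entropy against the ground-truth labels.
    hard_loss = F.cross_entropy(
        student_logits.reshape(-1, student_logits.size(-1)), labels.reshape(-1)
    )
    return alpha * soft_loss + (1 - alpha) * hard_loss

# Toy usage with random tensors standing in for real model outputs.
batch, seq, vocab = 2, 8, 128
student = torch.randn(batch, seq, vocab, requires_grad=True)
teacher = torch.randn(batch, seq, vocab)
labels = torch.randint(0, vocab, (batch, seq))
loss = distillation_loss(student, teacher, labels)
loss.backward()
```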
Big Kudos to Prime Intellect
Prime Intellect’s technical ingenuity and vision made INTELLECT-1 a reality. They’ve proven that top-tier AI training no longer needs to depend on massive, centralized facilities. This breakthrough opens up new possibilities for collaborative, decentralized AI development that is more accessible to the global research community.
Decentralized Training: The Future is Now
INTELLECT-1 is a glimpse into what decentralized training can achieve. While we acknowledge that it’s still catching up to state-of-the-art centralized models, the gap is closing. With the right innovations and partnerships, we’re excited to accelerate this progress, transforming AI development into a more inclusive, efficient, and scalable process.
The future isn’t just centralized—it’s decentralized, collaborative, and full of potential.
About Arcee AI
Arcee AI, established in 2023 and based in San Francisco, California, has pioneered industry-leading small language models (SLMs) for enterprise applications. To learn more, book a demo with our team.