Arcee-AI Releases Two Open Datasets

Today, we have made two important datasets publicly available:

  1. Agent Data: This dataset was instrumental in training Arcee-Agent. It combines Salesforce-xlam, agent-flan, and a custom version of Glaive-FC2 with 20k extended samples that require the model to make several tool calls in sequence within a single response. It also includes Magpie-Pro to maintain general capabilities and prevent catastrophic forgetting.
  2. Tome Dataset: Used to train both Arcee-Spark and Arcee-Nova, this dataset comprises 1.75 million highly filtered samples for training generalist AI assistants.

These releases reflect our commitment to transparency and collaborative progress in AI research. By making these datasets accessible, we hope to make it easier for others to build on this work.

Researchers and developers interested in exploring these datasets can access them here.

We encourage the community to use these resources responsibly, and we look forward to seeing the applications and insights that come out of this data.