Microsoft Open-Sources Industry-Leading Embedding Model

We’re excited to announce an industry-leading open-source embedding model built to support the agentic web.

As AI systems evolve from answering questions to acting, grounding is the foundational capability that drives user trust in any AI agent: the ability to provide the right level of information, at the right time, in the right context. At the heart of grounding is the embedding model, the layer that does the hard work of searching, retrieving, organizing, and connecting information across diverse sources into a coherent, meaningful response.

For this reason, we have open-sourced our industry-leading embedding model, built to support the agentic web.

Grounding quality is determined long before a model produces its final answer. In production systems, stronger embeddings translate into higher factual accuracy through better first-pass retrieval, lower latency and cost through fewer retries and smaller contexts, and more stable agent behavior across multi-step tasks.

In the agent era, this capability matters even more. Agents must search across diverse sources, maintain memory over time, and update context across multiple steps. In these environments, embeddings are not just a retrieval primitive. They are a foundational layer for memory, ranking, and orchestration.

Enter Harrier.

Harrier is a new embedding model series designed for the demands of modern AI systems. It is our latest open-source text embedding model series, delivering state-of-the-art performance and ranking 1st on the multilingual MTEB-v2 benchmark (as of April 6, 2026). This result reflects Microsoft’s sustained commitment to improving grounding quality through advances in embedding models.

(Image source retrieved on April 6, 2026 - cropped for clarity)

Better embeddings lead to better retrieval, surfacing the right information more often and ranking it higher. This in turn can improve the final user experience, often through more accurate answers, fewer hallucinations, better citations, and stronger multilingual performance. In other words, Harrier’s benchmark gains translate into more reliable grounding in real-world systems.

Harrier is designed not only to improve embedding quality in isolation, but to strengthen the full grounding pipeline that modern AI systems depend on. It supports more than 100 languages, offers a 32k context window, and produces fixed-size embeddings for each input, enabling seamless integration with vector search systems.
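To make the vector-search point concrete, here is a minimal sketch of nearest-neighbor retrieval over fixed-size embeddings using cosine similarity. The four-dimensional vectors, the document ids, and the helper names `cosine` and `top_k` are hypothetical stand-ins; they are not Harrier outputs or part of any Harrier API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, index, k=2):
    """Return the ids of the k vectors in `index` most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]

# Toy vectors standing in for real embeddings (hypothetical values).
index = {
    "doc_weather": [0.9, 0.1, 0.0, 0.1],
    "doc_finance": [0.0, 0.8, 0.6, 0.0],
    "doc_climate": [0.8, 0.0, 0.1, 0.3],
}
query = [1.0, 0.0, 0.0, 0.2]
results = top_k(query, index, k=2)
print(results)
```

Because every input maps to a vector of the same length, the same comparison works for any pair of texts, which is what lets a vector database index them uniformly. A production system would typically replace the linear scan with an approximate nearest-neighbor index.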

Technical overview
We started by developing a scalable data pipeline that gathers multilingual text pairs from multiple sources and uses GPT-5 to generate a wide range of synthetic data. This process yielded more than 2 billion weakly supervised examples for contrastive pre-training and over 10 million high-quality examples for fine-tuning. To maintain data quality, we applied thorough filtering and, where necessary, rewrote examples using large language models. After preparing the dataset, we trained our flagship model and then used it as a teacher for knowledge distillation, improving the performance of the smaller embedding models.
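As an illustration of what contrastive training on text pairs optimizes, the sketch below computes an InfoNCE loss with in-batch negatives, a common objective for embedding models. The post does not state Harrier's exact loss, so the formulation, the `temperature` value, and the function names are assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def info_nce(queries, passages, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    passages[i] is the positive for queries[i]; every other passage in
    the batch serves as a negative. The temperature is an assumed value.
    """
    q = [normalize(v) for v in queries]
    p = [normalize(v) for v in passages]
    total = 0.0
    for i, qi in enumerate(q):
        logits = [dot(qi, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # negative log-softmax of the positive
    return total / len(q)

# Toy 2-dimensional "embeddings" (hypothetical values).
queries = [[1.0, 0.1], [0.1, 1.0]]
aligned = info_nce(queries, [[0.9, 0.0], [0.0, 0.9]])
shuffled = info_nce(queries, [[0.0, 0.9], [0.9, 0.0]])
print(aligned < shuffled)
```

The loss is small when each query is closest to its own passage and large when the pairing is wrong, which is the signal that pulls matching texts together in the embedding space.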

Key technical ideas
Building upon our prior work in text embeddings, including E5, Multilingual E5, E5-mistral, and GritLM, we incorporated several approaches to advance the state of text embeddings:

Large-scale contrastive pre-training and fine-tuning. By scaling the dataset size throughout both the contrastive pre-training and fine-tuning stages, we observed consistent improvements in performance.
Synthetic data generation. Utilizing frontier models such as GPT-5, we generated multilingual text pairs at scale, employing a variety of synthesis strategies to enhance data diversity.
Knowledge distillation. LLM-based re-rankers produce high-quality training signals and efficiently filter noisy data. Our smaller models benefit from knowledge distillation, receiving guidance from larger teacher models during training.
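One common way to give a student model "guidance" from a teacher is to have the student match the teacher's relevance distribution over a set of candidate passages. The sketch below shows that idea with a KL-divergence loss. The post does not describe the actual distillation objective, so this formulation, the scores, and the function names are illustrative assumptions only.

```python
import math

def softmax(scores, temperature=1.0):
    """Convert raw relevance scores into a probability distribution."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_distill_loss(teacher_scores, student_scores, temperature=1.0):
    """KL(teacher || student) over one query's candidate passages."""
    t = softmax(teacher_scores, temperature)
    s = softmax(student_scores, temperature)
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s) if ti > 0)

# Hypothetical scores for three candidate passages of a single query.
teacher = [2.0, 1.0, 0.0]
agreeing_student = [2.0, 1.0, 0.0]
disagreeing_student = [0.0, 1.0, 2.0]

print(kl_distill_loss(teacher, agreeing_student))
print(kl_distill_loss(teacher, disagreeing_student) > 0)
```

The loss is zero when the student ranks candidates exactly as the teacher does and grows as their rankings diverge, so minimizing it transfers the teacher's ordering to the smaller model.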

Evaluation
We evaluate on the multilingual MTEB v2 benchmark. For deployment on low-end devices, we trained two smaller models: Harrier-OSS-v1-0.6b and Harrier-OSS-v1-270m.

Model	Avg Score over 131 tasks	Borda Count Rank
MMTEB leaderboard SoTA	72.3	-
Harrier-OSS-v1-27B	74.3 (+2.0%)	1
multilingual-e5-large-inst	63.2	-
Qwen3-Embedding-0.6B	64.3	-
Harrier-OSS-v1-0.6b	69.0 (+4.7%)	10
Embeddinggemma-270m	61.2	-
Harrier-OSS-v1-270m	66.5 (+5.3%)	15

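The table indicates that each Harrier model outperforms the open-source models it is compared against; the figure in parentheses is the gain over the comparison row directly above it. The "Borda Count Rank" reflects the hypothetical ranking as of March 16 and may change with future submissions.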
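The table reports both an average score and a Borda count rank, and the two can disagree. As a rough sketch of how a Borda count works, the code below awards each model one point per competitor it beats on each task and sums the points. The model names and scores are hypothetical, and the leaderboard's own implementation (for example, its tie handling) may differ.

```python
def borda_rank(scores_by_task):
    """Rank models by Borda count.

    scores_by_task maps task name -> {model name: score}.
    Returns (models ordered best-first, points per model).
    """
    points = {}
    for task_scores in scores_by_task.values():
        for model, score in task_scores.items():
            # One point for every competitor this model strictly beats on the task.
            beaten = sum(
                1 for other, s in task_scores.items()
                if other != model and s < score
            )
            points[model] = points.get(model, 0) + beaten
    # Sort by points descending; break ties alphabetically for determinism.
    order = sorted(points, key=lambda m: (-points[m], m))
    return order, points

# Hypothetical scores: model_x has the best mean but wins only one task.
scores = {
    "task_1": {"model_x": 100.0, "model_y": 70.0, "model_z": 65.0},
    "task_2": {"model_x": 58.0, "model_y": 70.0, "model_z": 65.0},
    "task_3": {"model_x": 58.0, "model_y": 70.0, "model_z": 50.0},
}
order, points = borda_rank(scores)
means = {
    m: sum(task[m] for task in scores.values()) / len(scores)
    for m in points
}
print(order, points, means)
```

In this toy data, model_x has the highest mean score (72.0 versus 70.0) yet model_y ranks first by Borda count, because the count rewards consistent wins across tasks rather than one outsized score.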

Compared with leading proprietary models, we are operating at the frontier of embedding quality and efficiency.

Model	MTEB Multilingual, Mean(Task)
OpenAI text-embedding-3-large	58.92
Amazon.titan-embed-text-v2	60.37
Harrier-OSS-v1-270m	66.55
Gemini Embedding 1	68.33
Harrier-OSS-v1-0.6b	69.01
Gemini Embedding 2 (Multi-modal)	69.9
Harrier-OSS-v1-27B	74.27

What comes next
The work behind Harrier is not just a model release. It is part of a broader effort to build the next generation of grounding systems for the agent era.

Drawing on the same core advances, we are developing a new grounding service designed to deliver better retrieval quality, stronger semantic understanding, and more robust context selection at scale. These innovations will also be coming to Bing, bringing the benefits of this new embedding foundation into real user experiences.

The future of capable agents will depend not only on reasoning and generation, but on how effectively they are grounded in the world. Harrier is a meaningful step toward making that possible — and we're just getting started.

Authors: Xiaolong Huang, Liang Wang, Furu Wei, Jingwen Lu, Knut Risvik, Jason Li
