Falcon-Arabic: A Breakthrough in Arabic Language Models

Hugging Face - Blog

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs ALTK‑Evolve: On‑the‑Job Learning for AI Agents Safetensors is Joining the PyTorch Foundation Holo3: Breaking the Computer Use Frontier Any Custom Frontend with Gradio's Backend A New Framework for Evaluating Voice Agents (EVA) Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations One-Shot Any Web App with Gradio's gr.HTML CUGA on Hugging Face: Democratizing Configurable AI Agents New in llama.cpp: Model Management Building Deep Research: How we Achieved State of the Art OVHcloud on Hugging Face Inference Providers 🔥 20x Faster TRL Fine-tuning with RapidFire AI Building for an Open Future - our new partnership with Google Cloud Aligning to What? Rethinking Agent Generalization in MiniMax M2 Building a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac Sentence Transformers is joining Hugging Face! Unlock the power of images with AI Sheets Supercharge your OCR Pipelines with Open Models Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face Get your VLM running in 3 simple steps on Intel CPUs Nemotron-Personas-India: Synthesized Data for Sovereign AI Introducing RTEB: A New Standard for Retrieval Evaluation Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models VibeGame: Exploring Vibe Coding Games Nemotron-Personas-Japan: ソブリン AI のための合成データセット Swift Transformers Reaches 1.0 – and Looks to the Future Smol2Operator: Post-Training GUI Agents for Computer Use SyGra: The One-Stop Framework for Building Data for LLMs and SLMs Gaia2 and ARE: Empowering the community to study agents Scaleway on Hugging Face Inference Providers 🔥 Democratizing AI Safety with RiskRubric.ai Public AI on Hugging Face Inference Providers 🔥 `LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot` Visible Watermarking with Gradio Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason! Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Fine-tune Any LLM from the Hugging Face Hub with Together AI Jupyter Agents: training LLMs to reason with notebooks mmBERT: ModernBERT goes Multilingual Welcome EmbeddingGemma, Google's new efficient embedding model SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence Make your ZeroGPU Spaces go brrr with ahead-of-time compilation NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset Generate Images with Claude and Hugging Face From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels MCP for Research: How to Connect AI to Research Tools Kimina-Prover-RL Arm & ExecuTorch 0.7: Bringing Generative AI to the masses Neural Super Sampling is here! TextQuests: How Good are LLMs at Text-Based Video Games? 🇵🇭 FilBench - Can LLMs Understand and Generate Filipino? Introducing AI Sheets: a tool to work with datasets using open AI models! Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Vision Language Model Alignment in TRL ⚡️ Welcome GPT OSS, the new open-source model family from OpenAI! Measuring Open-Source Llama Nemotron Models on DeepResearch Bench 📚 3LM: A Benchmark for Arabic LLMs in STEM and Code Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ Parquet Content-Defined Chunking TimeScope: How Long Can Your Video Large Multimodal Model Go? Fast LoRA inference for Flux with Diffusers and PEFT Accelerate a World of LLMs on Hugging Face with NVIDIA NIM Arc Virtual Cell Challenge: A Primer Consilium: When Multiple LLMs Collaborate Back to The Future: Evaluating AI Agents on Predicting Future Events Five Big Improvements to Gradio MCP Servers Ettin Suite: SoTA Paired Encoders and Decoders Migrating the Hub from Git LFS to Xet Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Asynchronous Robot Inference: Decoupling Action Prediction and Execution ScreenEnv: Deploy your full stack Desktop Agent Building the Hugging Face MCP Server Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Creating custom kernels for the AMD MI300 Upskill your LLMs With Gradio MCP Servers SmolLM3: smol, multilingual, long-context reasoner Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure Efficient MultiModal Data Pipeline Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models Training and Finetuning Sparse Embedding Models with Sentence Transformers Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub Gemma 3n fully available in the open-source ecosystem! Transformers backend integration in SGLang (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware Groq on Hugging Face Inference Providers 🔥 How Long Prompts Block Other Requests - Optimizing LLM Performance Learn the Hugging Face Kernel Hub in 5 Minutes Convert Transformers to ONNX with Hugging Face Optimum Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration Director of Machine Learning Insights [Part 3: Finance Edition] The Annotated Diffusion Model Deep Q-Learning with Space Invaders Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers Introducing Pull Requests and Discussions 🥳 Efficient Table Pre-training without Real Data: An Introduction to TAPEX An Introduction to Q-Learning Part 2/2 How Sempre Health is leveraging the Expert Acceleration Program to accelerate their ML roadmap

Basma Boussaha, Mohammed Alyafeai, Ahmed Alzubaidi, Leen AlQadi, · 2025-05-21 · via Hugging Face - Blog

Back to Articles

Check out our official blogpost (EN, AR)

We are excited to introduce Falcon-Arabic, a 7B parameter Language Model that sets a new benchmark for Arabic NLP. Built on the Falcon 3 architecture, Falcon-Arabic is a multilingual model that supports Arabic, English, and several other languages. It excels in general knowledge, Arabic grammar, mathematical reasoning, complex problem solving, and understanding the rich diversity of Arabic dialects. Falcon-Arabic supports a context length of 32,000 tokens, allowing it to handle long documents and enabling advanced applications like retrieval-augmented generation (RAG), in-depth content creation, and knowledge-intensive tasks.

Falcon-Arabic redefines the boundaries of what is possible for Arabic Language Models. It significantly outperforms other Arabic LLMs in its size category and even models up to four times larger across both Arabic-native models and those adapted from other languages. This makes Falcon-Arabic not only a state-of-the-art model in terms of performance, but also a uniquely efficient and accessible solution for developers and researchers working with the Arabic language.

🚀 Introducing Falcon-Arabic: Advancing LLMs for the Arabic-Speaking World

In recent years, Large Language Models (LLMs) have transformed Artificial Intelligence, powering tools for translation, content creation, virtual assistance, and more. Yet much of this progress has focused on highly represented languages like English, leaving languages such as Arabic underrepresented. Arabic presents unique challenges it's morphologically rich, diglossic (spanning both Modern Standard Arabic (MSA) and diverse regional dialects), and used across a vast and culturally varied population. Developing robust Arabic LLMs is essential to ensure Arabic-speaking communities are fully included in the AI revolution.

With this goal in mind, we’re introducing Falcon-Arabic a specialized adaptation of the Falcon 3 model family, developed by the Technology Innovation Institute (TII) in the UAE. The Falcon models have earned global recognition for their multilingual strength and open-source approach. Falcon-Arabic builds on this legacy, bringing advanced language understanding and generation to Arabic. By training the model to handle both Modern Standard Arabic and key dialects, Falcon-Arabic fills a critical gap in language technology enabling more natural, intelligent, and inclusive Arabic AI across the Gulf, Middle East, and North Africa.

🦅 Falcon-Arabic Has Landed - Here’s the Training Recipe 🧪

Building Falcon-Arabic started with a strategic decision: rather than training a model from scratch, we chose to adapt a strong multilingual foundation. In the Arabic LLM landscape, three main approaches exist: training from scratch (e.g., Jais-native), adapting multilingual models (like Allam or Fanar), or using models that natively support Arabic alongside other languages (such as Qwen or LLaMA). Observing the Open Arabic LLM Leaderboard, it became clear that adapted and multilingual models consistently outperformed others in both efficiency and capability. To build on that momentum, we selected Falcon 3-7B, a model that strikes a practical balance between performance and resource efficiency within the Falcon 3 family developed by the Technology Innovation Institute (TII).

The core challenge was adapting Falcon 3-7B, which originally lacked Arabic support at the tokenizer and embedding level. We addressed this by extending the tokenizer’s vocabulary with 32,000 Arabic-specific tokens, and applying a novel embedding initialization strategy based on textual similarity. This technique mapped new Arabic tokens to semantically related embeddings from the existing vocabulary, allowing the model to inherit prior knowledge and accelerate learning particularly around sentiment, abstract concepts, and reasoning patterns. This gave Falcon-Arabic a head start in understanding and generating high-quality Arabic text.

With the tokenizer and embeddings in place, we began continuous pretraining on high-quality, 100% native Arabic datasets, avoiding the use of machine-translated content to minimize cultural bias and preserve linguistic authenticity. Training followed a multi-stage curriculum: early stages focused on general knowledge and dialect-rich Arabic content to stabilize the model and reinforce logical capabilities, while later phases emphasized math, code, and reasoning. The result is a model that not only speaks Arabic fluently across dialects, but also retains Falcon’s multilingual and reasoning strengths pushing the boundaries for Arabic-first AI.

Average Performance of Pretrained Models

📊 Falcon-Arabic: Raising the Bar in Arabic LLMs

We evaluated Falcon-Arabic on OALL v2, the leading benchmark for Arabic Language Models. It includes six multiple-choice tasks such as Arabic MMLU (native and translated), Arabic Exams, Alghafa, MadinahQA, Aratrust and one generative benchmark, Alrage. Falcon-Arabic outperforms all existing Arabic LLMs in its size range and even surpasses models up to 4× larger. It leads in key benchmarks like Arabic MMLU, Exams, MadinahQA, and Aratrust, setting a new standard for Arabic-first Language Models.

Comparison Table of Pretrained Models

The evaluation details (log probabilities, predictions and LLM as judge metrics) of Falcon-Arabic-7B-Base are available on https://huggingface.co/datasets/tiiuae/Falcon-Arabic-7B-Base-details

🗣️ From Pretraining to Instruct: Aligning Falcon-Arabic for Conversations

After finalizing the base model training, we performed a post-training alignment phase to fine-tune Falcon-Arabic’s responses according to human preferences. This phase began with supervised fine-tuning (SFT) using a combination of high-quality public datasets and internally collected native Arabic instruction data, covering a range of tasks and conversational scenarios.

To further enhance alignment, we applied Direct Preference Optimization (DPO) a reinforcement learning-based method that tunes the model to prefer outputs that humans rate as more helpful, safe, and relevant. This two-step process ensures that Falcon-Arabic Instruct not only understands Arabic well but responds in a way that aligns with real user expectations.

Average Performance of Instruct Models

As shown in the results plots, Falcon-Arabic Instruct leads the pack, outperforming all other Instruct-aligned Arabic LLMs in its size class and even models significantly larger across multiple benchmarks. The model demonstrates strong performance in both instruction following and open-ended dialogue, setting a new standard for Arabic conversational AI.

Performance of Instruct Models by Benchmark

Comparison Table of Chat Models

The evaluation details (log probabilities, predictions and LLM as judge metrics) of Falcon-Arabic-7B-Instruct are available on https://huggingface.co/datasets/tiiuae/Falcon-Arabic-7B-Instruct-details

🔓 Unlocking the Potential of Arabic AI

Falcon-Arabic sets a new benchmark for Arabic Language Models. With only 7B parameters, it delivers state-of-the-art performance outperforming models of similar size and even those several times larger across key benchmarks like Arabic MMLU, MadinahQA, and Aratrust. It combines fluency in Modern Standard Arabic, strong understanding of regional dialects, and robust reasoning and multilingual capabilities, making it ideal for a wide range of applications: from Arabic-first chatbots and educational tools to content generation, code assistance, and document understanding.

To give you a hands-on feel for what Falcon-Arabic can do, we built a simple demo that showcases its capabilities in machine translation even though the model hasn’t been fine-tuned specifically for that task. The tool runs purely on Falcon-7B-Arabic-Instruct, and the results are surprisingly strong across various translation directions. You can try it yourself through the demo linked just below. In fact, we used the same setup to translate this blog post into Arabic for our Arabic-speaking audience. Check it out here 🚀. And if you're curious to explore more, we also provide access to a live playground where you can interact with Falcon-Arabic Instruct and experience its performance across different tasks ✨.

⚠️ Limitations

Like all Large Language Models, Falcon-Arabic inherits some common limitations. These include occasional hallucinations (producing plausible but incorrect outputs), sensitivity to how prompts are phrased, and varying performance across very long contexts. While Falcon-Arabic is designed to reduce these issues especially for Arabic tasks users should still apply critical thinking when interpreting results, particularly in high-stakes or fact-sensitive use cases.

Citation

If you find this work helpful for your research or projects, please consider citing it.

@misc{falcon-arabic,
    title = {Falcon-Arabic: A Breakthrough in Arabic Language Models},
    author = {Falcon-LLM Team},
    month = {May},
    url = {https://falcon-lm.github.io/blog/falcon-arabic},
    year = {2025}
}

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

Hugging Face - Blog

🚀 Introducing Falcon-Arabic: Advancing LLMs for the Arabic-Speaking World

🦅 Falcon-Arabic Has Landed - Here’s the Training Recipe 🧪

Average Performance of Pretrained Models

📊 Falcon-Arabic: Raising the Bar in Arabic LLMs

Comparison Table of Pretrained Models

🗣️ From Pretraining to Instruct: Aligning Falcon-Arabic for Conversations

Average Performance of Instruct Models

Performance of Instruct Models by Benchmark

Comparison Table of Chat Models

🔓 Unlocking the Potential of Arabic AI

⚠️ Limitations

Citation