Introducing swift-huggingface: The Complete Swift Client for Hugging Face

Hugging Face - Blog


Mattt · 2025-12-05 · via Hugging Face - Blog

Today, we're announcing swift-huggingface, a new Swift package that provides a complete client for the Hugging Face Hub.

You can start using it today as a standalone package, and it will soon integrate into swift-transformers as a replacement for its current HubApi implementation.

The Problem

When we released swift-transformers 1.0 earlier this year, we heard loud and clear from the community:

Downloads were slow and unreliable. Large model files (often several gigabytes) would fail partway through with no way to resume. Developers resorted to manually downloading models and bundling them with their apps — defeating the purpose of dynamic model loading.
No shared cache with the Python ecosystem. The Python transformers library stores models in ~/.cache/huggingface/hub. Swift apps downloaded to a different location with a different structure. If you'd already downloaded a model using the Python CLI, you'd download it again for your Swift app.
Authentication was confusing. Where should tokens come from? Environment variables? Files? Keychain? The answer is "it depends", and the existing implementation didn't make the options clear.

Introducing swift-huggingface

swift-huggingface is a ground-up rewrite focused on reliability and developer experience. It provides:

Complete Hub API coverage — models, datasets, spaces, collections, discussions, and more
Robust file operations — progress tracking, resume support, and proper error handling
Python-compatible cache — share downloaded models between Swift and Python clients
Flexible authentication — a TokenProvider pattern that makes credential sources explicit
OAuth support — first-class support for user-facing apps that need to authenticate users
Xet storage backend support (Coming soon!) — chunk-based deduplication for significantly faster downloads

Let's look at some examples.

Flexible Authentication with TokenProvider

One of the biggest improvements is how authentication works. The TokenProvider pattern makes it explicit where credentials come from:

import HuggingFace

// The three declarations below are alternatives; keep the one that fits your setup.

// For development: auto-detect from environment and standard locations
// Checks HF_TOKEN, HUGGING_FACE_HUB_TOKEN, ~/.cache/huggingface/token, etc.
let client = HubClient.default

// For CI/CD: explicit token
let client = HubClient(tokenProvider: .static("hf_xxx"))

// For production apps: read from Keychain
let client = HubClient(tokenProvider: .keychain(service: "com.myapp", account: "hf_token"))

The auto-detection follows the same conventions as the Python huggingface_hub library:

HF_TOKEN environment variable
HUGGING_FACE_HUB_TOKEN environment variable
HF_TOKEN_PATH environment variable (path to token file)
$HF_HOME/token file
~/.cache/huggingface/token (standard HF CLI location)
~/.huggingface/token (fallback location)

This means if you've already logged in with hf auth login, swift-huggingface will automatically find and use that token.
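
To make the lookup order concrete, here is a minimal sketch that walks the same six locations using only Foundation. It is not the library's implementation: `resolveToken` and its parameters are hypothetical names chosen for illustration, and details such as trimming whitespace and skipping empty values are assumptions.

```swift
import Foundation

/// Hypothetical helper (not part of swift-huggingface): returns the first
/// token found by walking the lookup order listed above.
func resolveToken(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    homeDirectory: URL = URL(fileURLWithPath: NSHomeDirectory())
) -> String? {
    // Assumption: empty or whitespace-only values count as "not set".
    func nonEmpty(_ value: String?) -> String? {
        guard let trimmed = value?.trimmingCharacters(in: .whitespacesAndNewlines),
              !trimmed.isEmpty else { return nil }
        return trimmed
    }

    // 1-2. Tokens passed directly through the environment.
    for key in ["HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"] {
        if let token = nonEmpty(environment[key]) { return token }
    }

    // 3-6. Token files, in priority order.
    var candidates: [URL] = []
    if let path = nonEmpty(environment["HF_TOKEN_PATH"]) {
        candidates.append(URL(fileURLWithPath: path))
    }
    if let hfHome = nonEmpty(environment["HF_HOME"]) {
        candidates.append(URL(fileURLWithPath: hfHome).appendingPathComponent("token"))
    }
    candidates.append(homeDirectory.appendingPathComponent(".cache/huggingface/token"))
    candidates.append(homeDirectory.appendingPathComponent(".huggingface/token"))

    for url in candidates {
        if let token = nonEmpty(try? String(contentsOf: url, encoding: .utf8)) {
            return token
        }
    }
    return nil
}
```

Passing the environment and home directory as parameters keeps the function easy to exercise in tests without touching real credentials.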

OAuth for User-Facing Apps

Building an app where users sign in with their Hugging Face account? swift-huggingface includes a complete OAuth 2.0 implementation:

import HuggingFace

// Create authentication manager
let authManager = try HuggingFaceAuthenticationManager(
    clientID: "your_client_id",
    redirectURL: URL(string: "yourapp://oauth/callback")!,
    scope: [.openid, .profile, .email],
    keychainService: "com.yourapp.huggingface",
    keychainAccount: "user_token"
)

// Sign in user (presents system browser)
try await authManager.signIn()

// Use with Hub client
let client = HubClient(tokenProvider: .oauth(manager: authManager))

// Tokens are automatically refreshed when needed
let userInfo = try await client.whoami()
print("Signed in as: \(userInfo.name)")

The OAuth manager handles token storage in Keychain, automatic refresh, and secure sign-out. No more manual token management.

Reliable Downloads

Downloading large models is now straightforward with proper progress tracking and resume support:

// Download with progress tracking
let progress = Progress(totalUnitCount: 0)

Task {
    for await _ in progress.publisher(for: \.fractionCompleted).values {
        print("Download: \(Int(progress.fractionCompleted * 100))%")
    }
}

let fileURL = try await client.downloadFile(
    at: "model.safetensors",
    from: "microsoft/phi-2",
    to: destinationURL,
    progress: progress
)

If a download is interrupted, you can resume it:

// Resume from where you left off
let fileURL = try await client.resumeDownloadFile(
    resumeData: savedResumeData,
    to: destinationURL,
    progress: progress
)
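
The snippet above doesn't show where `savedResumeData` comes from. With plain URLSession, a download task that fails mid-transfer reports an error whose `userInfo` may carry resume data under the `NSURLSessionDownloadTaskResumeData` key. The sketch below assumes swift-huggingface surfaces the underlying error in a similar way, which the post does not confirm; `resumeData(from:)` and `saveResumeData(_:to:)` are hypothetical helpers.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession symbols live here on Linux
#endif

/// Hypothetical helper: extracts URLSession resume data from a failed
/// download's error, if the system provided any.
func resumeData(from error: Error) -> Data? {
    (error as NSError).userInfo[NSURLSessionDownloadTaskResumeData] as? Data
}

/// Hypothetical helper: persists resume data so a later launch can pick
/// up the transfer where it stopped.
func saveResumeData(_ data: Data, to url: URL) throws {
    try data.write(to: url, options: .atomic)
}
```

In a `catch` block around the download call, you would check `resumeData(from: error)` and save the result before retrying. Resume data is not guaranteed to be present; when it is missing, the download has to start over.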

For downloading entire model repositories, downloadSnapshot handles everything:

let modelDir = try await client.downloadSnapshot(
    of: "mlx-community/Llama-3.2-1B-Instruct-4bit",
    to: cacheDirectory,
    matching: ["*.safetensors", "*.json"],  // Only download what you need
    progressHandler: { progress in
        print("Downloaded \(progress.completedUnitCount) of \(progress.totalUnitCount) files")
    }
)

The snapshot function tracks metadata for each file, so subsequent calls only download files that have changed.
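
As a rough illustration of those two ideas, glob filtering and skipping unchanged files, here is a small Foundation-only sketch. It is not how `downloadSnapshot` is implemented: both helpers are hypothetical, and comparing ETags is an assumption about what the tracked metadata contains.

```swift
import Foundation

/// Hypothetical helper: keeps only the paths that match at least one glob.
func filterFiles(_ paths: [String], matching globs: [String]) -> [String] {
    paths.filter { path in
        globs.contains { glob in
            // fnmatch(3) returns 0 when the string matches the pattern.
            fnmatch(glob, path, 0) == 0
        }
    }
}

/// Hypothetical helper: a file needs downloading when nothing is cached
/// for it, or when the cached ETag differs from the remote one.
func filesToDownload(remoteETags: [String: String], cachedETags: [String: String]) -> [String] {
    remoteETags
        .filter { cachedETags[$0.key] != $0.value }
        .keys
        .sorted()
}

let repoFiles = ["config.json", "model.safetensors", "README.md", "tokenizer.json"]
let wanted = filterFiles(repoFiles, matching: ["*.safetensors", "*.json"])
// wanted == ["config.json", "model.safetensors", "tokenizer.json"]
```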

Shared Cache with Python

Remember the second problem we mentioned? "No shared cache with the Python ecosystem." That's now solved.

swift-huggingface implements a Python-compatible cache structure that allows seamless sharing between Swift and Python clients:

~/.cache/huggingface/hub/
├── models--deepseek-ai--DeepSeek-V3.2/
│   ├── blobs/
│   │   └── <etag>           # actual file content
│   ├── refs/
│   │   └── main             # contains commit hash
│   └── snapshots/
│       └── <commit_hash>/
│           └── config.json  # symlink → ../../blobs/<etag>

This means:

Download once, use everywhere. If you've already downloaded a model with the hf CLI or the Python library, swift-huggingface will find it automatically.
Content-addressed storage. Files are stored by their ETag in the blobs/ directory. If two revisions share the same file, it's only stored once.
Symlinks for efficiency. Snapshot directories contain symlinks to blobs, minimizing disk usage while maintaining a clean file structure.
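
The layout is simple enough to reproduce with FileManager. The sketch below is an illustration under assumptions, not the library's code: `repoFolderName` and `store` are hypothetical helpers, and it only handles flat filenames (a file in a subdirectory would need a longer relative link).

```swift
import Foundation

/// Hypothetical helper: "org/name" becomes "models--org--name",
/// matching the folder name in the tree above.
func repoFolderName(prefix: String = "models", repoID: String) -> String {
    ([prefix] + repoID.split(separator: "/").map(String.init)).joined(separator: "--")
}

/// Hypothetical helper: writes content once under blobs/<etag> and links
/// it into snapshots/<commit>/<filename>. Returns the symlink's URL.
func store(
    _ content: Data, etag: String, filename: String, commit: String, repoRoot: URL
) throws -> URL {
    let fm = FileManager.default
    let blobs = repoRoot.appendingPathComponent("blobs")
    let snapshot = repoRoot.appendingPathComponent("snapshots").appendingPathComponent(commit)
    try fm.createDirectory(at: blobs, withIntermediateDirectories: true)
    try fm.createDirectory(at: snapshot, withIntermediateDirectories: true)

    // Content-addressed: an existing blob with this ETag is reused as-is.
    let blob = blobs.appendingPathComponent(etag)
    if !fm.fileExists(atPath: blob.path) {
        try content.write(to: blob, options: .atomic)
    }

    // Relative link, so the cache directory can be moved as a whole.
    let link = snapshot.appendingPathComponent(filename)
    if (try? fm.destinationOfSymbolicLink(atPath: link.path)) == nil {
        try fm.createSymbolicLink(atPath: link.path, withDestinationPath: "../../blobs/\(etag)")
    }
    return link
}
```

Storing the same ETag for two different commits leaves a single file in `blobs/` and one symlink in each snapshot directory, which is the deduplication described above.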

The cache location follows the same environment variable conventions as Python:

HF_HUB_CACHE environment variable
HF_HOME environment variable + /hub
~/.cache/huggingface/hub (default)
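
Here is a minimal sketch of that resolution order, for illustration only. `resolveCacheDirectory` is a hypothetical function, and it assumes empty variables are ignored and that paths are used as given (no tilde expansion).

```swift
import Foundation

/// Hypothetical helper mirroring the lookup order above.
func resolveCacheDirectory(
    environment: [String: String] = ProcessInfo.processInfo.environment,
    homeDirectory: URL = URL(fileURLWithPath: NSHomeDirectory())
) -> URL {
    // 1. An explicit hub cache location wins.
    if let hubCache = environment["HF_HUB_CACHE"], !hubCache.isEmpty {
        return URL(fileURLWithPath: hubCache)
    }
    // 2. Otherwise, the "hub" folder inside HF_HOME.
    if let hfHome = environment["HF_HOME"], !hfHome.isEmpty {
        return URL(fileURLWithPath: hfHome).appendingPathComponent("hub")
    }
    // 3. Fall back to the default location under the home directory.
    return homeDirectory.appendingPathComponent(".cache/huggingface/hub")
}
```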

You can also use the cache directly:

let cache = HubCache.default

// Check if a file is already cached
if let cachedPath = cache.cachedFilePath(
    repo: "deepseek-ai/DeepSeek-V3.2",
    kind: .model,
    revision: "main",
    filename: "config.json"
) {
    let data = try Data(contentsOf: cachedPath)
    // Use cached file without any network request
}

To prevent race conditions when multiple processes access the same cache, swift-huggingface uses file locking (flock(2)).
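
For readers who haven't used flock(2) from Swift, the general pattern looks like the sketch below. It shows the system call itself, not the library's locking code: `withFileLock` and `LockError` are hypothetical names, and where swift-huggingface keeps its lock files is not covered in this post.

```swift
import Foundation

/// Hypothetical error type carrying the errno of the failed call.
struct LockError: Error {
    let code: Int32
}

/// Hypothetical helper: runs `body` while holding an exclusive advisory
/// lock on the file at `path`, creating the file if needed.
func withFileLock<T>(at path: String, _ body: () throws -> T) throws -> T {
    // The lock file's contents are irrelevant; only the descriptor matters.
    let fd = open(path, O_CREAT | O_RDWR, 0o644)
    guard fd >= 0 else { throw LockError(code: errno) }
    defer { _ = close(fd) }

    // Blocks until no other process holds the lock.
    guard flock(fd, LOCK_EX) == 0 else { throw LockError(code: errno) }
    defer { _ = flock(fd, LOCK_UN) }

    return try body()
}
```

The lock is advisory: it only protects against other processes that also call `flock` on the same file, which is why every cache client has to follow the same convention.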

Before and After

Here's what downloading a model snapshot looked like with the old HubApi:

// Before: HubApi in swift-transformers
let hub = HubApi()
let repo = Hub.Repo(id: "mlx-community/Llama-3.2-1B-Instruct-4bit")

// No reliable progress tracking, no resume, errors swallowed
let modelDir = try await hub.snapshot(
    from: repo,
    matching: ["*.safetensors", "*.json"]
) { progress in
    // Progress object exists but wasn't always accurate
    print(progress.fractionCompleted)
}

And here's the same operation with swift-huggingface:

// After: swift-huggingface
let client = HubClient.default

let modelDir = try await client.downloadSnapshot(
    of: "mlx-community/Llama-3.2-1B-Instruct-4bit",
    to: cacheDirectory,
    matching: ["*.safetensors", "*.json"],
    progressHandler: { progress in
        // Accurate progress per file
        print("\(progress.completedUnitCount)/\(progress.totalUnitCount) files")
    }
)

The API is similar, but the implementation is completely different — built on URLSession download tasks with proper delegate handling, resume data support, and metadata tracking.

Beyond Downloads

But wait, there's more! swift-huggingface contains a complete Hub client:

// List trending models
let models = try await client.listModels(
    filter: "library:mlx",
    sort: "trending",
    limit: 10
)

// Get model details
let model = try await client.getModel("mlx-community/Llama-3.2-1B-Instruct-4bit")
print("Downloads: \(model.downloads ?? 0)")
print("Likes: \(model.likes ?? 0)")

// Work with collections
let collections = try await client.listCollections(owner: "huggingface", sort: "trending")

// Manage discussions
let discussions = try await client.listDiscussions(kind: .model, "username/my-model")

And that's not all! swift-huggingface has everything you need to interact with Hugging Face Inference Providers, giving your app instant access to hundreds of machine learning models, powered by world-class inference providers:

import HuggingFace

// Create a client (uses auto-detected credentials from environment)
let client = InferenceClient.default

// Generate images from a text prompt
let response = try await client.textToImage(
    model: "black-forest-labs/FLUX.1-schnell",
    prompt: "A serene Japanese garden with cherry blossoms",
    provider: .hfInference,
    width: 1024,
    height: 1024,
    numImages: 1,
    guidanceScale: 7.5,
    numInferenceSteps: 50,
    seed: 42
)

// Save the generated image
try response.image.write(to: URL(fileURLWithPath: "generated.png"))

Check the README for a full list of everything that's supported.

What's Next

We're actively working on two fronts:

Integration with swift-transformers. We have a pull request in progress to replace HubApi with swift-huggingface. This will bring reliable downloads to everyone using swift-transformers, mlx-swift-lm, and the broader ecosystem. If you maintain a Swift-based library or app and want help adopting swift-huggingface, reach out — we're happy to help.

Faster downloads with Xet. We're adding support for the Xet storage backend, which enables chunk-based deduplication and significantly faster downloads for large models. More on this soon.

Try It Out

Add swift-huggingface to your project:

dependencies: [
    .package(url: "https://github.com/huggingface/swift-huggingface.git", from: "0.4.0")
]

We'd love your feedback. If you've been frustrated with model downloads in Swift, give this a try and let us know how it goes. Your experience reports will help us prioritize what to improve next.

Resources

Thanks to the swift-transformers community for the feedback that shaped this project, and to everyone who filed issues and shared their experiences. This is for you. ❤️
