Unlock the power of images with AI Sheets

Hugging Face - Blog

Waypoint-1.5: Higher-Fidelity Interactive Worlds for Everyday GPUs ALTK‑Evolve: On‑the‑Job Learning for AI Agents Safetensors is Joining the PyTorch Foundation Holo3: Breaking the Computer Use Frontier Any Custom Frontend with Gradio's Backend A New Framework for Evaluating Voice Agents (EVA) Bringing Robotics AI to Embedded Platforms: Dataset Recording, VLA Fine‑Tuning, and On‑Device Optimizations One-Shot Any Web App with Gradio's gr.HTML CUGA on Hugging Face: Democratizing Configurable AI Agents New in llama.cpp: Model Management Building Deep Research: How we Achieved State of the Art OVHcloud on Hugging Face Inference Providers 🔥 20x Faster TRL Fine-tuning with RapidFire AI Building for an Open Future - our new partnership with Google Cloud Aligning to What? Rethinking Agent Generalization in MiniMax M2 Building a Healthcare Robot from Simulation to Deployment with NVIDIA Isaac Sentence Transformers is joining Hugging Face! Supercharge your OCR Pipelines with Open Models Google Cloud C4 Brings a 70% TCO improvement on GPT OSS with Intel and Hugging Face Get your VLM running in 3 simple steps on Intel CPUs Nemotron-Personas-India: Synthesized Data for Sovereign AI Introducing RTEB: A New Standard for Retrieval Evaluation Accelerating Qwen3-8B Agent on Intel® Core™ Ultra with Depth-Pruned Draft Models VibeGame: Exploring Vibe Coding Games Nemotron-Personas-Japan: ソブリン AI のための合成データセット Swift Transformers Reaches 1.0 – and Looks to the Future Smol2Operator: Post-Training GUI Agents for Computer Use SyGra: The One-Stop Framework for Building Data for LLMs and SLMs Gaia2 and ARE: Empowering the community to study agents Scaleway on Hugging Face Inference Providers 🔥 Democratizing AI Safety with RiskRubric.ai Public AI on Hugging Face Inference Providers 🔥 `LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot` Visible Watermarking with Gradio Introducing the Palmyra-mini family: Powerful, lightweight, and ready to reason! Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers Fine-tune Any LLM from the Hugging Face Hub with Together AI Jupyter Agents: training LLMs to reason with notebooks mmBERT: ModernBERT goes Multilingual Welcome EmbeddingGemma, Google's new efficient embedding model SAIR: Accelerating Pharma R&D with AI-Powered Structural Intelligence Make your ZeroGPU Spaces go brrr with ahead-of-time compilation NVIDIA Releases 6 Million Multi-Lingual Reasoning Dataset Generate Images with Claude and Hugging Face From Zero to GPU: A Guide to Building and Scaling Production-Ready CUDA Kernels MCP for Research: How to Connect AI to Research Tools Kimina-Prover-RL Arm & ExecuTorch 0.7: Bringing Generative AI to the masses Neural Super Sampling is here! TextQuests: How Good are LLMs at Text-Based Video Games? 🇵🇭 FilBench - Can LLMs Understand and Generate Filipino? Introducing AI Sheets: a tool to work with datasets using open AI models! Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training Vision Language Model Alignment in TRL ⚡️ Welcome GPT OSS, the new open-source model family from OpenAI! Measuring Open-Source Llama Nemotron Models on DeepResearch Bench 📚 3LM: A Benchmark for Arabic LLMs in STEM and Code Implementing MCP Servers in Python: An AI Shopping Assistant with Gradio Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Say hello to `hf`: a faster, friendlier Hugging Face CLI ✨ Parquet Content-Defined Chunking TimeScope: How Long Can Your Video Large Multimodal Model Go? Fast LoRA inference for Flux with Diffusers and PEFT Accelerate a World of LLMs on Hugging Face with NVIDIA NIM Arc Virtual Cell Challenge: A Primer Consilium: When Multiple LLMs Collaborate Back to The Future: Evaluating AI Agents on Predicting Future Events Five Big Improvements to Gradio MCP Servers Ettin Suite: SoTA Paired Encoders and Decoders Migrating the Hub from Git LFS to Xet Kimina-Prover: Applying Test-time RL Search on Large Formal Reasoning Models Asynchronous Robot Inference: Decoupling Action Prediction and Execution ScreenEnv: Deploy your full stack Desktop Agent Building the Hugging Face MCP Server Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders Creating custom kernels for the AMD MI300 Upskill your LLMs With Gradio MCP Servers SmolLM3: smol, multilingual, long-context reasoner Three Mighty Alerts Supporting Hugging Face’s Production Infrastructure Efficient MultiModal Data Pipeline Announcing NeurIPS 2025 E2LM Competition: Early Training Evaluation of Language Models Training and Finetuning Sparse Embedding Models with Sentence Transformers Welcome the NVIDIA Llama Nemotron Nano VLM to Hugging Face Hub Gemma 3n fully available in the open-source ecosystem! Transformers backend integration in SGLang (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware Groq on Hugging Face Inference Providers 🔥 How Long Prompts Block Other Requests - Optimizing LLM Performance Learn the Hugging Face Kernel Hub in 5 Minutes Featherless AI on Hugging Face Inference Providers 🔥 Convert Transformers to ONNX with Hugging Face Optimum Intel and Hugging Face Partner to Democratize Machine Learning Hardware Acceleration Director of Machine Learning Insights [Part 3: Finance Edition] The Annotated Diffusion Model Deep Q-Learning with Space Invaders Graphcore and Hugging Face Launch New Lineup of IPU-Ready Transformers Introducing Pull Requests and Discussions 🥳 Efficient Table Pre-training without Real Data: An Introduction to TAPEX An Introduction to Q-Learning Part 2/2 How Sempre Health is leveraging the Expert Acceleration Program to accelerate their ML roadmap

Ame Vi, Daniel Vila, Francisco Aranda, Damián Pumar, Leandro von · 2025-10-21 · via Hugging Face - Blog

Back to Articles

This article is also available in Chinese 简体中文.

Your images have stories to tell
Generate and transform text and images in the same flow
Step-by-step guide
Upload your data
Understanding AI actions
Extract text from images.
Clean, transform, and enrich text
Edit and transform images.
Export your dataset
What's next?
🧭TL;DR: Hugging Face AI Sheets is an open-source tool for supercharging datasets with AI models, no code required. Now with vision support: extract data from images (receipts, documents), generate visuals from text, and edit images—all in a spreadsheet. Powered by thousands of open models via Inference Providers.

Analyzing your images with AI Sheets

We are excited to release a massive update to Hugging Face AI Sheets, the open-source tool for building, transforming, and enriching data with open AI models. AI Sheets leverages Inference Providers, which means you can use thousands of open models powered by the best inference providers on the planet.

The first version of AI Sheets made structuring and enriching textual content a breeze. Now, we're adding vision to AI Sheets.

Images are everywhere—product photos, receipts, screenshots, diagrams, charts, logos. These documents contain structured information waiting to be extracted, analyzed, and transformed. Today, you can finally work with visual content directly in AI Sheets: view images, analyze them, extract information, generate new ones, and even edit them in real-time —all in the same workflow.

Your images have stories to tell

Images contain valuable information—product catalogs, support tickets, research archives, receipts, documents. Now you can upload images directly or use datasets with images, and use vision models to extract, analyze, and structure the information inside them.

What you can do:

Describe and categorize images - Generate captions for product photos, classify document types, or tag images by content
Extract structured data - Pull line items from receipts, data from charts, or text from scanned documents
Add context and metadata - Automatically label images with relevant attributes, quality scores, or custom annotations

Just like text columns, you can iterate on prompts, manually edit outputs, and use thumbs-up to teach the model what you want. Your feedback becomes few-shot examples for better results.

Example: From receipts to structured expenses

Imagine you're back from a trip with a stack of receipts. Upload them to AI Sheets and create a column with a prompt like: Extract the merchant name, date, total amount, and expense category from this receipt

AI Sheets processes each receipt and gives you a clean table with all the details extracted. You can edit any mistakes, validate good results with thumbs-up, and regenerate to improve the rest. Export the final dataset as CSV or Parquet for your expense tracking tool.

Or maybe you're digitizing handwritten recipes from old family notebooks. Create columns to extract ingredients, cooking time, and cuisine type—turning your personal archive into a searchable, structured dataset.

Generate and transform text and images in the same flow

Need visuals for your content? AI Sheets can generate and edit images directly in your spreadsheet using AI models, keeping your entire content creation workflow in one place.
What you can do:

Generate images from text - Create social media graphics, thumbnails, or illustrations that match your content
Edit and transform existing images - Modify uploaded images or generated visuals—change styles, add elements, adjust compositions
Create variations at scale - Generate multiple versions or styles to test what resonates with your audience
Build visual content libraries - Produce consistent branded assets across large content campaigns

Example: Creating a content calendar with visuals
Imagine you're planning a month of social media posts about healthy recipes. You have a spreadsheet with post titles and descriptions, but no images yet.

Create an image column with a prompt like: Generate an appetizing food photo for: {{title}}. Style: bright, overhead shot, natural lighting.

AI Sheets generates a unique image for each post. Not quite right? Create another column to edit them: Transform the image to have a rustic wooden background and add fresh herbs as garnish.

You can iterate on generation and editing prompts and try different approaches. Your entire content calendar—copy and visuals—lives in one spreadsheet, ready to schedule or export.

Step-by-step guide

Now let’s see AI Sheets in action. We will use open models to unlock the knowledge within handwritten recipes like the ones you could find from your grandma.

Upload your data

We have a folder with photos that we can simply upload to the app.

The result is a spreadsheet like this:

Understanding AI actions

Each column in your spreadsheet can be transformed, extracted from, queried, and anything you can imagine using AI actions.

To see this in action, click on the overlay on top of any column:

Image columns come with image operations like extracting text, asking the image, object detection, colorization, adding text, and any custom action you can think of.

Text columns include summarization, keyword extraction, translation, and custom actions.

A prompt and a model define every AI action. Let’s see what we can do with our handwritten recipes dataset!

Extract text from images.

AI Sheets comes with a template to extract text from images:

The result of this action is an AI-generated column with the transcribed text. Let’s see an example:

For the above image, the extracted text is as follows:

MEMORANDUM:

From

To

1 Box Duncan Hines Yellow Cake Mix

1 Box instant lemon pudding

2/3 cups water

1/2 cup Mozola oil

4 eggs

Lemon flavoring to taste.

Put in mixing bowl and beat for 10 min.

and REMEMBER... for Quality PRINTING

CALL OR WRITE

Gatling & Pierce

PRINTERS

TELEPHONE 332-2579

22 YEARS OF SERVICE IN NORTHEASTERN CAROLINA

Not bad! But we see it has included printed text for the header and footer, and we’re interested in the recipe text. The reason this text is included is that we have used the default template for text extraction, which is as follows:

Extract and transcribe all visible text from the image, including signs, labels, documents, or any written content

Let’s now try a custom prompt.

Here is the extracted recipe details:

- 1 box Duncan Hines Yellow Cake Mix
- 1 box instant lemon pudding
- 2/3 cups water
- 1/2 cup Mazola oil
- 4 eggs
- Lemon flavoring to taste
- Put in mixing bowl and beat for 10 minutes

This is great! But what about more complex images? By default, AI Sheets uses models with a good balance of speed and accuracy, but you can experiment with thousands of models. The above example uses the default vision language model Qwen/Qwen2.5-VL-7B-Instruct.

Let’s test a SoTA reasoning model, Qwen/Qwen3-VL-235B-A22B-Reasoning, with a more challenging image.

Here’s the comparison between the models:

Qwen/Qwen2.5-VL-7B-Instruct	Qwen/Qwen3-VL-235B-A22B-Reasoning
in large bowl combine meat, onion, bread crumbs 1/2 nutmeg & cheese - as you add sprinkle around. Then blend - Last sprinkle blend again Bake in large pan for 10-15 min. at 350. Let stand 5 min before serving.	in lg bowl combine meat, onion, bread crumbs 1/4 nutmeg & cheese - as you add sprinkle around. then blend - last spinach blend again. Bake in lg pan for 50-60 min. @ 350 - let stand 5 min before serving

Both models produce very similar outputs, but with two subtle but important details (in bold): the temperature and a key ingredient: spinach.

Clean, transform, and enrich text

Once we are satisfied with the extracted text, we can further transform and enrich it. We need to perform an AI action with the new column as follows:

We now have a beautifully structured HTML page for each recipe:

Edit and transform images.

Finally, AI Sheets integrates image-to-image models like Qwen-Image-Edit. This means you can run AI actions to transform and enrich your images.

For example, let’s say you want to give your recipes and old-looking style, you need to go to the column and use the B&W template like so:

Result:

Export your dataset

Once you're happy with your new dataset, export it to the Hub! You can export it to an organization, your personal profile or make it private if you don't want to share it with the community.

You can check out the dataset we have just created.

What's next?

You can try AI Sheets without installing or downloading and deploying it locally from the GitHub repo. To run locally and get the most out of it, we recommend you subscribe to PRO and get 20x monthly inference usage.

If you have questions or suggestions, let us know in the Community tab or by opening an issue on GitHub.

此内容由惯性聚合(RSS阅读器)自动聚合整理，仅供阅读参考。原文来自 — 版权归原作者所有。

推荐订阅源

Hugging Face - Blog

Your images have stories to tell

Generate and transform text and images in the same flow

Step-by-step guide

Upload your data

Understanding AI actions

Extract text from images.

Clean, transform, and enrich text

Edit and transform images.

Export your dataset

What's next?