GitHub - hwdsl2/docker-ollama: Docker image to run an Ollama local LLM server. Secure by default, all API requests require a Bearer token (auto-generated on first start). OpenAI-compatible API. Supports first-start model pre-pull, NVIDIA GPU (CUDA) acceleration, and persistent model storage. Multi-arch: amd64, arm64.
hwdsl2 · 2026-04-30 · via Hacker News: Show HN

Docker image to run an Ollama local LLM server. Provides an OpenAI-compatible API for running large language models locally. Based on Debian Trixie (slim). Designed to be simple, private, and secure by default.

Features:

  • Secure by default — all API requests require a Bearer token (auto-generated on first start)
  • Auto-generates an API key on first start, stored in the persistent volume
  • First-start model pre-pull via OLLAMA_MODELS environment variable
  • Model management via a helper script (ollama_manage)
  • OpenAI-compatible API — point any OpenAI SDK or app at your local server with a one-line change
  • Caddy reverse proxy enforces Bearer token auth on all API requests (except / health check)
  • NVIDIA GPU (CUDA) acceleration for faster inference (:cuda image tag)
  • Automatically built and published via GitHub Actions
  • Persistent model storage via a Docker volume
  • Lightweight image (~70MB); multi-arch: linux/amd64, linux/arm64

Tip: Ollama, LiteLLM, Whisper, Kokoro, and Embeddings can be used together to build a complete, private AI stack on your own server.

Security note

~175,000 Ollama servers were found publicly exposed without authentication (source). Ollama itself has no built-in authentication, so any instance that listens on all interfaces (as the official Docker image does by default) is open to anyone who can reach the port. This image enforces Bearer token authentication on all API requests via a built-in auth proxy, so unauthorized access is blocked even if the port is accidentally exposed.

Quick start

Step 1. Start the Ollama server:

docker run \
    --name ollama \
    --restart=always \
    -v ollama-data:/var/lib/ollama \
    -p 11434:11434/tcp \
    -d hwdsl2/ollama-server

On first start, an API key is auto-generated and displayed in the container logs. All API requests require this key.

Note: For internet-facing deployments with HTTPS, see Using a reverse proxy.

Step 2. Get the API key:

# View the key in the container logs
docker logs ollama

# Or retrieve it for use in scripts
API_KEY=$(docker exec ollama ollama_manage --getkey)

The API key is displayed in a box labeled Ollama API key. To display it again at any time:

docker exec ollama ollama_manage --showkey

Step 3. Pull a model:

docker exec ollama ollama_manage --pull llama3.2:3b

Tip: To pull one or more models automatically on first start, set OLLAMA_MODELS before running the container:

docker run \
    --name ollama \
    --restart=always \
    -v ollama-data:/var/lib/ollama \
    -p 11434:11434/tcp \
    -e OLLAMA_MODELS=llama3.2:3b \
    -d hwdsl2/ollama-server

Or add OLLAMA_MODELS=llama3.2:3b to your ollama.env file (see Environment variables).

Step 4. Test with the API:

API_KEY=$(docker exec ollama ollama_manage --getkey)

# List models
curl http://localhost:11434/api/tags \
  -H "Authorization: Bearer $API_KEY"

# Chat completion (streaming)
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'

Note: The docker exec management commands (ollama_manage) do not require the API key.
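The same check can be scripted. The following is a minimal sketch using only the Python standard library; it assumes the default port mapping from Quick start, and `build_request` and `list_models` are hypothetical helper names introduced here for illustration, not part of this image.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434"  # assumption: default port mapping from Quick start


def build_request(path, api_key, payload=None):
    """Return a urllib Request carrying the Bearer token (nothing is sent yet)."""
    headers = {"Authorization": f"Bearer {api_key}"}
    data = None
    if payload is not None:
        # A JSON body turns the request into a POST.
        headers["Content-Type"] = "application/json"
        data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)


def list_models(api_key):
    """Call /api/tags and return the model names reported by the server."""
    with urllib.request.urlopen(build_request("/api/tags", api_key), timeout=10) as resp:
        body = json.load(resp)
    return [model["name"] for model in body.get("models", [])]


# Building a request does not touch the network, so this line is safe to run anywhere.
req = build_request("/api/tags", "<your-api-key>")
print(req.get_method(), req.full_url)
```

With the container running, `list_models(key)` should return the tags pulled in Step 3, where `key` is the value printed by `ollama_manage --getkey`.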

To learn more about how to use this image, read the sections below.

Requirements

  • A Linux server (local or cloud) with Docker installed
  • Sufficient disk space for models (3B models ≈ 2GB, 7B models ≈ 4–5GB, 14B+ models ≈ 8–10GB+)
  • Sufficient RAM to run models (3B models ≈ 2–4GB, 7B models ≈ 6–8GB, 14B+ models ≈ 12–16GB+)
  • TCP port 11434 (or your configured port) accessible

For GPU acceleration (:cuda image), the host also needs an NVIDIA GPU and the NVIDIA Container Toolkit.

Download

Get the trusted build from the Docker Hub registry:

docker pull hwdsl2/ollama-server

For GPU support:

docker pull hwdsl2/ollama-server:cuda

Alternatively, you may download from Quay.io:

docker pull quay.io/hwdsl2/ollama-server
docker image tag quay.io/hwdsl2/ollama-server hwdsl2/ollama-server

Supported platforms: linux/amd64 and linux/arm64. The :cuda tag supports linux/amd64 only.

Environment variables

All variables are optional. If not set, secure defaults are used automatically.

This Docker image uses the following variables, which can be declared in an env file (see example):

Variable | Description | Default
OLLAMA_API_KEY | API key for authenticating requests (auto-generated if not set) | Auto-generated
OLLAMA_PORT | TCP port for the API (1–65535) | 11434
OLLAMA_HOST | Hostname or IP shown in startup info and --showkey output | Auto-detected
OLLAMA_DEBUG | Set to 1 to enable verbose debug logging | (not set)
OLLAMA_MODELS | Comma-separated models to pull on first start, e.g. llama3.2:3b,qwen2.5:7b | (not set)
OLLAMA_MAX_LOADED_MODELS | Max models kept loaded in memory simultaneously | (Ollama default)
OLLAMA_NUM_PARALLEL | Number of parallel request slots per model | (Ollama default)
OLLAMA_CONTEXT_LENGTH | Default context window size in tokens | (Ollama default)

Note: In your env file, you may enclose values in single quotes, e.g. VAR='value'. Do not add spaces around =. If you change OLLAMA_PORT, update the -p flag in the docker run command accordingly.
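The rules in this note can be checked before starting the container. The sketch below is an illustration of those rules only: it is not the parser the image uses, the function names are hypothetical, and treating lines that start with `#` as comments is an assumption.

```python
def parse_env_line(line):
    """Parse one KEY=value line according to the note above.

    Returns (key, value), or None for blank lines and comments.
    Raises ValueError when '=' is missing or surrounded by spaces.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):  # assumption: '#' starts a comment
        return None
    key, sep, value = stripped.partition("=")
    if not sep:
        raise ValueError(f"missing '=': {line!r}")
    if key != key.rstrip() or value != value.lstrip():
        raise ValueError(f"spaces around '=' are not allowed: {line!r}")
    # Optional single quotes around the value are removed.
    if len(value) >= 2 and value[0] == value[-1] == "'":
        value = value[1:-1]
    return key, value


def check_port(value):
    """Validate OLLAMA_PORT against the documented 1-65535 range."""
    port = int(value)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


def publish_flag(env):
    """Build the -p argument for docker run that matches OLLAMA_PORT."""
    port = check_port(env.get("OLLAMA_PORT", "11434"))
    return f"{port}:{port}/tcp"


settings = dict(filter(None, map(parse_env_line, ["# demo", "OLLAMA_PORT='8080'"])))
print(publish_flag(settings))
```

Keeping the `-p` value derived from the same setting avoids the mismatch the note warns about when the port is changed in only one place.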

Example using an env file:

cp ollama.env.example ollama.env
# Edit ollama.env and set your values, then:
docker run \
    --name ollama \
    --restart=always \
    -v ollama-data:/var/lib/ollama \
    -v ./ollama.env:/ollama.env:ro \
    -p 11434:11434/tcp \
    -d hwdsl2/ollama-server

Model management

Use docker exec to manage models with the ollama_manage helper script. Models are stored in the Docker volume and persist across container restarts.

List downloaded models:

docker exec ollama ollama_manage --listmodels

Pull a model:

# Small, fast models (recommended for getting started)
docker exec ollama ollama_manage --pull llama3.2:3b
docker exec ollama ollama_manage --pull qwen2.5:7b

# Larger models (require more RAM/VRAM)
docker exec ollama ollama_manage --pull mistral:7b
docker exec ollama ollama_manage --pull phi4:14b
docker exec ollama ollama_manage --pull gemma3:12b

Remove a model:

docker exec ollama ollama_manage --remove llama3.2:3b

Show running models and memory usage:

docker exec ollama ollama_manage --status

Update all models (re-pulls latest versions):

docker exec ollama ollama_manage --update

Show the API key:

docker exec ollama ollama_manage --showkey

Get the API key (machine-readable, for use in scripts):

API_KEY=$(docker exec ollama ollama_manage --getkey)

Pull models on first start using the OLLAMA_MODELS variable in your env file:

OLLAMA_MODELS=llama3.2:3b,qwen2.5:7b
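As a rough illustration, the comma-separated value corresponds to one manual `--pull` per model. The sketch below only builds the commands and does not run them; the helper names are hypothetical, and stripping whitespace around commas is a convenience of this sketch, not documented behavior of the image.

```python
import shlex


def split_models(value):
    """Split a comma-separated OLLAMA_MODELS value into individual model tags."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def pull_commands(value, container="ollama"):
    """Return the docker exec argv that would pull each listed model by hand."""
    return [
        ["docker", "exec", container, "ollama_manage", "--pull", tag]
        for tag in split_models(value)
    ]


for argv in pull_commands("llama3.2:3b,qwen2.5:7b"):
    print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.run(argv, check=True)` if the pulls should actually be executed against a running container.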

Using the API

All API requests require a Bearer token. Retrieve the API key first:

API_KEY=$(docker exec ollama ollama_manage --getkey)

Ollama API:

# List models
curl http://localhost:11434/api/tags \
  -H "Authorization: Bearer $API_KEY"

# Generate (streaming)
curl http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model": "llama3.2:3b", "prompt": "Why is the sky blue?"}'

# Chat completion (streaming)
curl http://localhost:11434/api/chat \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'
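Ollama's native endpoints stream their output as newline-delimited JSON, one object per line, with the final object marked `"done": true`. The sketch below shows one way to reassemble a streamed /api/chat reply; `collect_chat_stream` is a hypothetical helper, and the sample lines are hand-written stand-ins for a real response.

```python
import json


def collect_chat_stream(lines):
    """Join the assistant text from a streamed /api/chat response.

    `lines` is any iterable of bytes or str holding one JSON object per line.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip keep-alive or trailing blank lines
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the last object carries done=true
    return "".join(parts)


# Hand-written sample in the shape of a streamed chat reply.
sample = [
    b'{"message": {"role": "assistant", "content": "Hel"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": "lo!"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}\n',
]
print(collect_chat_stream(sample))  # prints "Hello!"
```

The response object returned by `urllib.request.urlopen` iterates line by line, so it can be passed to this function directly. To receive a single JSON object instead of a stream, add `"stream": false` to the request body.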

OpenAI-compatible API (works with any OpenAI SDK or app):

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello!"}]}'

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    api_key="<your-api-key>",
    base_url="http://localhost:11434/v1",
)

response = client.chat.completions.create(
    model="llama3.2:3b",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Persistent data

All server data is stored in the Docker volume (/var/lib/ollama inside the container):

/var/lib/ollama/
├── models/           # Downloaded model files
├── .api_key          # API key (auto-generated, or synced from OLLAMA_API_KEY)
├── .initialized      # First-run marker
├── .port             # Saved port (used by ollama_manage)
├── .server_addr      # Cached server address (used by ollama_manage --showkey)
└── .Caddyfile        # Generated Caddy config (auth proxy)

Back up the Docker volume to preserve your models and API key.
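One common way to back up a named volume is to mount it read-only into a throwaway container and archive it with tar. The sketch below only builds that command; this pattern is not taken from this project, and the helper name, archive name, and choice of `debian:trixie-slim` as the helper image are assumptions.

```python
import shlex
from pathlib import Path


def backup_command(volume="ollama-data", dest_dir=".",
                   archive="ollama-data.tar.gz", helper_image="debian:trixie-slim"):
    """Build a docker run argv that archives a named volume into dest_dir."""
    dest = Path(dest_dir).resolve()  # bind mounts are safest with absolute paths
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",   # the volume, mounted read-only
        "-v", f"{dest}:/backup",      # host directory that receives the archive
        helper_image,
        "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


print(shlex.join(backup_command()))
```

The archive contains the .api_key file listed above, so it should be stored as carefully as the key itself. Stopping the container before running the backup avoids archiving a model that is still being downloaded.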

Using docker-compose

cp ollama.env.example ollama.env
# Edit ollama.env and set your values, then:
docker compose up -d
docker logs ollama

Example docker-compose.yml (already included):

services:
  ollama:
    image: hwdsl2/ollama-server
    container_name: ollama
    restart: always
    ports:
      - "11434:11434/tcp"
    volumes:
      - ollama-data:/var/lib/ollama
      - ./ollama.env:/ollama.env:ro

volumes:
  ollama-data:

GPU acceleration (CUDA)

Use docker-compose.cuda.yml to run with NVIDIA GPU support:

docker compose -f docker-compose.cuda.yml up -d

Requirements: NVIDIA GPU and the NVIDIA Container Toolkit installed on the host. The :cuda image is linux/amd64 only.

Using a reverse proxy

For internet-facing deployments, put a reverse proxy in front to handle HTTPS. The built-in Caddy auth proxy handles authentication; the external reverse proxy adds TLS. Use one of the following addresses to reach the Ollama container:

  • ollama:11434 — if your reverse proxy runs as a container in the same Docker network.
  • 127.0.0.1:11434 — if your reverse proxy runs on the host and the port is published.

Note: The Authorization: Bearer header passes through reverse proxies automatically — no special configuration needed.

Example with Caddy (automatic TLS via Let's Encrypt):

Caddyfile:

ollama.example.com {
  reverse_proxy ollama:11434
}

Example with nginx (reverse proxy on the host):

server {
  listen 443 ssl;
  server_name ollama.example.com;

  ssl_certificate     /path/to/cert.pem;
  ssl_certificate_key /path/to/key.pem;

  location / {
    proxy_pass http://127.0.0.1:11434;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_read_timeout 300s;
    proxy_buffering off;
  }
}

After setting up a reverse proxy, set OLLAMA_HOST=ollama.example.com in your env file so that the correct endpoint URL is shown in the startup logs and ollama_manage --showkey output.

Update Docker image

To update the Docker image and container:

docker pull hwdsl2/ollama-server
docker rm -f ollama
# Then re-run the docker run command from Quick start with the same volume.

Your downloaded models are preserved in the ollama-data volume.

Using with other AI services

The Ollama, LiteLLM, Whisper (STT), Kokoro (TTS), and Embeddings images can be combined to build a complete, private AI stack on your own server — from voice I/O to RAG-powered question answering. Whisper, Kokoro, and Embeddings run fully locally. Ollama runs all LLM inference locally, so no data is sent to third parties. When using LiteLLM with external providers (e.g., OpenAI, Anthropic), your data will be sent to those providers.

graph LR
    D["📄 Documents"] -->|embed| E["Embeddings<br/>(text → vectors)"]
    E -->|store| VDB["Vector DB<br/>(Qdrant, Chroma)"]
    A["🎤 Audio input"] -->|transcribe| W["Whisper<br/>(speech-to-text)"]
    W -->|query| E
    VDB -->|context| L["LiteLLM<br/>(AI gateway)"]
    W -->|text| L
    L -->|routes to| O["Ollama<br/>(local LLM)"]
    L -->|response| T["Kokoro TTS<br/>(text-to-speech)"]
    T --> B["🔊 Audio output"]

Service | Role | Default port
Ollama | Runs local LLM models (llama3, qwen, mistral, etc.) | 11434
LiteLLM | AI gateway that routes requests to Ollama, OpenAI, Anthropic, and 100+ providers | 4000
Embeddings | Converts text to vectors for semantic search and RAG | 8000
Whisper (STT) | Transcribes spoken audio to text | 9000
Kokoro (TTS) | Converts text to natural-sounding speech | 8880

Connect Ollama to LiteLLM:

# In docker-litellm, add Ollama as a model provider:
docker exec litellm litellm_manage \
  --addmodel ollama/llama3.2:3b \
  --base-url http://ollama:11434

Voice pipeline example

Transcribe a spoken question, get a local LLM response via Ollama, and convert it to speech:

OLLAMA_KEY=$(docker exec ollama ollama_manage --getkey)
LITELLM_KEY=$(docker exec litellm litellm_manage --getkey)

# Step 1: Transcribe audio to text (Whisper)
TEXT=$(curl -s http://localhost:9000/v1/audio/transcriptions \
    -F file=@question.mp3 -F model=whisper-1 | jq -r .text)

# Step 2: Send text to Ollama via LiteLLM and get a response
# jq builds the JSON body so quotes or newlines in $TEXT cannot break it
RESPONSE=$(jq -cn --arg text "$TEXT" \
    '{model: "ollama/llama3.2:3b", messages: [{role: "user", content: $text}]}' \
    | curl -s http://localhost:4000/v1/chat/completions \
    -H "Authorization: Bearer $LITELLM_KEY" \
    -H "Content-Type: application/json" \
    --data-binary @- \
    | jq -r '.choices[0].message.content')

# Step 3: Convert the response to speech (Kokoro TTS)
# jq builds the JSON body so quotes or newlines in $RESPONSE cannot break it
jq -cn --arg text "$RESPONSE" '{model: "tts-1", input: $text, voice: "af_heart"}' \
    | curl -s http://localhost:8880/v1/audio/speech \
    -H "Content-Type: application/json" \
    --data-binary @- \
    --output response.mp3

RAG pipeline example

Embed documents for semantic search, retrieve context, then answer questions with a local Ollama model:

OLLAMA_KEY=$(docker exec ollama ollama_manage --getkey)
LITELLM_KEY=$(docker exec litellm litellm_manage --getkey)

# Step 1: Embed a document chunk and store the vector in your vector DB
curl -s http://localhost:8000/v1/embeddings \
    -H "Content-Type: application/json" \
    -d '{"input": "Docker simplifies deployment by packaging apps in containers.", "model": "text-embedding-ada-002"}' \
    | jq '.data[0].embedding'
# → Store the returned vector alongside the source text in Qdrant, Chroma, pgvector, etc.

# Step 2: At query time, embed the question, retrieve the top matching chunks from
#          the vector DB, then send the question and retrieved context to Ollama via LiteLLM.
curl -s http://localhost:4000/v1/chat/completions \
    -H "Authorization: Bearer $LITELLM_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "ollama/llama3.2:3b",
      "messages": [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "What does Docker do?\n\nContext: Docker simplifies deployment by packaging apps in containers."}
      ]
    }' \
    | jq -r '.choices[0].message.content'
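The retrieval step in Step 2 is normally handled by the vector DB. To make it concrete, here is a minimal in-memory sketch that ranks stored chunks by cosine similarity and builds the same message layout as the request above. The toy three-dimensional vectors stand in for real embeddings, and all function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_chunks(query_vec, store, k=2):
    """Return the k stored texts whose vectors are most similar to query_vec.

    `store` is a list of (text, vector) pairs, a stand-in for a vector DB.
    """
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_messages(question, chunks):
    """Build the chat messages with the retrieved chunks appended as context."""
    context = "\n".join(chunks)
    return [
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"{question}\n\nContext: {context}"},
    ]


# Toy vectors; real ones would come from the /v1/embeddings call in Step 1.
store = [
    ("Docker simplifies deployment by packaging apps in containers.", [0.9, 0.1, 0.0]),
    ("Caddy can obtain TLS certificates automatically.", [0.0, 0.2, 0.9]),
]
messages = build_messages("What does Docker do?", top_chunks([1.0, 0.0, 0.0], store, k=1))
print(messages[1]["content"])
```

The resulting `messages` list can be sent as the `messages` field of the chat completion request shown in Step 2.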
Full stack docker-compose example

services:
  ollama:
    image: hwdsl2/ollama-server
    container_name: ollama
    restart: always
    volumes:
      - ollama-data:/var/lib/ollama
      - ./ollama.env:/ollama.env:ro

  litellm:
    image: hwdsl2/litellm-server
    container_name: litellm
    restart: always
    ports:
      - "127.0.0.1:4000:4000/tcp"
    volumes:
      - litellm-data:/etc/litellm
      - ./litellm.env:/litellm.env:ro

volumes:
  ollama-data:
  litellm-data:

Technical details

  • Base image: debian:trixie-slim (CPU) / nvidia/cuda:12.9.1-base-ubuntu24.04 (CUDA)
  • Image size: ~70MB (CPU) / ~3.2GB (CUDA)
  • Ollama: latest release, installed as a static binary
  • Auth proxy: Caddy (always active, enforces Bearer token auth)
  • Data directory: /var/lib/ollama (Docker volume)
  • Model storage: /var/lib/ollama/models inside the volume
  • Ollama API: http://localhost:11434 (or your configured port)
  • OpenAI-compatible API: http://localhost:11434/v1

License

Note: The software components inside the pre-built image (such as Ollama, Caddy, and their dependencies) are under the licenses chosen by their respective copyright holders. As with any pre-built image, it is the user's responsibility to ensure that any use of this image complies with the relevant licenses for all software contained within.

Copyright (C) 2026 Lin Song
This work is licensed under the MIT License.

Ollama is Copyright (C) 2023 Ollama, and is distributed under the MIT License.

Caddy is Copyright (C) 2015 Matthew Holt and The Caddy Authors, and is distributed under the Apache License 2.0.

This project is an independent Docker setup for Ollama and is not affiliated with, endorsed by, or sponsored by Ollama.